StudyBlue




Databases at Scale:
A MongoDB Case Study


August 23, 2012




StudyBlue, Inc.
Overview


  •      About Me

  •      About StudyBlue

  •      Why MongoDB?

  •      Leveraging MongoDB

  •      Key Issues

  •      Q&A




Who am I?


  •      Sean Laurent

  •      sean@studyblue.com

  •      Head of Operations at StudyBlue, Inc.




studyblue.com



About StudyBlue

  •     Online service for storing, studying, sharing and ultimately mastering course material


  •     Digital backpack for students




StudyBlue Usage

  •     Many simultaneous users


  •     Rapid growth


  •     Cyclical usage




Initial Use Case



Flashcard Scoring


  •      Track flashcard scoring over time

       •      Every single card

       •      Every single user

       •      Forever


  •      Provide aggregate statistics

       •      Flashcard deck

       •      Folder

       •      Overall


  •      Focus on content mastery
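A minimal sketch of the three aggregation levels listed above (deck, folder, overall). The event tuple layout, the sample data and the `mastery` helper are hypothetical and only illustrate the roll-up; they are not StudyBlue's schema.

```python
from collections import defaultdict

# Hypothetical score events: (user_id, folder, deck, card_id, correct)
events = [
    ("u1", "biology", "cells", "c1", True),
    ("u1", "biology", "cells", "c2", False),
    ("u1", "biology", "genetics", "c3", True),
    ("u1", "history", "rome", "c4", True),
]

def mastery(events, key):
    """Return the fraction of correct answers, grouped by key(event)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [correct, attempts]
    for event in events:
        bucket = totals[key(event)]
        bucket[0] += 1 if event[4] else 0
        bucket[1] += 1
    return {group: correct / attempts
            for group, (correct, attempts) in totals.items()}

by_deck = mastery(events, key=lambda e: (e[1], e[2]))
by_folder = mastery(events, key=lambda e: e[1])
overall = mastery(events, key=lambda e: "overall")
```

Every card attempt is kept as its own event, so the same data can be regrouped at any level without storing separate counters.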



Scoring Results

The Problem


  •      Reasonably large number of cards

  •      Large number of users

  •      User base increasing rapidly

  •      Shift in usage: increasing faster than the user base

       •      Time on site

       •      Decks per user

       •      Average deck size

       •      Study sessions per user




StudyBlue Database Problems

  •     Amazon EC2


  •     Large number of simultaneous users


  •     High write volume


  •     Single PostgreSQL database


  •     Large tables




Why Mongo?



Alternatives


  •      Amazon SimpleDB

       •      Far too simple


  •      Cassandra

       •      Difficult to add nodes and rebalance

       •      Column families cannot be modified without a restart


  •      CouchDB

       •      Difficult to add nodes and rebalance


  •      Redis

       •      No native support for sharding/partitioning

       •      Master/slave only - no automatic failover

MongoDB for the Win


  •      Highly available

       •      Replica sets

       •      Automatic failover


  •     Horizontal scaling across shards

       •     Improved write performance


       •     Improved availability during failures


       •      Easy to add additional shards


  •     Easier maintenance


Implementation:
Phase 1


Development

  •     100% Java


  •     Existing PostgreSQL database

       •     System of record


       •     Synchronization issues




SQL Integration & Synchronization


  •      PostgreSQL considered system of record

  •      Asynchronous event driven

  •      Web servers queue change events

  •      Scoring servers process events

       •      Query PostgreSQL

       •      Update MongoDB
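The event flow above can be sketched roughly as follows. The two dictionaries are in-memory stand-ins for PostgreSQL and MongoDB, and the function names are hypothetical; a real deployment would use database drivers and a durable queue.

```python
import queue

# In-memory stand-ins (assumptions): a dict for PostgreSQL rows and a dict
# for the MongoDB collection.
postgres_scores = {("u1", "c1"): {"attempts": 3, "correct": 2}}
mongo_scores = {}

change_events = queue.Queue()

def web_server_record_change(user_id, card_id):
    """Web tier: enqueue a small change event instead of writing to MongoDB."""
    change_events.put({"user_id": user_id, "card_id": card_id})

def scoring_server_drain():
    """Scoring tier: for each event, read the system of record, then upsert."""
    processed = 0
    while True:
        try:
            event = change_events.get_nowait()
        except queue.Empty:
            return processed
        key = (event["user_id"], event["card_id"])
        row = postgres_scores.get(key)     # query PostgreSQL (system of record)
        if row is not None:
            mongo_scores[key] = dict(row)  # update MongoDB with a fresh copy
        processed += 1

web_server_record_change("u1", "c1")
processed = scoring_server_drain()
```

Because the event carries only identifiers, the scoring server always copies the current PostgreSQL state, so replaying or reordering events cannot write stale values.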




Architecture v1

MongoDB Schema


  •      Many shallow collections vs monolithic deep collection

  •      Leverage existing SQL knowledge

  •      Simplify SQL integration




Implementation:
Phase 2


DevOps


  •      Amazon EC2

       •      Separate dev, test and production environments


  •      Scripting & automation

       •      Creation

       •      Cloning

       •      Configuration management with Chef




Even More Data


  •     Moved existing tables from PostgreSQL to MongoDB

       •     Four PostgreSQL tables with millions of rows combined into single collection


  •     New development uses MongoDB:

       •     Analytics data with 300+ million documents




SQL Integration Part 2


  •      MongoDB considered system of record

  •      Web servers interact with MongoDB directly

  •      More complex structures, fewer shallow collections




Key Issues



Summary

  •     NoSQL vs SQL


  •     Design challenges


  •     Amazon EC2/EBS


  •     Partitioning & sharding


  •     Replication Lag




NoSQL vs SQL

  •     NoSQL != SQL


  •     Document database != RDBMS


  •     No joins


  •     Requires new mindset


  •     Store related data together


  •     Duplicate data as necessary
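To illustrate "store related data together" and "duplicate data as necessary", here is a small sketch that folds three relational-style tables into one document. The table contents and field names are invented for the example.

```python
# Relational shape: three "tables" that would be joined at read time.
users = {1: {"name": "Ada"}}
decks = {10: {"title": "Cell Biology", "owner_id": 1}}
cards = [
    {"deck_id": 10, "front": "Mitochondria", "back": "Powerhouse of the cell"},
    {"deck_id": 10, "front": "Ribosome", "back": "Protein synthesis"},
]

def to_document(deck_id):
    """Build one self-contained deck document that needs no joins to read."""
    deck = decks[deck_id]
    return {
        "_id": deck_id,
        "title": deck["title"],
        # Duplicated on purpose: the owner's name is copied into the deck.
        "owner": {"id": deck["owner_id"],
                  "name": users[deck["owner_id"]]["name"]},
        # Related data stored together: cards are embedded in the deck.
        "cards": [
            {"front": c["front"], "back": c["back"]}
            for c in cards
            if c["deck_id"] == deck_id
        ],
    }

deck_doc = to_document(10)
```

The trade-off is on the write side: if the owner renames their account, every document holding a copy of the name has to be updated.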




Design Challenges

  •     Multiple tables to single collections with complex objects


  •     Avoid growing objects

       •     Padding


       •     In-place update vs move


  •     Challenges with array elements
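A toy model of why growing objects hurt and how padding helps. It assumes a record is moved whenever it outgrows its allocated space and is then re-allocated with the same padding factor; the sizes and factors are made up and do not reproduce MongoDB's actual allocation logic.

```python
def simulate_growth(initial_size, growth_per_update, updates, padding_factor):
    """Count updates that force a document move under a simple padding model."""
    allocated = int(initial_size * padding_factor)
    size = initial_size
    moves = 0
    for _ in range(updates):
        size += growth_per_update
        if size > allocated:
            # The record no longer fits: relocate it with fresh padding.
            allocated = int(size * padding_factor)
            moves += 1
    return moves

no_padding = simulate_growth(100, 10, updates=20, padding_factor=1.0)
with_padding = simulate_growth(100, 10, updates=20, padding_factor=1.5)
```

With no padding every one of the 20 updates relocates the document; with 50% padding only 2 do, and the rest are in-place updates.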




Amazon EC2 & EBS

  •     Plan for failure

       •     “When” not “if”


  •     EBS performance

       •     Inconsistent


       •     Limited by bandwidth


       •     100 IOPS / volume


       •     RAID-0
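A back-of-the-envelope sketch of the RAID-0 trade-off, using the 100 IOPS per volume figure from the slide. The per-volume failure probability is an invented number for illustration, and real striped throughput will fall short of the ideal sum.

```python
def raid0_estimate(volumes, iops_per_volume=100, volume_failure_prob=0.01):
    """Idealized RAID-0 figures: IOPS add up, but so does exposure to failure."""
    ideal_iops = volumes * iops_per_volume
    # The array is lost if any single member volume is lost.
    array_failure_prob = 1 - (1 - volume_failure_prob) ** volumes
    return ideal_iops, array_failure_prob

iops, failure_prob = raid0_estimate(volumes=8)
```

Striping eight volumes gives an ideal ceiling of 800 IOPS, while the chance of losing the array rises to roughly 7.7% under the assumed 1% per-volume figure, which is why striping goes hand in hand with "plan for failure".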




Instance Sizing

  •     Memory is king


  •     Keep working set in RAM

       •     Indexes


       •     Working data


  •     Spread horizontally instead of vertically

       •     Increased write performance
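A rough sizing sketch for "keep working set in RAM" combined with horizontal spreading. The sizes, the 80% usable-memory fraction and the assumption that the working set divides evenly across shards are all illustrative.

```python
import math

def shards_needed(index_gb, hot_data_gb, ram_per_node_gb, usable_fraction=0.8):
    """Smallest shard count whose combined usable RAM holds the working set."""
    working_set_gb = index_gb + hot_data_gb
    usable_per_node_gb = ram_per_node_gb * usable_fraction
    return math.ceil(working_set_gb / usable_per_node_gb)

shard_count = shards_needed(index_gb=40, hot_data_gb=60, ram_per_node_gb=34)
```

Here a 100 GB working set on nodes with about 27 GB of usable memory each calls for 4 shards.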




Data Routing with Shards




Partitioning in the Cloud


  •      Operations perspective

       •      Dynamic changes in machines

            •     Config servers track machines

            •     Each node in replica set knows other nodes

            •     Avoids restarting applications when Mongo servers change

       •      Easy scaling

            •     Local shard servers

            •     Config servers store redundant copies

                  •   Two-phase commit




Picking a shard key

  •     Shard key selection critical for proper distribution

       •     Spread writes across cluster


  •     Depends on usage

       •     Single document vs aggregation


  •     Examples all time-series data


  •     Cannot be changed
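A small simulation of why the key matters for spreading writes with time-series data. It compares range partitioning on a timestamp with hash partitioning on a user id; the shard count, boundaries and hashing scheme are assumptions chosen for the example, not a description of StudyBlue's keys.

```python
import hashlib
from collections import Counter

SHARDS = 4

def shard_for_range_key(timestamp, boundaries):
    """Range partitioning: the shard is the first boundary above the key."""
    for index, upper in enumerate(boundaries):
        if timestamp < upper:
            return index
    return len(boundaries)  # the last shard owns everything above the top

def shard_for_hashed_key(user_id):
    """Hash partitioning on a user id (an assumed alternative key)."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % SHARDS

# Boundaries were set from older data, so all new timestamps exceed them.
boundaries = [1000, 2000, 3000]
new_writes = [(f"user{i}", 3000 + i) for i in range(1000)]

by_time = Counter(shard_for_range_key(ts, boundaries) for _, ts in new_writes)
by_user = Counter(shard_for_hashed_key(user) for user, _ in new_writes)
```

With the timestamp key all 1000 new writes land on the last shard, while the user-based key spreads them over all four. The flip side is the "single document vs aggregation" point: a key that scatters writes also scatters any query that reads a time range.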




Sharding - Gritty Details

  •     Chunks

       •     64 MB blocks of data


  •     Splits

       •     1 chunk turns into 2 chunks


  •     Rebalance

       •     Move chunks to different nodes


       •     Maintain even distribution of chunks
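The chunk, split and rebalance steps above can be mimicked with a toy model. It counts documents instead of bytes (`MAX_CHUNK_DOCS` stands in for the 64 MB limit) and uses a naive move-one-chunk-at-a-time balancer; both are simplifications, not MongoDB's actual algorithms.

```python
MAX_CHUNK_DOCS = 4  # stand-in for the 64 MB chunk size limit

def insert(chunks, key):
    """Add a key to the chunk covering it, splitting the chunk if too big."""
    # Chunks are sorted lists of keys; pick the last chunk starting <= key.
    target = 0
    for index, chunk in enumerate(chunks):
        if chunk and chunk[0] <= key:
            target = index
    chunks[target].append(key)
    chunks[target].sort()
    if len(chunks[target]) > MAX_CHUNK_DOCS:
        full = chunks[target]
        mid = len(full) // 2  # split at the midpoint: 1 chunk becomes 2
        chunks[target:target + 1] = [full[:mid], full[mid:]]

def rebalance(shards):
    """Move chunks from the fullest shard to the emptiest until counts even."""
    moves = 0
    while True:
        fullest = max(shards, key=lambda s: len(shards[s]))
        emptiest = min(shards, key=lambda s: len(shards[s]))
        if len(shards[fullest]) - len(shards[emptiest]) <= 1:
            return moves
        shards[emptiest].append(shards[fullest].pop())
        moves += 1

chunks = [[]]
for key in range(20):
    insert(chunks, key)

shards = {"shard0": chunks, "shard1": []}
moves = rebalance(shards)
```

Twenty increasing keys produce 9 chunks on one shard, and the balancer then needs 4 chunk moves to reach a 5/4 split. Each move copies data between nodes, which is where the I/O cost of rebalancing comes from.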




Rebalancing Challenges

  •     Splits have to find mid point of chunk


  •     Very I/O expensive for collections with small documents

       •     Decreased chunk size


       •     Made documents larger & more complex


  •     Can be a drain on system


  •     Needs to run frequently




Replication Lag

  •     Eventual consistency


  •     No guarantees about lag


  •     Replica safe writes

       •     Data committed to at least 2 nodes


       •     Can cause problems with high replication lag


       •     Data safety vs write latency
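A simplified model of a replica-safe write, where the write is acknowledged only once it is on at least 2 nodes. The function, the lag values and the timeout are hypothetical; the point is only to show how replication lag turns into write latency or timeouts.

```python
def replica_safe_write(lags_ms, w=2, timeout_ms=1000):
    """Return the wait in ms until w nodes hold the write, or None on timeout.

    lags_ms lists each secondary's replication lag; the primary counts as
    the first node and has the write immediately.
    """
    if w <= 1:
        return 0
    needed_secondaries = w - 1
    if needed_secondaries > len(lags_ms):
        return None  # not enough nodes to ever satisfy w
    wait = sorted(lags_ms)[needed_secondaries - 1]
    return wait if wait <= timeout_ms else None

healthy = replica_safe_write([20, 35])      # fast secondaries
lagging = replica_safe_write([5000, 8000])  # high replication lag
```

With healthy secondaries the write waits 20 ms for the fastest one; when both lag by seconds the same call times out, even though the primary accepted the write immediately.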




Q&A



Contact us
Web: http://www.studyblue.com
Twitter: @StudyBlue
Email: sean@studyblue.com





MongoDB Case Study at NoSQL Now 2012

  • 1. StudyBlue Databases at Scale: A MongoDB Case Study August 23, 2012 StudyBlue, Inc.
  • 2. Overview • About Me • About StudyBlue • Why MongoDB? • Leveraging MongoDB • Key Issues • Q&A StudyBlue, Inc.
  • 3. Who am I? • Sean Laurent • sean@studyblue.com • Head of Operations at StudyBlue, Inc. StudyBlue, Inc.
  • 5. About StudyBlue • Online service for storing, studying, sharing and ultimately mastering course material • Digital backpack for students StudyBlue, Inc.
  • 6. StudyBlue Usage • Many simultaneous users • Rapid growth • Cyclical usage StudyBlue, Inc.
  • 8. Flashcard Scoring • Track flashcard scoring over time • Every single card • Every single user • Forever • Provide aggregate statistics • Flashcard deck • Folder • Overall • Focus on content mastery StudyBlue, Inc.
  • 10. The Problem • Reasonably large number of cards • Large number of users • User base increasing rapidly • Shift in usage - increasing faster than users • Time on site • Decks per user • Average deck size • Study sessions per user StudyBlue, Inc.
  • 11. StudyBlue Database Problems • Amazon EC2 • Large number of simultaneous users • High write volume • Single PostgreSQL database • Large tables StudyBlue, Inc.
  • 13. Alternatives • Amazon SimpleDB • Far too simple • Cassandra • Difficult to add nodes and rebalance • Column families cannot be modified w/out restart • CouchDB • Difficult to add nodes and rebalance • Redis • No native support for sharding/partitioning • Master/slave only - no automatic failover StudyBlue, Inc.
  • 14. MongoDB for the Win • Highly available • Replica sets • Automatic failover • Horizontal scaling across shards • Improved write performance • Improved availability during failures • Easy to add additional shards • Easier maintenance StudyBlue, Inc.
  • 16. Development • 100% Java • Existing PostgreSQL database • System of record • Synchronization issues StudyBlue, Inc.
  • 17. SQL Integration & Synchronization • PostgreSQL considered system of record • Asynchronous event driven • Web servers queue change events • Scoring servers process events • Query PostgreSQL • Update MongoDB StudyBlue, Inc.
  • 19. MongoDB Schema • Many shallow collections vs monolithic deep collection • Leverage existing SQL knowledge • Simplify SQL integration StudyBlue, Inc.
  • 21. DevOps • Amazon EC2 • Separate dev, test and production environments • Scripting & automation • Creation • Cloning • Configuration management with Chef StudyBlue, Inc.
  • 22. Even More Data • Moved existing tables from PostgreSQL to MongoDB • Four PostgreSQL tables with millions of rows combined into single collection • New development uses MongoDB: • Analytics data with 300+ million documents StudyBlue, Inc.
  • 23. SQL Integration Part 2 • MongoDB considered system of record • Web servers interact with MongoDB directly • More complex structures, fewer shallow collections StudyBlue, Inc.
  • 25. Summary • NoSQL vs SQL • Design challenges • Amazon EC2/EBS • Partitioning & sharding • Replication Lag StudyBlue, Inc.
  • 26. NoSQL vs SQL • NoSQL != SQL • Document database != RDBMS • No joins • Requires new mindset • Store related data together • Duplicate data as necessary StudyBlue, Inc.
  • 27. Design Challenges • Multiple tables to single collections with complex objects • Avoid growing objects • Padding • In-place update vs move • Challenges with array elements StudyBlue, Inc.
  • 28. Amazon EC2 & EBS • Plan for failure • “When” not “if” • EBS performance • Inconsistent • Limited by bandwidth • 100 IOPS / volume • RAID-0 StudyBlue, Inc.
  • 29. Instance Sizing • Memory is king • Keep working set in RAM • Indexes • Working data • Spread horizontally instead of vertically • Increased write performance StudyBlue, Inc.
  • 30. Data Routing with Shards StudyBlue, Inc.
  • 31. Partitioning in the Cloud • Operations perspective • Dynamic changes in machines • Config servers track machines • Each node in replica set knows other nodes • Avoids restarting applications when Mongo servers change • Easy scaling • Local shard servers • Config servers store redundant copies • Two-phase commit StudyBlue, Inc.
  • 32. Picking a shard key • Shard key selection critical for proper distribution • Spread writes across cluster • Depends on usage • Single document vs aggregation • Examples all time-series data • Cannot be changed StudyBlue, Inc.
  • 33. Sharding - Gritty Details • Chunks • 64 MB blocks of data • Splits • 1 chunk turns into 2 chunks • Rebalance • Move chunks to different nodes • Maintain even distribution of chunks StudyBlue, Inc.
  • 34. Rebalancing Challenges • Splits have to find mid point of chunk • Very I/O expensive for collections with small documents • Decreased chunk size • Made documents larger & more complex • Can be a drain on system • Needs to run frequently StudyBlue, Inc.
  • 35. Replication Lag • Eventual consistency • No guarantees about lag • Replica safe writes • Data committed to at least 2 nodes • Can cause problems with high replication lag • Security vs time StudyBlue, Inc.
  • 37. Contact us Web: http://www.studyblue.com Twitter: @StudyBlue Email: sean@studyblue.com StudyBlue, Inc.
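
Slides 32 to 34 say that the shard key decides how writes spread across the cluster, that chunks split at their midpoint, and that the balancer keeps chunk counts even. The toy model below is a rough sketch of that behaviour, not MongoDB's actual balancer. The class `ToyCluster`, the function `write_distribution`, the limit of 4 keys per chunk (standing in for 64 MB) and the sample keys are all assumptions made for illustration.

```python
import bisect
from collections import Counter

MAX_KEYS_PER_CHUNK = 4  # assumed toy stand-in for the 64 MB chunk size


class ToyCluster:
    """Range-sharded cluster model: chunks split at their midpoint and a
    balancer evens out the number of chunks per shard."""

    def __init__(self, shard_count):
        self.shard_count = shard_count
        self.bounds = []         # split keys separating neighbouring chunks
        self.chunks = [[]]       # sorted keys held by each chunk
        self.owner = [0]         # owner[i] is the shard that holds chunk i
        self.writes = Counter()  # number of inserts each shard received

    def insert(self, key):
        # Route the write to the chunk whose key range contains the key.
        i = bisect.bisect_right(self.bounds, key)
        self.writes[self.owner[i]] += 1
        bisect.insort(self.chunks[i], key)
        if len(self.chunks[i]) > MAX_KEYS_PER_CHUNK:
            self._split(i)
            self._rebalance()

    def _split(self, i):
        # One chunk turns into two, divided at the middle key.
        chunk = self.chunks[i]
        mid = len(chunk) // 2
        self.chunks[i:i + 1] = [chunk[:mid], chunk[mid:]]
        self.bounds.insert(i, chunk[mid])
        self.owner.insert(i + 1, self.owner[i])

    def _rebalance(self):
        # Move chunks from the fullest shard to the emptiest one until
        # chunk counts differ by at most one.
        while True:
            counts = [self.owner.count(s) for s in range(self.shard_count)]
            donor = counts.index(max(counts))
            receiver = counts.index(min(counts))
            if counts[donor] - counts[receiver] <= 1:
                return
            self.owner[self.owner.index(donor)] = receiver


def write_distribution(keys, shard_count=3):
    """Insert every key and return {shard: number of writes received}."""
    cluster = ToyCluster(shard_count)
    for key in keys:
        cluster.insert(key)
    return dict(cluster.writes)


if __name__ == "__main__":
    timestamps = range(200)                             # ever-increasing key
    compound = [((i * 7) % 10, i) for i in range(200)]  # (user_id, timestamp)
    print("timestamp key:", write_distribution(timestamps))
    print("compound key: ", write_distribution(compound))
```

In this model the ever-increasing timestamp key sends every insert to the shard that holds the highest chunk, even though the balancer keeps chunk counts even. The compound key spreads inserts over more than one shard. That is one reading of why slide 32 calls key selection critical for time-series data.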

Editor's Notes

  3. - Developer at heart
     - 15 years experience
     - Responsible for selecting Mongo
  5. - 15 person startup
     - Bottom-up attempt to improve student outcomes through disruptive change outside of the education system
     - Allows students to create and store lecture notes and flashcards and access them online and via mobile apps (iOS and Android)
  6. - No public numbers (low millions)
     - 4000 simultaneous users (peak)
     - 120+ countries
     - Daily cycle slowly flattening
  10. - 20 million cards at the time
      - Over 60 million cards now
      - Expect 100 million cards in next 6 months
  11. - EC2 limits vertical scaling
      - Postgres tuning extremely beneficial
      - Tables > 70 million rows
  13. - Cassandra & Redis have since improved
      - Amazon Dynamo didn’t exist
  21. - Launch replacement Mongo server in < 10 mins
      - Clone entire production Mongo cluster in < 60 mins
  22. - Not huge by BigData standards: couple terabytes
      - Big by startup standards
  28. Provisioned IOPS
  29. Working set is ~20% for SB, mostly recently created data
  32. http://www.snailinaturtleneck.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
  34. Ran nightly; backlog causes really high load
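
Slides 28 and 29 and notes 22 and 29 give the raw sizing inputs: keep the working set in RAM, spread horizontally, about 20% of the data is working set, and EBS delivers roughly 100 IOPS per volume striped with RAID-0. A back-of-the-envelope sketch of that arithmetic follows. The function names, the 2000 GB data size (a guess at "couple terabytes"), the 64 GB of RAM per shard and the 8-volume stripe are illustrative assumptions, not StudyBlue's real figures.

```python
import math


def shards_for_working_set(data_gb, working_set_fraction, ram_gb_per_shard):
    """Smallest shard count whose combined RAM holds the working set.

    The fraction should cover indexes as well as hot documents.
    """
    working_set_gb = data_gb * working_set_fraction
    return math.ceil(working_set_gb / ram_gb_per_shard)


def raid0_iops_ceiling(volume_count, iops_per_volume=100):
    """Best-case IOPS of a RAID-0 stripe, assuming per-volume IOPS add up."""
    return volume_count * iops_per_volume


if __name__ == "__main__":
    # Assumed inputs: 2000 GB of data, 20% working set, 64 GB RAM per shard.
    print("shards needed:", shards_for_working_set(2000, 0.20, 64))
    # Assumed stripe of 8 EBS volumes at the 100 IOPS per volume from slide 28.
    print("IOPS ceiling:", raid0_iops_ceiling(8))
```

The IOPS figure is a ceiling only: slide 28 notes that EBS performance is inconsistent and limited by bandwidth, so a real stripe will often deliver less.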