



StudyBlue and MongoDB:
Implementation 101


October 18, 2011




StudyBlue, Inc.
Overview


  •      Who am I?

  •      Who is StudyBlue?

  •      Why MongoDB?

  •      How did we leverage MongoDB?

  •      What lessons did we learn?

  •      Q&A




Who am I?


  •      Sean Laurent

  •      sean@studyblue.com

  •      Director of Operations at StudyBlue, Inc.




studyblue.com



About StudyBlue

  •     Bottom-up attempt to improve student
        outcomes


  •     Online service for storing, studying, sharing
        and ultimately mastering course material


  •     Digital backpack for students


  •     Freemium business model




StudyBlue Usage

  •     Many simultaneous users


  •     Rapid growth


  •     Cyclical usage




The Challenge



Flashcard Scoring


  •      Track flashcard scoring

       •      Every single card

       •      Every single user

       •      Forever


  •      Provide aggregate statistics

       •      Flashcard deck

       •      Folder

       •      Overall


  •      Focus on content mastery
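The aggregate statistics listed above can be sketched in plain Java. This is a minimal illustration, not StudyBlue's implementation: the `CardScore` class, its fields, and the percent-correct metric are assumptions made for the example. Folder-level and overall figures would follow the same pattern with a different grouping key.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record of one scored card: which deck it belongs to
// and whether the student answered it correctly.
final class CardScore {
    final long deckId;
    final long cardId;
    final boolean correct;

    CardScore(long deckId, long cardId, boolean correct) {
        this.deckId = deckId;
        this.cardId = cardId;
        this.correct = correct;
    }
}

final class ScoreAggregator {
    // Returns the percentage of correct answers per deck, keyed by deck id.
    static Map<Long, Double> percentCorrectByDeck(List<CardScore> scores) {
        // deckId -> {correct count, total count}
        Map<Long, int[]> tallies = new TreeMap<>();
        for (CardScore s : scores) {
            int[] t = tallies.computeIfAbsent(s.deckId, k -> new int[2]);
            if (s.correct) {
                t[0]++;
            }
            t[1]++;
        }
        Map<Long, Double> result = new TreeMap<>();
        for (Map.Entry<Long, int[]> e : tallies.entrySet()) {
            int[] t = e.getValue();
            result.put(e.getKey(), 100.0 * t[0] / t[1]);
        }
        return result;
    }
}
```

Keeping a running {correct, total} pair per deck means new scores can be folded in without rescanning history, which matters when every card for every user is kept forever.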



Scoring Results

The Problem


  •      Existing PostgreSQL database

  •      Reasonably large number of cards

  •      Large number of users

  •      User base increasing rapidly

  •      Shift in usage: growing faster than the user base

       •      Time on site

       •      Decks per user

       •      Average deck size

       •      Study sessions per user



Additional Requirements


  •      Support sustained rapid growth

  •      Highly available

  •      Minimize maintenance costs

  •      Active community

  •      Done yesterday




Why Mongo?



Alternatives


  •      Amazon SimpleDB

       •      Far too simple


  •      Cassandra

       •      Difficult to add nodes and rebalance

       •      Column families cannot be modified without a restart


  •      CouchDB

       •      Difficult to add nodes and rebalance


  •      Redis

       •      No native support for sharding/partitioning

       •      Master/slave only - no automatic failover

MongoDB for the Win


  •      Highly available

       •      Replica sets

       •      Automatic failover


  •      Shards

       •      Works across replica sets

       •      Easy to add additional shards


  •      Node addition

       •      Read performance degrades when adding nodes

            •     Mitigated with the “hidden” flag

       •      No down time

More winning


  •      Atomic insert & replace

  •      Read balancing across slaves

  •      BSON/JSON document model

  •      It just works. Seriously.




Implementation



DevOps


  •      Amazon EC2

       •      Separate dev, test and production environments


  •      Operations testing

       •      Replication

       •      Failover


  •      Scripting & automation

       •      Creation

       •      Cloning




Development

  •     100% Java


  •     Existing PostgreSQL database

       •     System of record


       •     Synchronization issues




SQL Integration & Synchronization


  •      PostgreSQL considered system of record

  •      Asynchronous, event-driven

  •      Web servers queue change events

  •      Scoring server processes events

       •      Query PostgreSQL

       •      Update MongoDB
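The flow above (web servers queue change events, the scoring server reads PostgreSQL and updates MongoDB) can be sketched as follows. This is only an illustration under stated assumptions: an in-process `LinkedBlockingQueue` stands in for whatever queue StudyBlue used, `ChangeEvent` and `ScoringPipeline` are hypothetical names, and the two callbacks stand in for the real PostgreSQL query and MongoDB update.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.function.LongFunction;

// Hypothetical change event: "something about this deck changed".
final class ChangeEvent {
    final long deckId;

    ChangeEvent(long deckId) {
        this.deckId = deckId;
    }
}

final class ScoringPipeline {
    // Stand-in for the real event queue between web and scoring servers.
    private final BlockingQueue<ChangeEvent> queue = new LinkedBlockingQueue<>();

    // Web server side: enqueue the event and return immediately.
    void publish(ChangeEvent event) {
        queue.add(event);
    }

    // Scoring server side: for each pending event, read the current state
    // from the system of record and write the result to the document store.
    // Returns the number of events processed.
    int drain(LongFunction<String> readFromSql, Consumer<String> writeToMongo) {
        List<ChangeEvent> batch = new ArrayList<>();
        queue.drainTo(batch);
        for (ChangeEvent event : batch) {
            writeToMongo.accept(readFromSql.apply(event.deckId));
        }
        return batch.size();
    }
}
```

Because the event carries only an identifier and the worker re-reads PostgreSQL, MongoDB always receives the state held by the system of record at processing time rather than a possibly stale payload.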




Architecture

MongoDB Schema


  •      Many shallow collections vs monolithic deep collection

  •      Leverage existing SQL knowledge

  •      Simplify SQL integration




Schema Design


  •      Two collections used together to map relationships

       •      Folder containing Deck

       •      Decks in a Folder

       •      Decks containing a Card

       •      Cards in a Deck


  •      Folders arranged in a tree structure

       •      One row per folder, pointing to its parent

       •      Multiple queries required to build tree


  •      PostgreSQL primary keys are used instead of ObjectIds
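A parent-pointer layout like the folder tree above can be assembled in memory once the rows have been fetched. The sketch below is one possible approach, not necessarily the one StudyBlue used: `FolderRow` and `FolderTree` are hypothetical names, ids are plain `long` values in the spirit of reusing PostgreSQL primary keys, and the data is assumed to contain no cycles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical folder row: its own id plus its parent's id (null for a root).
final class FolderRow {
    final long id;
    final Long parentId;

    FolderRow(long id, Long parentId) {
        this.id = id;
        this.parentId = parentId;
    }
}

final class FolderTree {
    // Groups folder ids by parent id so children can be looked up directly.
    static Map<Long, List<Long>> childrenByParent(List<FolderRow> rows) {
        Map<Long, List<Long>> children = new LinkedHashMap<>();
        for (FolderRow row : rows) {
            if (row.parentId != null) {
                children.computeIfAbsent(row.parentId, k -> new ArrayList<>()).add(row.id);
            }
        }
        return children;
    }

    // Collects every folder beneath root, depth first. Assumes no cycles.
    static List<Long> descendants(long root, Map<Long, List<Long>> children) {
        List<Long> out = new ArrayList<>();
        List<Long> direct = children.getOrDefault(root, Collections.<Long>emptyList());
        for (Long child : direct) {
            out.add(child);
            out.addAll(descendants(child, children));
        }
        return out;
    }
}
```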



Document Scores Example

Slave Reads


  •      “slaveOk” set to true for most data retrieval

  •      Scoring calculations use Primary to ensure correctness




Data migration

  •     One-time process


  •     Postgres to MongoDB


  •     Ruby scripts


  •     Separate server




Key Issues



Summary

  •     Amazon EC2/EBS


  •     Java API


  •     MapReduce


  •     Replication


  •     Partitioning / Shards


  •     Performance




Amazon EC2 & EBS

  •     Plan for failure

       •     “When” not “if”


  •     EBS performance

       •     Inconsistent


       •     Limited by bandwidth


       •     60GB minimum


       •     RAID-0




Java API

  •     Not perfect

       •     Verbose

       •     Type safety

  •     Failover requires retry

       •     Up to 1 minute delay

  •     Read-only requests

       •     “slaveOk” works

       •     Burden on developer
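Since failover requires the caller to retry, a small retry helper is one way to keep that burden out of business logic. The sketch below is generic Java and does not use the MongoDB driver: `Retry.withRetry` is a hypothetical helper, and catching every `RuntimeException` is a simplification (a real caller would probably retry only on the driver's connection errors).

```java
import java.util.function.Supplier;

final class Retry {
    // Runs op, retrying after a fixed delay whenever it throws a
    // RuntimeException, until it succeeds or maxAttempts is reached.
    // The last failure is rethrown if every attempt fails.
    static <T> T withRetry(Supplier<T> op, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;
    }
}
```

With a failover delay of up to one minute, the attempt count and delay would need to cover at least that window, for example 13 attempts spaced 5 seconds apart.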




MapReduce

  •     Perfect for aggregation


  •     Not used by StudyBlue

       •     Not needed (yet)


       •     Difficult with multiple collections


       •     Reduce limited to masters


       •     Keep scalability simple


  •     Under consideration



Replication

  •     Automated failover


  •     Read scaling


  •     Maintenance


  •     Easy setup & configuration


  •     “Seed” node(s) for clients




Partitioning in the Cloud


  •      Operations perspective

       •      Dynamic changes in machines

            •     Config servers track machines

            •     Each node in replica set knows other nodes

            •     Avoids restarting applications when Mongo servers change

       •      Easy scaling

            •     Local shard servers

            •     Config servers store redundant copies

                  •   Two-phase commit




Useful EC2 Instance Types

  •     Config servers

       •     t1.micro or m1.small

  •     Mongo replica nodes

       •     Depends on memory needs

       •     m2.xlarge, m2.2xlarge, m2.4xlarge or cc1.4xlarge


  Name           Memory     Compute units (CU)        I/O
  m2.xlarge      17.1 GB    6.5 (2 cores x 3.25)      medium
  m2.2xlarge     34.2 GB    13 (4 cores x 3.25)       high
  m2.4xlarge     68.4 GB    26 (8 cores x 3.25)       high
  cc1.4xlarge    23 GB      33.5 (2 x Xeon X5570)     very high


Performance Issues


  •      Missing indexes

       •      Performance terrible without indexes

       •      Index on the fly


  •      Store array sizes in collection

  •      OR vs IN

  •      Redundant updates

       •      Events not consolidated
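The redundant-update problem above comes from processing every queued event even when several refer to the same item. A minimal sketch of consolidation, assuming each pending event can be reduced to the id of the deck it touches (the `EventConsolidator` name and the use of deck ids are assumptions for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class EventConsolidator {
    // Collapses repeated deck ids into one entry each, keeping first-seen
    // order, so each deck is recalculated once per batch.
    static List<Long> consolidate(List<Long> pendingDeckIds) {
        Set<Long> unique = new LinkedHashSet<>(pendingDeckIds);
        return new ArrayList<>(unique);
    }
}
```

For a batch of pending ids 5, 7, 5, 5, 7, 9 this yields 5, 7, 9: three updates instead of six.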




Lessons Learned



Key Lessons


  •      Amazon great, but plan for failure

  •      Leverage test platforms

  •      Use replica sets & partitions early

  •      Indexes critical

  •      Use IN instead of OR

  •      Java API cumbersome, but solid

  •      Design schema carefully
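The "IN instead of OR" lesson is easier to see from the shape of the two query documents. The sketch below builds both shapes with plain `java.util` maps standing in for the driver's document objects; `$in` and `$or` are MongoDB query operators, while `QueryShapes` and the `deckId` field used in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class QueryShapes {
    // Builds { field: { $in: [v1, v2, ...] } }: one condition on one field.
    static Map<String, Object> in(String field, List<Long> values) {
        Map<String, Object> operator = new LinkedHashMap<>();
        operator.put("$in", new ArrayList<>(values));
        Map<String, Object> query = new LinkedHashMap<>();
        query.put(field, operator);
        return query;
    }

    // Builds { $or: [ { field: v1 }, { field: v2 }, ... ] }: one clause per value.
    static Map<String, Object> or(String field, List<Long> values) {
        List<Map<String, Object>> clauses = new ArrayList<>();
        for (Long value : values) {
            clauses.add(Collections.<String, Object>singletonMap(field, value));
        }
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("$or", clauses);
        return query;
    }
}
```

Calling `QueryShapes.in("deckId", ids)` produces a single condition regardless of how many ids are passed, whereas `QueryShapes.or("deckId", ids)` grows by one clause per id.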


Q&A



Contact us
Web: http://www.studyblue.com
Twitter: @StudyBlue
Email: sean@studyblue.com




 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Leveraging MongoDB: An Introductory Case Study

  • 1. StudyBlue StudyBlue and MongoDB: Implementation 101 October 18, 2011 StudyBlue, Inc.
  • 2. Overview • Who am I? • Who is StudyBlue? • Why MongoDB? • How did we leverage MongoDB? • What lessons did we learn? • Q&A
  • 3. Who am I? • Sean Laurent • sean@studyblue.com • Director of Operations at StudyBlue, Inc.
  • 5. About StudyBlue • Bottom-up attempt to improve student outcomes • Online service for storing, studying, sharing and ultimately mastering course material • Digital backpack for students • Freemium business model
  • 6. StudyBlue Usage • Many simultaneous users • Rapid growth • Cyclical usage
  • 8. Flashcard Scoring • Track flashcard scoring • Every single card • Every single user • Forever • Provide aggregate statistics • Flashcard deck • Folder • Overall • Focus on content mastery
  • 10. The Problem • Existing PostgreSQL database • Reasonably large number of cards • Large number of users • User base increasing rapidly • Shift in usage, increasing faster than the user base • Time on site • Decks per user • Average deck size • Study sessions per user
  • 11. Additional Requirements • Support sustained rapid growth • Highly available • Minimize maintenance costs • Active community • Done yesterday
  • 13. Alternatives • Amazon Simple DB • Far too simple • Cassandra • Difficult to add nodes and rebalance • Column families cannot be modified w/out restart • CouchDB • Difficult to add nodes and rebalance • Redis • No native support for sharding/partitioning • Master/slave only - no automatic failover
  • 14. MongoDB for the Win • Highly available • Replica sets • Automatic failover • Shards • Works across replica sets • Easy to add additional shards • Node addition • Read performance degradation when adding nodes • “hidden” flag • No down time
  • 15. More winning • Atomic insert & replace • Read balancing across slaves • BSON/JSON document model • It just works. Seriously.
  • 17. DevOps • Amazon EC2 • Separate dev, test and production environments • Operations testing • Replication • Failover • Scripting & automation • Creation • Cloning
  • 18. Development • 100% Java • Existing PostgreSQL database • System of record • Synchronization issues
  • 19. SQL Integration & Synchronization • PostgreSQL considered system of record • Asynchronous event driven • Web servers queue change events • Scoring server processes events • Query PostgreSQL • Update MongoDB
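
The flow on slide 19 can be sketched roughly as follows. This is a minimal illustration, not StudyBlue's implementation (which was written in Java): the two dicts are hypothetical stand-ins for PostgreSQL and MongoDB, and the function names and event fields are assumptions made for the example.

```python
import queue

# Hypothetical stand-ins: one dict plays the PostgreSQL system of record,
# the other plays the MongoDB scoring collection.
postgres_scores = {}   # (user_id, card_id) -> latest score row
mongo_scores = {}      # (user_id, card_id) -> copy used for scoring reads

# Change events queued by the web servers (assumed to carry only ids).
change_events = queue.Queue()


def record_score(user_id, card_id, correct):
    """Web-server side: write to the system of record, then queue an event."""
    postgres_scores[(user_id, card_id)] = {"correct": correct}
    change_events.put({"user_id": user_id, "card_id": card_id})


def process_pending_events():
    """Scoring-server side: drain the queue, re-read PostgreSQL, update MongoDB."""
    processed = 0
    while True:
        try:
            event = change_events.get_nowait()
        except queue.Empty:
            return processed
        key = (event["user_id"], event["card_id"])
        row = postgres_scores[key]        # query the system of record
        mongo_scores[key] = dict(row)     # replace the MongoDB copy
        processed += 1
```

Because each event carries only identifiers and the scoring side re-reads PostgreSQL, the MongoDB copy always converges on whatever the system of record holds, even if events arrive late.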
  • 21. MongoDB Schema • Many shallow collections vs monolithic deep collection • Leverage existing SQL knowledge • Simplify SQL integration
  • 22. Schema Design • Two collections used together to map relationships • Folder containing Deck • Decks in a Folder • Decks containing a Card • Cards in a Deck • Folders arranged in tree structure • One row per folder that points to its parent • Multiple queries required to build tree • Postgres primary keys are used instead of ObjectIds
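
To make the parent-pointer folder layout on slide 22 concrete, here is a small sketch. The field names (`_id`, `name`, `parent_id`) and the sample folders are assumptions for illustration; the list of dicts stands in for a folder collection, and `find_children` stands in for one query per tree level.

```python
# Hypothetical folder documents: _id reuses the Postgres primary key,
# parent_id points at the parent folder (None for a root folder).
folders = [
    {"_id": 1, "name": "Fall 2011", "parent_id": None},
    {"_id": 2, "name": "Biology", "parent_id": 1},
    {"_id": 3, "name": "Chemistry", "parent_id": 1},
    {"_id": 4, "name": "Midterm", "parent_id": 2},
]


def find_children(parent_id):
    """Stand-in for one database query: all folders with this parent."""
    return [f for f in folders if f["parent_id"] == parent_id]


def build_tree(parent_id=None):
    """Recursively assemble the tree; every folder visited costs one lookup."""
    return [
        {"name": f["name"], "children": build_tree(f["_id"])}
        for f in find_children(parent_id)
    ]
```

The recursion shows why the slide notes that multiple queries are required: with only a parent pointer per folder, each level of the tree needs its own lookup.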
  • 25. Slave Reads • SlaveOk set to true for most data retrieval • Scoring calculations use Primary to ensure correctness
  • 26. Data migration • One-time process • Postgres to MongoDB • Ruby scripts • Separate server
  • 28. Summary • Amazon EC2/EBS • Java API • MapReduce • Replication • Partitioning / Shards • Performance
  • 29. Amazon EC2 & EBS • Plan for failure • “When” not “if” • EBS performance • Inconsistent • Limited by bandwidth • 60GB minimum • RAID-0
  • 30. Java API • Not perfect • Verbose • Type safety • Failover requires retry • Up to 1 minute delay • Read-only requests • “slaveOk” works • Burden on developer
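
Slide 30 says failover requires the application to retry, with delays of up to a minute. One common way to carry that burden is a retry wrapper with exponential backoff, sketched below. The exception class and parameter names are hypothetical and not part of any MongoDB driver; a real application would catch whatever error its driver raises during a primary election.

```python
import time


class PrimaryUnavailable(Exception):
    """Hypothetical error raised while a replica set elects a new primary."""


def with_retry(operation, attempts=8, initial_delay=0.5, sleep=time.sleep):
    """Call operation(); on PrimaryUnavailable, wait and try again.

    The wait doubles after each failure. With the defaults the waits are
    0.5, 1, 2, 4, 8, 16 and 32 seconds, about a minute in total.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except PrimaryUnavailable:
            if attempt == attempts:
                raise                 # give up after the final attempt
            sleep(delay)
            delay *= 2
```

The `sleep` parameter is injectable so the wrapper can be exercised in tests without actually waiting.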
  • 31. Map Reduce • Perfect for aggregation • Not used by StudyBlue • Not needed (yet) • Difficult with multiple collections • Reduce limited to masters • Keep scalability simple • Under consideration
  • 32. Replication • Automated failover • Read scaling • Maintenance • Easy setup & configuration • “Seed” node(s) for clients
  • 33. Partitioning in the Cloud • Operations perspective • Dynamic changes in machines • Config servers track machines • Each node in replica set knows other nodes • Avoids restarting applications when Mongo servers change • Easy scaling • Local shard servers • Config servers store redundant copies • Two-phase commit
  • 34. Useful EC2 Instance Types • Config servers: t1.micro or m1.small • Mongo replica nodes: depends on memory needs; m2.xlarge, m2.2xlarge, m2.4xlarge or cc1.4xlarge

        Name          Memory    CU                        I/O
        m2.xlarge     17.1 GB   6.5 (2 cores x 3.25)      medium
        m2.2xlarge    34.2 GB   13 (4 cores x 3.25)       high
        m2.4xlarge    68.4 GB   26 (8 cores x 3.25)       high
        cc1.4xlarge   23 GB     33.5 (2 x Xeon X5570)     very high

  • 35. Performance Issues • Missing indexes • Performance terrible without indexes • Index on the fly • Store array sizes in collection • OR vs IN • Redundant updates • Events not consolidated
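
The "redundant updates" item on slide 35 comes from change events not being consolidated. A minimal sketch of one possible consolidation step follows. It assumes, hypothetically, that each event names a `user_id` and `deck_id` and that the processor re-reads current state for that pair, so repeated events for the same pair add no information.

```python
def consolidate(events):
    """Keep one event per (user_id, deck_id), preserving first-seen order."""
    seen = set()
    unique = []
    for event in events:
        key = (event["user_id"], event["deck_id"])
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```

For example, a study session that scores 50 cards in one deck would queue 50 events but trigger a single recalculation after consolidation.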
  • 37. Key Lessons • Amazon great, but plan for failure • Leverage test platforms • Use replica sets & partitions early • Indexes critical • Use IN instead of OR • Java API cumbersome, but solid • Design schema carefully
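
The "use IN instead of OR" lesson refers to MongoDB's `$or` and `$in` query operators. The sketch below shows the two equivalent query shapes as plain dictionaries and a helper that rewrites the first into the second. The helper name `or_to_in` and the `deck_id` field are illustrative assumptions; the helper is deliberately conservative and leaves anything other than simple same-field equality clauses untouched.

```python
def or_to_in(query):
    """Collapse {"$or": [{f: v1}, {f: v2}]} into {f: {"$in": [v1, v2]}}.

    Returns the query unchanged unless every $or clause is a plain
    equality test on the same single field.
    """
    clauses = query.get("$or")
    if not clauses or len(query) != 1:
        return query
    fields = set()
    values = []
    for clause in clauses:
        if len(clause) != 1:
            return query
        (field, value), = clause.items()
        if isinstance(value, (dict, list)):   # operators, sub-documents, arrays
            return query
        fields.add(field)
        values.append(value)
    if len(fields) != 1:
        return query
    return {fields.pop(): {"$in": values}}
```

Given `{"$or": [{"deck_id": 1}, {"deck_id": 2}]}`, the helper returns `{"deck_id": {"$in": [1, 2]}}`, which expresses the same filter as a single condition on one field.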
  • 39. Contact us Web: http://www.studyblue.com Twitter: @StudyBlue Email: sean@studyblue.com

Editor's Notes

  3. - Developer at heart
     - 15 years experience
     - Responsible for selecting Mongo
  5. - Bottom-up attempt to improve student outcomes through disruptive change outside of the education system.
     - Allows students to create and store lecture notes and flashcards and access them online and via mobile apps (iOS and Android)
  6. - No public numbers
     - 1000 simultaneous users (peak)
  10. - Over 20 million cards now
      - Approx 40 million by Xmas, 80-100 million by May 2012, 200+ million by end 2012
  15. - Read balancing (slaveOk) discuss later
      - No downtime with Mongo since launch
  22. - Relationship mapping is example of problem with NoSQL
  30. - Bean serialization
      - Annotations for slaveOk