5. About StudyBlue
• Bottom-up attempt to improve student outcomes
• Online service for storing, studying, sharing and ultimately mastering course material
• Digital backpack for students
• Freemium business model
StudyBlue, Inc.
6. StudyBlue Usage
• Many simultaneous users
• Rapid growth
• Cyclical usage
8. Flashcard Scoring
• Track flashcard scoring
• Every single card
• Every single user
• Forever
• Provide aggregate statistics
• Flashcard deck
• Folder
• Overall
• Focus on content mastery
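As a rough illustration of the aggregate statistics listed above, the sketch below rolls per-card results up to deck, folder, and overall accuracy. The event layout (`folder_id`, `deck_id`, `card_id`, `correct`) and the function name `aggregate_scores` are assumptions made for this example, not StudyBlue's actual schema or code.

```python
from collections import defaultdict


def aggregate_scores(events):
    """Roll per-card results up to deck, folder, and overall accuracy.

    Each event is a dict with hypothetical keys:
    folder_id, deck_id, card_id, correct (bool).
    """
    # Every bucket is a two-item list: [correct_count, total_count].
    by_deck = defaultdict(lambda: [0, 0])
    by_folder = defaultdict(lambda: [0, 0])
    overall = [0, 0]

    for event in events:
        hit = 1 if event["correct"] else 0
        for bucket in (by_deck[event["deck_id"]],
                       by_folder[event["folder_id"]],
                       overall):
            bucket[0] += hit
            bucket[1] += 1

    def ratio(pair):
        # Guard against an empty bucket (no events at all).
        return pair[0] / pair[1] if pair[1] else 0.0

    return {
        "deck": {key: ratio(pair) for key, pair in by_deck.items()},
        "folder": {key: ratio(pair) for key, pair in by_folder.items()},
        "overall": ratio(overall),
    }


# Made-up sample data for one user.
sample_events = [
    {"folder_id": 1, "deck_id": 10, "card_id": 100, "correct": True},
    {"folder_id": 1, "deck_id": 10, "card_id": 101, "correct": False},
    {"folder_id": 1, "deck_id": 11, "card_id": 110, "correct": True},
    {"folder_id": 2, "deck_id": 20, "card_id": 200, "correct": True},
]
stats = aggregate_scores(sample_events)
```

In a real system these counts would more likely be maintained incrementally as scores arrive; recomputing from raw events is used here only to keep the sketch short.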
10. The Problem
• Existing PostgreSQL database
• Reasonably large number of cards
• Large number of users
• User base increasing rapidly
• Shift in usage: increasing faster than the user count
• Time on site
• Decks per user
• Average deck size
• Study sessions per user
11. Additional Requirements
• Support sustained rapid growth
• Highly available
• Minimize maintenance costs
• Active community
• Done yesterday
13. Alternatives
• Amazon SimpleDB
• Far too simple
• Cassandra
• Difficult to add nodes and rebalance
• Column families cannot be modified without a restart
• CouchDB
• Difficult to add nodes and rebalance
• Redis
• No native support for sharding/partitioning
• Master/slave only - no automatic failover
14. MongoDB for the Win
• Highly available
• Replica sets
• Automatic failover
• Shards
• Works across replica sets
• Easy to add additional shards
• Node addition
• Read performance degradation when adding nodes
• “hidden” flag
• No down time
15. More winning
• Atomic insert & replace
• Read balancing across slaves
• BSON/JSON document model
• It just works. Seriously.
17. DevOps
• Amazon EC2
• Separate dev, test and production environments
• Operations testing
• Replication
• Failover
• Scripting & automation
• Creation
• Cloning
18. Development
• 100% Java
• Existing PostgreSQL database
• System of record
• Synchronization issues
19. SQL Integration & Synchronization
• PostgreSQL considered system of record
• Asynchronous event driven
• Web servers queue change events
• Scoring server processes events
• Query PostgreSQL
• Update MongoDB
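The event flow above can be sketched roughly as follows. This is a minimal, driver-free illustration: the in-memory dicts stand in for PostgreSQL and MongoDB, and the names `process_events`, `load_from_sql`, and `save_to_mongo` are hypothetical, not taken from StudyBlue's code.

```python
import queue


def process_events(events, load_from_sql, save_to_mongo):
    """Drain the queue of change events.

    For each event, re-read the changed record from the system of
    record, then write the result to the scoring store.
    Returns the number of events processed.
    """
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        record = load_from_sql(event["card_id"])   # stand-in for "Query PostgreSQL"
        save_to_mongo(event["card_id"], record)    # stand-in for "Update MongoDB"
        processed += 1


# In-memory stand-ins for the two databases (assumptions for this sketch).
sql_rows = {1: {"front": "2 + 2", "back": "4"},
            2: {"front": "Capital of France", "back": "Paris"}}
mongo_docs = {}

# A web server would enqueue these after committing a change to PostgreSQL.
change_events = queue.Queue()
change_events.put({"card_id": 1})
change_events.put({"card_id": 2})

count = process_events(change_events, sql_rows.get, mongo_docs.__setitem__)
```

Because the event carries only an identifier and the worker re-reads the row, PostgreSQL remains the single source of truth even if events arrive late or more than once.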
21. MongoDB Schema
• Many shallow collections vs monolithic deep collection
• Leverage existing SQL knowledge
• Simplify SQL integration
22. Schema Design
• Two collections used together to map relationships
• Folder containing Deck
• Decks in a Folder
• Decks containing a Card
• Cards in a Deck
• Folders arranged in a tree structure
• One row per folder, pointing to its parent
• Multiple queries required to build tree
• Postgres primary keys are used instead of object ids
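The parent-pointer layout above can be turned into a nested tree along these lines. The sketch assumes all folder rows have already been fetched (the slide notes that multiple queries are needed in practice), that `_id` holds the integer PostgreSQL primary key, and that a hypothetical `parent_id` field is `None` for top-level folders. It also assumes the data contains no cycles.

```python
from collections import defaultdict


def build_tree(rows):
    """Build a nested dict {folder_id: {child_id: {...}}} from parent pointers.

    rows: iterable of dicts with `_id` and `parent_id` (None for roots).
    """
    # Index the rows once: parent id -> list of child ids.
    children = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row["_id"])

    def subtree(folder_id):
        # Recurse into each child; leaves map to an empty dict.
        return {child: subtree(child) for child in sorted(children[folder_id])}

    return subtree(None)


# Made-up folder rows for illustration.
folder_rows = [
    {"_id": 1, "parent_id": None},
    {"_id": 2, "parent_id": 1},
    {"_id": 3, "parent_id": 1},
    {"_id": 4, "parent_id": 2},
]
tree = build_tree(folder_rows)
```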
28. Summary
• Amazon EC2/EBS
• Java API
• MapReduce
• Replication
• Partitioning / Shards
• Performance
29. Amazon EC2 & EBS
• Plan for failure
• “When” not “if”
• EBS performance
• Inconsistent
• Limited by bandwidth
• 60GB minimum
• RAID-0
30. Java API
• Not perfect
• Verbose
• Type safety
• Failover requires retry
• Up to 1 minute delay
• Read-only requests
• “slaveOk” works
• Burden on developer
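The "failover requires retry" burden can be pictured with a small wrapper like the one below. It is a language-neutral sketch written in Python rather than against the Java driver: `FailoverError` is a hypothetical stand-in for whatever exception a driver raises while a new primary is being elected, and the attempt count and delay are arbitrary example values.

```python
import time


class FailoverError(Exception):
    """Hypothetical stand-in for a driver error raised during failover."""


def with_retry(operation, attempts=5, delay_seconds=1.0, sleep=time.sleep):
    """Call operation(), retrying on FailoverError up to `attempts` times."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except FailoverError as err:
            last_error = err
            # Wait before the next try, but not after the final one.
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise last_error


# Simulated write that fails twice before a new primary is available.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FailoverError("no primary available")
    return "ok"


result = with_retry(flaky_write, attempts=5, delay_seconds=0)
```

With a failover window of up to a minute, the total retry budget (attempts times delay) would need to cover at least that long for writes to survive an election.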
31. Map Reduce
• Perfect for aggregation
• Not used by StudyBlue
• Not needed (yet)
• Difficult with multiple collections
• Reduce limited to masters
• Keep scalability simple
• Under consideration
33. Partitioning in the Cloud
• Operations perspective
• Dynamic changes in machines
• Config servers track machines
• Each node in replica set knows other nodes
• Avoids restarting applications when Mongo servers change
• Easy scaling
• Local shard servers
• Config servers store redundant copies
• Two-phase commit
34. Useful EC2 Instance Types
• Config servers
• t1.micro or m1.small
• Mongo replica nodes
• Depends on memory needs
• m2.xlarge, m2.2xlarge, m2.4xlarge or cc1.4xlarge

Name          Memory    CU                          I/O
m2.xlarge     17.1 GB   6.5 (2 cores x 3.25)        medium
m2.2xlarge    34.2 GB   13 (4 cores x 3.25)         high
m2.4xlarge    68.4 GB   26 (8 cores x 3.25)         high
cc1.4xlarge   23 GB     33.5 (2 x Xeon X5570)       very high
35. Performance Issues
• Missing indexes
• Performance terrible without indexes
• Index on the fly
• Store array sizes in collection
• OR vs IN
• Redundant updates
• Events not consolidated
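One way to address the redundant updates caused by unconsolidated events is to collapse a batch so that only the most recent event per user and card is applied. The sketch below illustrates the idea; the event fields (`user_id`, `card_id`, `score`) and the function name are assumptions for this example, not StudyBlue's implementation.

```python
def consolidate(events):
    """Keep only the last event for each (user_id, card_id) pair.

    The result is ordered by when each surviving event occurred.
    """
    latest = {}
    for event in events:
        key = (event["user_id"], event["card_id"])
        # Remove and re-insert so the key moves to the end of the dict,
        # which preserves insertion order.
        latest.pop(key, None)
        latest[key] = event
    return list(latest.values())


# Made-up batch: card 7 is scored twice by the same user.
batch = [
    {"user_id": 1, "card_id": 7, "score": 1},
    {"user_id": 1, "card_id": 8, "score": 0},
    {"user_id": 1, "card_id": 7, "score": 3},
]
consolidated = consolidate(batch)
```

Here three queued events become two database updates, and the stale score for card 7 is never written.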
37. Key Lessons
• Amazon great, but plan for failure
• Leverage test platforms
• Use replica sets & partitions early
• Indexes critical
• Use IN instead of OR
• Java API cumbersome, but solid
• Design schema carefully
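To make the "IN instead of OR" lesson concrete, the sketch below rewrites a MongoDB-style query document that uses `$or` over a single field into the equivalent `$in` form. The query documents are plain dicts, so no driver is needed; the helper name `or_to_in` and the field `deck_id` are hypothetical, and the function deliberately leaves anything it does not recognize unchanged.

```python
def or_to_in(query):
    """Rewrite {"$or": [{f: v1}, {f: v2}]} as {f: {"$in": [v1, v2]}}.

    Only applies when $or is the sole top-level key and every clause is
    a plain equality test on the same field; otherwise the query is
    returned unchanged.
    """
    clauses = query.get("$or")
    if not clauses or len(query) != 1:
        return query

    fields = set()
    values = []
    for clause in clauses:
        if len(clause) != 1:
            return query
        (field, value), = clause.items()
        # Skip operator expressions and array values to stay conservative.
        if isinstance(value, (dict, list)):
            return query
        fields.add(field)
        values.append(value)

    if len(fields) != 1:
        return query
    return {fields.pop(): {"$in": values}}


slow_query = {"$or": [{"deck_id": 1}, {"deck_id": 2}, {"deck_id": 3}]}
fast_query = or_to_in(slow_query)
```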
- Developer at heart
- 15 years experience
- Responsible for selecting Mongo

- Bottom-up attempt to improve student outcomes through disruptive change outside of the education system.
- Allows students to create and store lecture notes and flashcards and access them online and via mobile apps (iOS and Android)

- No public numbers
- 1000 simultaneous users (peak)

- Over 20 million cards now
- Approx 40 million by Xmas, 80-100 million by May 2012, 200+ million by end 2012

- Read balancing (slaveOk) discuss later
- No downtime with Mongo since launch

- Relationship mapping is example of problem with NoSQL