SlideShare a Scribd company logo
1 of 88
Technical Overview
  Paul Pedersen, Deputy CTO
       paul@10gen.com
         Phoenix MUG
        March 27, 2012
RDBMS systems




                Costs go up
Productivity goes down
Project
 Start

          De-
       normalize
      data model

                   Stop using
                      joins     Custom
                                caching   Custom
                                 layer    sharding
MongoDB / NoSQL objective:
                Make it as simple as possible, but no simpler

                               Memcached

                                   Key / Value
Scalability & Performance




                                                                     RDBMS




                                            Depth of functionality
mongodb stores BSON
{
    “_id” : 123456789,
    “name” : { “first”: “mongo”, “last”: “db” },
    “address” : “555 University Ave., Palo Alto CA 94301”,
    “checkins” : [ “starbucks”, “ace hardware”, “evvia” ],
    “loc” : [38.57, 121.62 ]
}                              Types:
                               • unique id
                               • strings, dates, numbers, bool
                               • embedded documents
                               • arrays
                               • regex
                               • binary data
MongoDB architecture

• BSON object store, unique _id
• mmap memory management
• B-Tree indexing
• single-master replication
• logical key-range sharding
• global configuration manager
MongoDB design goals

• agile development
 -   rich data model, well matched to OO data
 -   general-purpose DBMS
 -   simple, but powerful query syntax
 -   multiple secondary indexes, geo indexes
 -   smooth path to horizontal scaling
 -   broad integration into existing ecosystems
MongoDB design goals

• web scale
 -   eventual consistency: single master
 -   availability: non-blocking writes, journal-
     blocking writes, replication-blocking writes
 -   durability: journaling
 -   partition: multi-data center replication
 -   easy horizontal scaling by data sharding
MongoDB design goals

• High performance. For example:
   “Last week we learned about a government agency using
   mongodb to process about 2 million transactions per
   second. Around 50 machines in each of 8 data centers..”
   Wordnik stores its entire text corpus in MongoDB - 3.5T
   of data in 20 billion records. The speed to query the corpus
   was cut to 1/4 the time it took prior to migrating to
   MongoDB.
MongoDB scale

• We are running instances on the order of:
 -   25B objects
 -   50TB storage
 -   50K qps per server
 -   500 servers
SCHEMA DESIGN
Schema design checklist
1. When do we embed data versus linking? Our decisions
here imply the answer to question 2:
2. How many collections do we have, and what are they?
3. When do we need atomic operations? These operations
can be performed within the scope of a BSON document,
but not across documents.
4. What indexes will we create to make query and updates
fast?
5. How will we shard? What is the shard key?
Embedding v. linking

Embedding is the nesting of objects and arrays inside a
BSON document.

Links are references between documents (by _id)

There are no joins in MongoDB. (Distributed joins are
inherently expensive.)

Embedding is a bit like "prejoined" data.
Embedding v. linking

Operations within a document are easy for the server to
handle. These operations can be fairly rich.

Links in contrast must be processed client-side by the
application issuing a follow-up query.

For "contains" relationships between entities, embedding
should be be chosen.

Use linking when not using linking would result in gross
duplication of data.
Embedding v. linking

Embedded objects are a bit harder to link to than "top level"
objects in collections. (http://www.mongodb.org/display/DOCS/Dot+Notation
+(Reaching+into+Objects))


If the amount of data to embed is large, you may reach the
limit on size of a single object. Then consider using GridFS.

If performance is an issue, embed.
Example


Our content management system will manage posts.

Posts have titles, dates, and authors.

We'd like to support commenting and voting on posts.

We'd also like posts to be taggable for searching.
Example
> db.posts.findOne()
{
  _id : ObjectId("4e77bb3b8a3e000000004f7a"),
  when : Date("2011-09-19T02:10:11.3Z",
  author : "alex",
  title : "No Free Lunch",
  text : "This is the text of the post. It could be very long.",
  tags : [ "business", "ramblings" ],
  votes : 5,
  voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
  comments : [
    { who : "jane", when : Date("2011-09-19T04:00:10.112Z"),
      comment : "I agree." },
    { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"),
      comment : "You must be joking. etc etc ..." }
  ]
}
Disk seeks and data locality
Seek = 5+ ms          Read = really really fast




               Post

                            Comment
 Author
Disk seeks and data locality
          Post


       Author



       Comment
       Comment
        Comment
         Comment
         Comment
Collections


Collections of documents in MongoDB are analogous to
tables of rows in a relational database.

There is no explicit schema declaration. No schema object.

There is a conceptual design of how the documents in the
collection will be structured.
Collections

MongoDB does not require documents in a collection to
have the same structure.

In practice, most collections are relatively homogenous.

We can move away from this when we need -- for example
when adding a new field. No analog of "alter table" is
required.
Atomicity
Some problems require the ability to perform atomic
operations. For example:

     atomically {
       if (doc.credits > 5) {
         doc.credits -= 5;
         doc.debits += 5;
       }
     }

The scope of atomicity / ACID properties is the document.

We need to assure that all fields relevant to the atomic
operation are in the same document.
Indexes


Indexes in MongoDB are highly analogous to indexes in a
relational database. (B-Tree indexes)
Indexes are needed for efficient query processing: both for
selection and sorting.
Indexes must be explicitly declared.
As in a relational database, indexes can be added to existing
collections.
Choosing indexes
As a general rule -- where you want an index in a relational
database, you want an index in MongoDB.
•The _id field is automatically indexed.
• Fields whose keys are looked up should be indexed.
• Sort fields generally should be indexed.

The MongoDB profiling facility provides useful information
for where an index should be added that is missing.
Adding an index slows writes to a collection, but not reads.
Example
To grab the whole post we might execute:
  > db.posts.findOne( { _id :
  ObjectId("4e77bb3b8a3e000000004f7a") } );

To get all posts written by alex:
  > db.posts.find( { author : "alex" } )

If the above is a common query we would create an index on the author
field:
  > db.posts.ensureIndex( { author : 1 } )

To get just the titles of the posts by alex:
  > db.posts.find( { author : "alex" }, { title : 1 } )
Example

We may want to search for posts by tag:
  > // make and index of all tags so that the query is fast:
  > db.posts.ensureIndex( { tags : 1 } )
  > db.posts.find( { tags : "business" } )

What if we want to find all posts commented on by meghan?
  > db.posts.find( { comments.who : "meghan" } )

Let's index that to make it fast:
  > db.posts.ensureIndex( { "comments.who" : 1 } )
Example
We track voters above so that no one can vote more than once.
Suppose calvin wants to vote for the example post above.
The following update operation will record calvin's vote. Because of the
$nin sub-expression, if calvin has already voted, the update will have no
effect.
  > db.posts.update( { _id : ObjectId("4e77bb3b8a3e000000004f7a"),
                       voters : { $nin : "calvin" } },
                     { votes : { $inc : 1 }, voters : { $push :
  "calvin" } );

Note the above operation is atomic : if multiple users vote
simultaneously, no votes would be lost.
Sharding
A collection may be sharded (i.e. partitioned).

When sharded, the collection has a shard key, which
determines how the collection is partitioned among shards.

A BSON document (which may have significant amounts of
embedding) resides on one and only one shard.

Typically (but not always) queries on a sharded collection
involve the shard key as part of the query expression.

Changing shard keys is difficult. You will want to choose the
right shard key from the start.
Sharding
                                         mongos
                                                                                     config
                                         balancer
                                                                                     config
Chunks = logical
  key range
                                                                                     config




     1    2    3    4    13    14   15   16         25    26   27   28   37    38   39   40

     5    6    7    8    17    18   19   20         29    30   31   32   41    42   43   44

     9    10   11   12   21    22   23   24         33    34   35   36   45    46   47   48


         Shard 1              Shard 2                    Shard 3              Shard 4
Example

If the posts collection is small, we would not need to shard it at
all. But if posts is huge, we probably want to shard it.

The shard key should be chosen based on the queries that will be
common. We want those queries to involve the shard key.
  • Sharding by _id is one option here.
  • If finding the most recent posts is a very frequent query, we
    would then shard on the when field.
Shard key best practices
It is very important that the shard key is granular enough that
data can be partitioned among many machines.

One of the primary reasons for using sharding is to distribute
writes. To do this, its important that writes hit as many chunks
as possible.

Another key consideration is how many shards any query has
to hit. Ideally a query would go directly from mongos to the
shard (mongod) that owns the document requested.
Shard key best practices
A query including a sort is sent to the same shards that would have
received the equivalent query without the sort expression. Each such
shard performs a sort on its subset of the data. Mongos merges the
ordered .

it is usually much better performance-wise if only a portion of the
index is read/updated, so that this "active" portion can stay in RAM
most of the time.

Most shard keys described above result in an even spread over the
shards but also within each mongod's index. It can be beneficial to
factor in some kind of timestamp at the beginning of the shard key in
order to limit the portion of the index that gets hit.
Example
Say you have a photo storage system, with photo entries:
  {
      _id: ObjectId("4d084f78a4c8707815a601d7"),
      user_id : 42 ,
      title: "sunset at the beach",
      upload_time : "2011-01-02T21:21:56.249Z" ,
      data: ...,
  }
You could instead build a custom id that includes the month of the upload
time, and a unique identifier (ObjectId, MD5 of data, etc):
  {
      _id: "2011-01_4d084f78a4c8707815a601d7",
      user_id : 42 ,
      title: "sunset at the beach",
      upload_time : "2011-01-02T21:21:56.249Z" ,
      data: ...,
  }
This custom id would be the key used for sharding, and also the id used to
retrieve a document. It gives both a good spread across all servers, and it
reduces the portion of the index hit on most queries.
SHARDING: HOW IT WORKS
Keys
> db.runCommand( { shardcollection: “test.users”,
                   key: { email: 1 }} )



      {
          name: “Jared”,
          email: “jsr@10gen.com”,
      }
      {
          name: “Scott”,
          email: “scott@10gen.com”,
      }
      {
          name: “Dan”,
          email: “dan@10gen.com”,
      }
Chunks


-∞            +∞
Chunks


-∞                                              +∞




     dan@10gen.com            scott@10gen.com


              jsr@10gen.com
Chunks
                     Split!



-∞                                              +∞




     dan@10gen.com            scott@10gen.com


              jsr@10gen.com
Chunks
     This is a             Split!          This is a
      chunk                                 chunk


-∞                                                     +∞




           dan@10gen.com            scott@10gen.com


                    jsr@10gen.com
Chunks


-∞                                              +∞




     dan@10gen.com            scott@10gen.com


              jsr@10gen.com
Chunks
      Split!



-∞                                               +∞




     dan@10gen.com             scott@10gen.com


               jsr@10gen.com
Chunks
Min Key           Max Key           Shard
-∞                adam@10gen.com    1
adam@10gen.com    jared@10gen.com   1
jared@10gen.com   scott@10gen.com   1
scott@10gen.com   +∞                1


• Stored in the config servers
• Cached in mongos
• Used to route requests and keep cluster
  balanced
Balancing
                                      mongos
                                                                                  config
                                      balancer
                                                                                  config
Chunks!
                                                                                  config




  1    2    3    4    13    14   15   16         25    26   27   28   37    38   39   40

  5    6    7    8    17    18   19   20         29    30   31   32   41    42   43   44

  9    10   11   12   21    22   23   24         33    34   35   36   45    46   47   48


      Shard 1              Shard 2                    Shard 3              Shard 4
Balancing
                                      mongos
                                                                                  config
                                      balancer
                                                                                  config


                    Imbalance
                     Imbalance                                                    config




1    2    3    4

5    6    7    8

9    10   11   12     21    22   23   24         33    34   35   36   45    46   47   48


    Shard 1                Shard 2                    Shard 3              Shard 4
Balancing
                                    mongos
                                                                                config
                                    balancer
                                                                                config

                           Move chunk 1 to                                      config
                           Shard 2




1    2    3    4

5    6    7    8

9    10   11   12   21    22   23   24         33    34   35   36   45    46   47   48


    Shard 1              Shard 2                    Shard 3              Shard 4
Balancing
                                    mongos
                                                                                config
                                    balancer
                                                                                config

                                                                                config




1    2    3    4

5    6    7    8

9    10   11   12   21    22   23   24         33    34   35   36   45    46   47   48


    Shard 1              Shard 2                    Shard 3              Shard 4
Balancing
                                    mongos
                                                                                config
                                    balancer
                                                                                config

                                                                                config




     2    3    4

5    6    7    8    1

9    10   11   12   21    22   23   24         33    34   35   36   45    46   47   48


    Shard 1              Shard 2                    Shard 3              Shard 4
Balancing
                                    mongos
                                                                                config
                                    balancer
                                                                                config

                                                                                config




          3    4

5    6    7    8    1                          2

9    10   11   12   21    22   23   24         33    34   35   36   45    46   47   48


    Shard 1              Shard 2                    Shard 3              Shard 4
Balancing
                                    mongos
                                                                                config
                                    balancer
                                                                                config

                                                                                config




               4

5    6    7    8    1                          2                    3

9    10   11   12   21    22   23   24         33    34   35   36   45    46   47   48


    Shard 1              Shard 2                    Shard 3              Shard 4
Balancing
                                    mongos
                                                                                config
                                    balancer
                                                                                config
                                         Chunks 1,2, and 3
                                          have migrated
                                                                                config




               4

5    6    7    8    1                          2                    3

9    10   11   12   21    22   23   24         33    34   35   36   45    46   47   48


    Shard 1              Shard 2                    Shard 3              Shard 4
CHOOSING A SHARD KEY
ROUTING
Routed Request
                         1

                                        1. Query  arrives at
                          4
                     mongos                mongos
                                        2. mongos routes
                                           query to a single
                                           shard
                     2
                                        3. Shard returns

                     3                     results of query
                                        4. Results returned to
                                           client

Shard 1    Shard 2            Shard 3
Scatter Gather
                            1

                                4                 1. Query   arrives at
                        mongos                       mongos
                                                  2. mongos
                                                     broadcasts query
                                                     to all shards
          2                                       3. Each shard returns
                    2               2
                                         3           results for query
          3             3
                                                  4. Results combined
                                                     and returned to
                                                     client
Shard 1       Shard 2                   Shard 3
Distributed Merge Sort
                            1
                                                  1. Query   arrives at
                             6
                        mongos                       mongos
                                                  2. mongos broadcasts
                            5
                                                     query to all shards
                                                  3. Each shard locally
          2                                          sorts results
                    2           2                 4. Results returned to
          4                          4
                        4                            mongos
                                                  5. mongos merge sorts
3             3                               3      individual results
Shard 1       Shard 2               Shard 3       6. Combined sorted
                                                     result returned to
                                                     client
Writes

Inserts   Requires shard   db.users.insert({
          key                name: “Jared”,
                             email: “jsr@10gen.com”})

Removes   Routed           db.users.delete({
                             email: “jsr@10gen.com”})

          Scattered        db.users.delete({name: “Jared”})


Updates   Routed           db.users.update(
                             {email: “jsr@10gen.com”},
          Scattered        db.users.update( “CA”}})
                             {$set: { state:
                             {state: “FZ”},
                             {$set:{ state: “CA”}} )
Queries

By Shard      Routed             db.users.find(
                                   {email: “jsr@10gen.com”})
Key

Sorted by     Routed in order    db.users.find().sort({email:-1})
shard key
Find by non   Scatter Gather     db.users.find({state:”CA”})
shard key
Sorted by     Distributed merge db.users.find().sort({state:1})
              sort
non shard
key
REPLICATION: HOW IT WORKS
Replica Sets

         Write
                    Primary
         Read
                                Asynchronous
Driver




                   Secondary    Replication
         Read


                   Secondary
         Read
Replica Sets

                   Primary
Driver




                  Secondary
         Read


                  Secondary
         Read
Replica Sets

                    Primary

         Write                 Automatic
Driver




                    Primary    Leader Election
         Read

                   Secondary
         Read
Replica Sets

                   Secondary
         Read

         Write
Driver




                    Primary
         Read

                   Secondary
         Read
CONSISTENCY AND
DURABILITY
Strong Consistency

         Write
                  Primary
Driver




         Read

                 Secondary


                 Secondary
Eventual Consistency

         Write
                  Primary
Driver




                 Secondary
         Read


                 Secondary
          Read
Durability
• Fire and forget
• Wait for error
• Wait for fsync
• Wait for journal sync
• Wait for replication
Fire and forget
Driver            Primary
         write

                            apply in memory
Get last error
Driver              Primary
         write
     getLastError
                              apply in memory
Wait for Journal Sync
Driver              Primary
         write
     getLastError
                              apply in memory
         j:true
                              Write to journal
Wait for fsync
Driver                Primary
           write
     getLastError
                                apply in memory
         fsync:true


                                fsync
Wait for replication
Driver              Primary                     Secondary
         write
     getLastError
                              apply in memory
         w:2
                                  replicate
Write Concern Options
Value          Meaning

<n:integer>    Replicate to N members of replica set

“majority”     Replicate to a majority of replica set
               members
<m:modeName>   Use custom error mode name
Tagging
{ _id: “someSet”,
   members: [
     { _id:0, host:”A”, tags: { dc: “ny”}},
     { _id:1, host:”B”, tags: { dc: “ny”}},
     { _id:2, host:”C”, tags: { dc: “sf”}},
     { _id:3, host:”D”, tags: { dc: “sf”}},
     { _id:4, host:”E”, tags: { dc: “cloud”}},
   settings: {
       getLastErrorModes: {
           veryImportant: { dc: 3 },
           sortOfImportant: { dc: 2 }
       }
   }
 }
Tagging
              { _id: “someSet”,
                 members: [
                   { _id:0, host:”A”, tags: { dc: “ny”}},
                   { _id:1, host:”B”, tags: { dc: “ny”}},
                   { _id:2, host:”C”, tags: { dc: “sf”}},
                   { _id:3, host:”D”, tags: { dc: “sf”}},
                   { _id:4, host:”E”, tags: { dc: “cloud”}},
                 settings: {
                     getLastErrorModes: {
These are the            veryImportant: { dc: 3 },
modes you can            sortOfImportant: { dc: 2 }
 use in write        }
  concern
                 }
               }
REPLICA SET OPTIONS
Priorities
• Between 0..1000
• Highest member that is up to date wins
  –Up to date == within 10 seconds of primary
• If a higher priority member catches up, it will
  force election and win


          Primary            Secondary       Secondary
          priority = 100     priority = 50   priority = 20
Slave Delay
• Lags behind master by configurable time delay
• Automatically hidden from clients
• Protects against operator errors
 –Accidentally delete database
 –Application corrupts data
Arbiters
• Vote in elections
• Don’t store a copy of data
• Use as tie breaker
COMMON DEPLOYMENT
SCENARIOS
Typical

           Data Center




Primary   Secondary      Secondary
Backup Node

           Data Center




Primary   Secondary      Secondary
                         hidden = true




                            backups
Disaster Recovery

       Active Data Center          Standby Data Center




Primary            Secondary       Secondary
priority = 1        priority = 1
Multi Data Center
West Coast DC      Central DC    East Coast DC




Secondary         Primary        Secondary
                  priority = 1
mongo sharding
Quick Release History

•   1.0 - August 2009: bson, BTree range query optimization

•   1.2 - December 2009: map-reduce

•   1.4 - March 2010: background indexing, geo indexes

•   1.6 - August 2010: sharding, replica sets

•   1.8 - March 2011: journaling, sparse and covered indexes

•   1.10 = 2.0 - Sep 2011: cumulative performance improvements
Roadmap
• v2.2 Projected 2012 Q1
 • Concurrency: yielding and db or
   collection-level locking
 • Improved free list implementation
 • New aggregation framework
 • TTL Collections
• Hash shard key
Roadmap: short List
• Full text Search
• Online compaction
• Internal compression
• Read tagging
• XML / JSON transcoder
 Vote: http://jira.mongodb.org/
Thank you

More Related Content

What's hot

OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialSteven Francia
 
MongoDB Aggregation Performance
MongoDB Aggregation PerformanceMongoDB Aggregation Performance
MongoDB Aggregation PerformanceMongoDB
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaMongoDB
 
High Performance Applications with MongoDB
High Performance Applications with MongoDBHigh Performance Applications with MongoDB
High Performance Applications with MongoDBMongoDB
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDBMongoDB
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceMongoDB
 
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
 Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDBMongoDB
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to ShardingMongoDB
 
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB
 
MongoDB London 2013: Data Modeling Examples from the Real World presented by ...
MongoDB London 2013: Data Modeling Examples from the Real World presented by ...MongoDB London 2013: Data Modeling Examples from the Real World presented by ...
MongoDB London 2013: Data Modeling Examples from the Real World presented by ...MongoDB
 
introtomongodb
introtomongodbintrotomongodb
introtomongodbsaikiran
 
MongoDB sharded cluster. How to design your topology ?
MongoDB sharded cluster. How to design your topology ?MongoDB sharded cluster. How to design your topology ?
MongoDB sharded cluster. How to design your topology ?Mydbops
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsMongoDB
 

What's hot (20)

OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB Tutorial
 
MongoDB Aggregation Performance
MongoDB Aggregation PerformanceMongoDB Aggregation Performance
MongoDB Aggregation Performance
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
 
High Performance Applications with MongoDB
High Performance Applications with MongoDBHigh Performance Applications with MongoDB
High Performance Applications with MongoDB
 
Managing Social Content with MongoDB
Managing Social Content with MongoDBManaging Social Content with MongoDB
Managing Social Content with MongoDB
 
How To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own DatasourceHow To Connect Spark To Your Own Datasource
How To Connect Spark To Your Own Datasource
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
 
Mongodb
MongodbMongodb
Mongodb
 
MongoDB and Schema Design
MongoDB and Schema DesignMongoDB and Schema Design
MongoDB and Schema Design
 
Mongo db queries
Mongo db queriesMongo db queries
Mongo db queries
 
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
 Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB
 
MongoDB and hadoop
MongoDB and hadoopMongoDB and hadoop
MongoDB and hadoop
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
 
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDBMongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
MongoDB Days Silicon Valley: Winning the Dreamforce Hackathon with MongoDB
 
MongoDB London 2013: Data Modeling Examples from the Real World presented by ...
MongoDB London 2013: Data Modeling Examples from the Real World presented by ...MongoDB London 2013: Data Modeling Examples from the Real World presented by ...
MongoDB London 2013: Data Modeling Examples from the Real World presented by ...
 
introtomongodb
introtomongodbintrotomongodb
introtomongodb
 
Mongo DB Presentation
Mongo DB PresentationMongo DB Presentation
Mongo DB Presentation
 
MongoDB sharded cluster. How to design your topology ?
MongoDB sharded cluster. How to design your topology ?MongoDB sharded cluster. How to design your topology ?
MongoDB sharded cluster. How to design your topology ?
 
Webinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in DocumentsWebinar: Back to Basics: Thinking in Documents
Webinar: Back to Basics: Thinking in Documents
 

Similar to 2012 phoenix mug

No SQL - MongoDB
No SQL - MongoDBNo SQL - MongoDB
No SQL - MongoDBMirza Asif
 
Mongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMohan Rathour
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB Habilelabs
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Consjohnrjenson
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring dataJimmy Ray
 
mongodb11 (1) (1).pptx
mongodb11 (1) (1).pptxmongodb11 (1) (1).pptx
mongodb11 (1) (1).pptxRoopaR36
 
How to use MongoDB with CakePHP
How to use MongoDB with CakePHPHow to use MongoDB with CakePHP
How to use MongoDB with CakePHPichikaway
 
MongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlMongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlTO THE NEW | Technology
 
Mongo db tutorials
Mongo db tutorialsMongo db tutorials
Mongo db tutorialsAnuj Jain
 
DBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsDBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsSrinivas Mutyala
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRavi Teja
 
MongoDB NoSQL database a deep dive -MyWhitePaper
MongoDB  NoSQL database a deep dive -MyWhitePaperMongoDB  NoSQL database a deep dive -MyWhitePaper
MongoDB NoSQL database a deep dive -MyWhitePaperRajesh Kumar
 
MongoDB at FrozenRails
MongoDB at FrozenRailsMongoDB at FrozenRails
MongoDB at FrozenRailsMike Dirolf
 

Similar to 2012 phoenix mug (20)

Mongo db
Mongo dbMongo db
Mongo db
 
No SQL - MongoDB
No SQL - MongoDBNo SQL - MongoDB
No SQL - MongoDB
 
Mongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorial
 
MongoDB
MongoDBMongoDB
MongoDB
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
 
MongoDB 2.4 and spring data
MongoDB 2.4 and spring dataMongoDB 2.4 and spring data
MongoDB 2.4 and spring data
 
lecture_34e.pptx
lecture_34e.pptxlecture_34e.pptx
lecture_34e.pptx
 
No sql - { If and Else }
No sql - { If and Else }No sql - { If and Else }
No sql - { If and Else }
 
mongodb11 (1) (1).pptx
mongodb11 (1) (1).pptxmongodb11 (1) (1).pptx
mongodb11 (1) (1).pptx
 
MongoDB
MongoDBMongoDB
MongoDB
 
How to use MongoDB with CakePHP
How to use MongoDB with CakePHPHow to use MongoDB with CakePHP
How to use MongoDB with CakePHP
 
MongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlMongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behl
 
Mongo db tutorials
Mongo db tutorialsMongo db tutorials
Mongo db tutorials
 
DBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training PresentationsDBVersity MongoDB Online Training Presentations
DBVersity MongoDB Online Training Presentations
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Mongodb Introduction
Mongodb IntroductionMongodb Introduction
Mongodb Introduction
 
MongoDB NoSQL database a deep dive -MyWhitePaper
MongoDB  NoSQL database a deep dive -MyWhitePaperMongoDB  NoSQL database a deep dive -MyWhitePaper
MongoDB NoSQL database a deep dive -MyWhitePaper
 
MongoDB at FrozenRails
MongoDB at FrozenRailsMongoDB at FrozenRails
MongoDB at FrozenRails
 
MongoDb - Details on the POC
MongoDb - Details on the POCMongoDb - Details on the POC
MongoDb - Details on the POC
 

Recently uploaded

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 

Recently uploaded (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 

2012 phoenix mug

  • 1. Technical Overview Paul Pedersen, Deputy CTO paul@10gen.com Phoenix MUG March 27, 2012
  • 2. RDBMS systems Costs go up
  • 3. Productivity goes down Project Start De- normalize data model Stop using joins Custom caching Custom layer sharding
  • 4. MongoDB / NoSQL objective: Make it as simple as possible, but no simpler Memcached Key / Value Scalability & Performance RDBMS Depth of functionality
  • 5. mongodb stores BSON { “_id” : 123456789, “name” : { “first”: “mongo”, “last”: “db” }, “address” : “555 University Ave., Palo Alto CA 94301”, “checkins” : [ “starbucks”, “ace hardware”, “evvia” ], “loc” : [38.57, 121.62 ] } Types: • unique id • strings, dates, numbers, bool • embedded documents • arrays • regex • binary data
  • 6. MongoDB architecture • BSON object store, unique _id • mmap memory management • B-Tree indexing • single-master replication • logical key-range sharding • global configuration manager
  • 7. MongoDB design goals • agile development - rich data model, well matched to OO data - general-purpose DBMS - simple, but powerful query syntax - multiple secondary indexes, geo indexes - smooth path to horizontal scaling - broad integration into existing ecosystems
  • 8. MongoDB design goals • web scale - eventual consistency: single master - availability: non-blocking writes, journal- blocking writes, replication-blocking writes - durability: journaling - partition: multi-data center replication - easy horizontal scaling by data sharding
  • 9. MongoDB design goals • High performance. For example: “Last week we learned about a government agency using mongodb to process about 2 million transactions per second. Around 50 machines in each of 8 data centers..” Wordnik stores its entire text corpus in MongoDB - 3.5T of data in 20 billion records. The speed to query the corpus was cut to 1/4 the time it took prior to migrating to MongoDB.
  • 10. MongoDB scale • We are running instances on the order of: - 25B objects - 50TB storage - 50K qps per server - 500 servers
  • 12. Schema design checklist 1. When do we embed data versus linking? Our decisions here imply the answer to question 2: 2. How many collections do we have, and what are they? 3. When do we need atomic operations? These operations can be performed within the scope of a BSON document, but not across documents. 4. What indexes will we create to make query and updates fast? 5. How will we shard? What is the shard key?
  • 13. Embedding v. linking Embedding is the nesting of objects and arrays inside a BSON document. Links are references between documents (by _id) There are no joins in MongoDB. (Distributed joins are inherently expensive.) Embedding is a bit like "prejoined" data.
  • 14. Embedding v. linking Operations within a document are easy for the server to handle. These operations can be fairly rich. Links in contrast must be processed client-side by the application issuing a follow-up query. For "contains" relationships between entities, embedding should be be chosen. Use linking when not using linking would result in gross duplication of data.
  • 15. Embedding v. linking Embedded objects are a bit harder to link to than "top level" objects in collections. (http://www.mongodb.org/display/DOCS/Dot+Notation +(Reaching+into+Objects)) If the amount of data to embed is large, you may reach the limit on size of a single object. Then consider using GridFS. If performance is an issue, embed.
  • 16. Example Our content management system will manage posts. Posts have titles, dates, and authors. We'd like to support commenting and voting on posts. We'd also like posts to be taggable for searching.
  • 17. Example > db.posts.findOne() { _id : ObjectId("4e77bb3b8a3e000000004f7a"), when : Date("2011-09-19T02:10:11.3Z", author : "alex", title : "No Free Lunch", text : "This is the text of the post. It could be very long.", tags : [ "business", "ramblings" ], votes : 5, voters : [ "jane", "joe", "spencer", "phyllis", "li" ], comments : [ { who : "jane", when : Date("2011-09-19T04:00:10.112Z"), comment : "I agree." }, { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"), comment : "You must be joking. etc etc ..." } ] }
  • 18. Disk seeks and data locality Seek = 5+ ms Read = really really fast Post Comment Author
  • 19. Disk seeks and data locality Post Author Comment Comment Comment Comment Comment
  • 20. Collections Collections of documents in MongoDB are analogous to tables of rows in a relational database. There is no explicit schema declaration. No schema object. There is a conceptual design of how the documents in the collection will be structured.
  • 21. Collections MongoDB does not require documents in a collection to have the same structure. In practice, most collections are relatively homogenous. We can move away from this when we need -- for example when adding a new field. No analog of "alter table" is required.
  • 22. Atomicity Some problems require the ability to perform atomic operations. For example: atomically { if (doc.credits > 5) { doc.credits -= 5; doc.debits += 5; } } The scope of atomicity / ACID properties is the document. We need to assure that all fields relevant to the atomic operation are in the same document.
  • 23. Indexes Indexes in MongoDB are highly analogous to indexes in a relational database. (B-Tree indexes) Indexes are needed for efficient query processing: both for selection and sorting. Indexes must be explicitly declared. As in a relational database, indexes can be added to existing collections.
  • 24. Choosing indexes As a general rule -- where you want an index in a relational database, you want an index in MongoDB. •The _id field is automatically indexed. • Fields whose keys are looked up should be indexed. • Sort fields generally should be indexed. The MongoDB profiling facility provides useful information for where an index should be added that is missing. Adding an index slows writes to a collection, but not reads.
  • 25. Example To grab the whole post we might execute: > db.posts.findOne( { _id : ObjectId("4e77bb3b8a3e000000004f7a") } ); To get all posts written by alex: > db.posts.find( { author : "alex" } ) If the above is a common query we would create an index on the author field: > db.posts.ensureIndex( { author : 1 } ) To get just the titles of the posts by alex: > db.posts.find( { author : "alex" }, { title : 1 } )
  • 26. Example We may want to search for posts by tag: > // make and index of all tags so that the query is fast: > db.posts.ensureIndex( { tags : 1 } ) > db.posts.find( { tags : "business" } ) What if we want to find all posts commented on by meghan? > db.posts.find( { comments.who : "meghan" } ) Let's index that to make it fast: > db.posts.ensureIndex( { "comments.who" : 1 } )
  • 27. Example We track voters above so that no one can vote more than once. Suppose calvin wants to vote for the example post above. The following update operation will record calvin's vote. Because of the $nin sub-expression, if calvin has already voted, the update will have no effect. > db.posts.update( { _id : ObjectId("4e77bb3b8a3e000000004f7a"), voters : { $nin : "calvin" } }, { votes : { $inc : 1 }, voters : { $push : "calvin" } ); Note the above operation is atomic : if multiple users vote simultaneously, no votes would be lost.
  • 28. Sharding A collection may be sharded (i.e. partitioned). When sharded, the collection has a shard key, which determines how the collection is partitioned among shards. A BSON document (which may have significant amounts of embedding) resides on one and only one shard. Typically (but not always) queries on a sharded collection involve the shard key as part of the query expression. Changing shard keys is difficult. You will want to choose the right shard key from the start.
  • 29. Sharding mongos config balancer config Chunks = logical key range config 1 2 3 4 13 14 15 16 25 26 27 28 37 38 39 40 5 6 7 8 17 18 19 20 29 30 31 32 41 42 43 44 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4
  • 30. Example If the posts collection is small, we would not need to shard it at all. But if posts is huge, we probably want to shard it. The shard key should be chosen based on the queries that will be common. We want those queries to involve the shard key. • Sharding by _id is one option here. • If finding the most recent posts is a very frequent query, we would then shard on the when field.
  • 31. Shard key best practices It is very important that the shard key is granular enough that data can be partitioned among many machines. One of the primary reasons for using sharding is to distribute writes. To do this, its important that writes hit as many chunks as possible. Another key consideration is how many shards any query has to hit. Ideally a query would go directly from mongos to the shard (mongod) that owns the document requested.
  • 32. Shard key best practices A query including a sort is sent to the same shards that would have received the equivalent query without the sort expression. Each such shard performs a sort on its subset of the data. Mongos merges the ordered . it is usually much better performance-wise if only a portion of the index is read/updated, so that this "active" portion can stay in RAM most of the time. Most shard keys described above result in an even spread over the shards but also within each mongod's index. It can be beneficial to factor in some kind of timestamp at the beginning of the shard key in order to limit the portion of the index that gets hit.
  • 33. Example Say you have a photo storage system, with photo entries: { _id: ObjectId("4d084f78a4c8707815a601d7"), user_id : 42 , title: "sunset at the beach", upload_time : "2011-01-02T21:21:56.249Z" , data: ..., } You could instead build a custom id that includes the month of the upload time, and a unique identifier (ObjectId, MD5 of data, etc): { _id: "2011-01_4d084f78a4c8707815a601d7", user_id : 42 , title: "sunset at the beach", upload_time : "2011-01-02T21:21:56.249Z" , data: ..., } This custom id would be the key used for sharding, and also the id used to retrieve a document. It gives both a good spread across all servers, and it reduces the portion of the index hit on most queries.
  • 35. Keys > db.runCommand( { shardcollection: “test.users”, key: { email: 1 }} ) { name: “Jared”, email: “jsr@10gen.com”, } { name: “Scott”, email: “scott@10gen.com”, } { name: “Dan”, email: “dan@10gen.com”, }
  • 36. Chunks -∞ +∞
  • 37. Chunks -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com
  • 38. Chunks Split! -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com
  • 39. Chunks This is a Split! This is a chunk chunk -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com
  • 40. Chunks -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com
  • 41. Chunks Split! -∞ +∞ dan@10gen.com scott@10gen.com jsr@10gen.com
  • 42. Chunks Min Key Max Key Shard -∞ adam@10gen.com 1 adam@10gen.com jared@10gen.com 1 jared@10gen.com scott@10gen.com 1 scott@10gen.com +∞ 1 • Stored in the config servers • Cached in mongos • Used to route requests and keep cluster balanced
  • 43. Balancing mongos config balancer config Chunks! config 1 2 3 4 13 14 15 16 25 26 27 28 37 38 39 40 5 6 7 8 17 18 19 20 29 30 31 32 41 42 43 44 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4
  • 44. Balancing mongos config balancer config Imbalance Imbalance config 1 2 3 4 5 6 7 8 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4
  • 45. Balancing mongos config balancer config Move chunk 1 to config Shard 2 1 2 3 4 5 6 7 8 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4
  • 46. Balancing mongos config balancer config config 1 2 3 4 5 6 7 8 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4
  • 47. Balancing mongos config balancer config config 2 3 4 5 6 7 8 1 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4
  • 48. Balancing mongos config balancer config config 3 4 5 6 7 8 1 2 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4
  • 49. Balancing mongos config balancer config config 4 5 6 7 8 1 2 3 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4
  • 50. Balancing mongos config balancer config Chunks 1,2, and 3 have migrated config 4 5 6 7 8 1 2 3 9 10 11 12 21 22 23 24 33 34 35 36 45 46 47 48 Shard 1 Shard 2 Shard 3 Shard 4
  • 53. Routed Request 1 1. Query arrives at 4 mongos mongos 2. mongos routes query to a single shard 2 3. Shard returns 3 results of query 4. Results returned to client Shard 1 Shard 2 Shard 3
  • 54. Scatter Gather 1 4 1. Query arrives at mongos mongos 2. mongos broadcasts query to all shards 2 3. Each shard returns 2 2 3 results for query 3 3 4. Results combined and returned to client Shard 1 Shard 2 Shard 3
  • 55. Distributed Merge Sort 1 1. Query arrives at 6 mongos mongos 2. mongos broadcasts 5 query to all shards 3. Each shard locally 2 sorts results 2 2 4. Results returned to 4 4 4 mongos 5. mongos merge sorts 3 3 3 individual results Shard 1 Shard 2 Shard 3 6. Combined sorted result returned to client
  • 56. Writes Inserts Requires shard db.users.insert({ key name: “Jared”, email: “jsr@10gen.com”}) Removes Routed db.users.delete({ email: “jsr@10gen.com”}) Scattered db.users.delete({name: “Jared”}) Updates Routed db.users.update( {email: “jsr@10gen.com”}, Scattered db.users.update( “CA”}}) {$set: { state: {state: “FZ”}, {$set:{ state: “CA”}} )
  • 57. Queries By Shard Routed db.users.find( {email: “jsr@10gen.com”}) Key Sorted by Routed in order db.users.find().sort({email:-1}) shard key Find by non Scatter Gather db.users.find({state:”CA”}) shard key Sorted by Distributed merge db.users.find().sort({state:1}) sort non shard key
  • 59. Replica Sets Write Primary Read Asynchronous Driver Secondary Replication Read Secondary Read
  • 60. Replica Sets Primary Driver Secondary Read Secondary Read
  • 61. Replica Sets Primary Write Automatic Driver Primary Leader Election Read Secondary Read
  • 62. Replica Sets Secondary Read Write Driver Primary Read Secondary Read
  • 64. Strong Consistency Write Primary Driver Read Secondary Secondary
  • 65. Eventual Consistency Write Primary Driver Secondary Read Secondary Read
  • 66. Durability • Fire and forget • Wait for error • Wait for fsync • Wait for journal sync • Wait for replication
  • 67. Fire and forget Driver Primary write apply in memory
  • 68. Get last error Driver Primary write getLastError apply in memory
  • 69. Wait for Journal Sync Driver Primary write getLastError apply in memory j:true Write to journal
  • 70. Wait for fsync Driver Primary write getLastError apply in memory fsync:true fsync
  • 71. Wait for replication Driver Primary Secondary write getLastError apply in memory w:2 replicate
  • 72. Write Concern Options Value Meaning <n:integer> Replicate to N members of replica set “majority” Replicate to a majority of replica set members <m:modeName> Use custom error mode name
  • 73. Tagging { _id: “someSet”, members: [ { _id:0, host:”A”, tags: { dc: “ny”}}, { _id:1, host:”B”, tags: { dc: “ny”}}, { _id:2, host:”C”, tags: { dc: “sf”}}, { _id:3, host:”D”, tags: { dc: “sf”}}, { _id:4, host:”E”, tags: { dc: “cloud”}}, settings: { getLastErrorModes: { veryImportant: { dc: 3 }, sortOfImportant: { dc: 2 } } } }
  • 74. Tagging { _id: “someSet”, members: [ { _id:0, host:”A”, tags: { dc: “ny”}}, { _id:1, host:”B”, tags: { dc: “ny”}}, { _id:2, host:”C”, tags: { dc: “sf”}}, { _id:3, host:”D”, tags: { dc: “sf”}}, { _id:4, host:”E”, tags: { dc: “cloud”}}, settings: { getLastErrorModes: { These are the veryImportant: { dc: 3 }, modes you can sortOfImportant: { dc: 2 } use in write } concern } }
  • 76. Priorities • Between 0..1000 • Highest member that is up to date wins –Up to date == within 10 seconds of primary • If a higher priority member catches up, it will force election and win Primary Secondary Secondary priority = 100 priority = 50 priority = 20
  • 77. Slave Delay • Lags behind master by configurable time delay • Automatically hidden from clients • Protects against operator errors –Accidentally delete database –Application corrupts data
  • 78. Arbiters • Vote in elections • Don’t store a copy of data • Use as tie breaker
  • 80. Typical Data Center Primary Secondary Secondary
  • 81. Backup Node Data Center Primary Secondary Secondary hidden = true backups
  • 82. Disaster Recovery Active Data Center Standby Data Center Primary Secondary Secondary priority = 1 priority = 1
  • 83. Multi Data Center West Coast DC Central DC East Coast DC Secondary Primary Secondary priority = 1
  • 85. Quick Release History • 1.0 - August 2009: bson, BTree range query optimization • 1.2 - December 2009: map-reduce • 1.4 - March 2010: background indexing, geo indexes • 1.6 - August 2010: sharding, replica sets • 1.8 - March 2011: journaling, sparse and covered indexes • 1.10 = 2.0 - Sep 2011: cumulative performance improvements
  • 86. Roadmap • v2.2 Projected 2012 Q1 • Concurrency: yielding and db or collection-level locking • Improved free list implementation • New aggregation framework • TTL Collections • Hash shard key
  • 87. Roadmap: short List • Full text Search • Online compaction • Internal compression • Read tagging • XML / JSON transcoder Vote: http://jira.mongodb.org/

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n
  32. \n
  33. \n
  34. \n
  35. \n
  36. \n
  37. \n
  38. \n
  39. \n
  40. \n
  41. \n
  42. \n
  43. \n
  44. \n
  45. \n
  46. \n
  47. \n
  48. \n
  49. \n
  50. \n
  51. \n
  52. \n
  53. \n
  54. \n
  55. \n
  56. \n
  57. \n
  58. \n
  59. \n
  60. \n
  61. \n
  62. \n
  63. \n
  64. \n
  65. \n
  66. \n
  67. \n
  68. \n
  69. \n
  70. \n
  71. \n
  72. \n
  73. \n
  74. \n
  75. \n
  76. \n
  77. \n
  78. \n
  79. \n
  80. \n
  81. \n
  82. \n
  83. \n
  84. \n
  85. \n
  86. \n