Scaling The Facebook Realtime
Endpoint Using MongoDB
PRESENTED BY:
           Justin Medoy and Mike Sherov
                  SNAP Interactive

           jmedoy@snap-interactive.com
            mikesherov@snap-interactive
Redefining the Way People
Meet & Socialize Online
What are Facebook Realtime Updates?


Facebook says: "Real-time updates enable your
 application to subscribe to changes in data in
 Facebook."


What it means: "You provide a URL, Facebook pings it
 when users do stuff."
Pings from Facebook
● Every minute we get around 20 pings from
  Facebook that together cover around 11,000 users
{
    "object": "user",
    "entry": [
        {
            "uid": 1335845740,
            "changed_fields": [
                "name",
                "picture"
            ],
            "time": 232323
        },....
    ]
}
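A minimal sketch, in Python, of pulling the user ids and changed fields out of a ping shaped like the one above. The function name `changed_uids` and the sample payload are illustrative assumptions; the talk's endpoint ran on PHP and its actual code is not shown in the slides.

```python
import json
from typing import Dict, List


def changed_uids(ping_body: str) -> Dict[int, List[str]]:
    """Map each uid in a ping to the fields Facebook reported as changed."""
    payload = json.loads(ping_body)
    if payload.get("object") != "user":
        return {}  # this sketch only handles user-object pings
    result: Dict[int, List[str]] = {}
    for entry in payload.get("entry", []):
        fields = result.setdefault(entry["uid"], [])
        # Merge fields if the same uid appears in more than one entry.
        for field in entry.get("changed_fields", []):
            if field not in fields:
                fields.append(field)
    return result


# Hypothetical single-entry ping, modeled on the slide's example.
sample_ping = json.dumps({
    "object": "user",
    "entry": [
        {"uid": 1335845740, "changed_fields": ["name", "picture"], "time": 232323}
    ],
})
print(changed_uids(sample_ping))
```

Note that the result carries only field names, which is exactly the limitation the next slide complains about.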
WHAT?!? Where's the data?
● Facebook tells you which fields changed,
  but not what their current values are.
Retrieving User Data from the
Graph
●   Solution: go back to Facebook and grab
    the user's data
    https://graph.facebook.com?ids=<USERID>&fields=music,movies,likes
    *This only returns data that the user has made publicly available
●   To avoid timeouts, each call to Facebook asks
    for the data of only 25 users
    *Our cURL timeout for Facebook calls has been lowered from the
    default 60 seconds to 25 seconds
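The batching described above can be sketched as follows. This is an illustration only: the helper names `batches` and `graph_urls` are hypothetical, passing several comma-separated ids in one `ids` parameter is an assumption about how the 25 users were combined into one call, and no HTTP request is made here.

```python
from typing import Iterator, List, Sequence
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com"  # base URL as shown on the slide
BATCH_SIZE = 25                            # users per call, per the slide


def batches(uids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` uids."""
    for start in range(0, len(uids), size):
        yield list(uids[start:start + size])


def graph_urls(uids: Sequence[int],
               fields: Sequence[str] = ("music", "movies", "likes")) -> List[str]:
    """Build one Graph URL per batch of uids (assumed comma-separated ids)."""
    field_list = ",".join(fields)
    urls = []
    for batch in batches(uids):
        query = urlencode(
            {"ids": ",".join(str(uid) for uid in batch), "fields": field_list},
            safe=",",  # keep commas readable instead of percent-encoding them
        )
        urls.append(f"{GRAPH_URL}?{query}")
    return urls


# 60 made-up uids split into batches of 25, 25 and 10.
urls = graph_urls(list(range(1, 61)))
print(len(urls))
```

Whatever HTTP client fetches these URLs would also be given the 25 second timeout mentioned above.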
Update the user's profile
●   Facebook won't tell you exactly what
    changed, but we can work it out by
    comparing against our own stored data

    All Data - Stored Data = Changed Data

●   The next step is to update the user's
    profile with this changed data
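One way to read "All Data - Stored Data = Changed Data" is as a per-field dictionary difference. The sketch below assumes that reading; the function name `changed_data` and the sample profiles are hypothetical, and the slides do not say how fields that disappear on Facebook were handled.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel so a stored value of None still compares correctly


def changed_data(all_data: Dict[str, Any], stored_data: Dict[str, Any]) -> Dict[str, Any]:
    """Return the fields of all_data that are new or differ from stored_data."""
    return {
        field: value
        for field, value in all_data.items()
        if stored_data.get(field, _MISSING) != value
    }


# Hypothetical data: what the Graph returned vs. what is already stored.
fresh = {"music": ["Lady Gaga"], "movies": ["Inception"], "likes": ["Rubiks Cube"]}
stored = {"music": ["Lady Gaga"], "movies": []}

print(changed_data(fresh, stored))
```

Only the fields in the result would then need to be written back, which keeps each update small.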
Mongo Architecture
●   Mongo 2.0.2
●   Mongo PHP driver 1.2.10
●   Two separate replica sets
    ○   User data
    ○   Interest data
●   Why separate replica sets?
    ○   Keep as much of the index as possible in
        memory
    ○   Disk reads are expensive
User Data Replica Set
Design Challenge
● Random access pattern over 106 million
  documents
User Data Replica Set
●   Large $in queries
●   High page faults in
    MMS
●   We upgraded from
    32 GB to 128 GB of RAM
    on each node
Indexes

●   We added duplicates of some of our
    indexes with reversed fields
●   Updating all of these extra indexes was a
    huge bottleneck
Indexes

●   Unique index uid_1
●   profile.sync_1_installed_1_platforms.facebook_1
●   email_1
●   uid_1_installed_1
●   last_login_1_uid_1
Indexes

●   In some minutes Facebook would tell us
    that data had changed for more than
    40,000 users
     ○   Limit the amount of data Facebook can send in one minute
●   A high number of writes combined with a large number
    of indexes prevented the secondaries from reading
    the oplog because of the global write lock
    ○    Increase the size of the oplog
    ○    This is fixed in 2.2.1
Indexes and the realtime endpoint

profile.sync_1_installed_1_platforms.facebook_1
●   Filtered 11,000 users a minute down to a few hundred
     ○  moved filtering logic out of PHP into the index
●   Added efficiency from covered index
     ○  All we need is platforms.facebook, which is part of the
        index
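A sketch of what the query behind this slide might look like, written as plain Python dictionaries in the shape a MongoDB driver accepts. Everything about the field semantics is an assumption: the slides do not say what values `profile.sync` and `installed` take, or that `platforms.facebook` holds the Facebook uid. The one firm point is that a covered query must project only indexed fields, which means excluding `_id`.

```python
from typing import Any, Dict, List, Tuple


def realtime_filter(facebook_ids: List[int]) -> Tuple[Dict[str, Any], Dict[str, int]]:
    """Build a (query, projection) pair that uses only fields of the
    profile.sync_1_installed_1_platforms.facebook_1 index."""
    query = {
        "profile.sync": True,                         # assumed boolean flag
        "installed": True,                            # assumed boolean flag
        "platforms.facebook": {"$in": facebook_ids},  # assumed to hold the uid
    }
    # Return only an indexed field and drop _id so the index alone can answer.
    projection = {"platforms.facebook": 1, "_id": 0}
    return query, projection


query, projection = realtime_filter([1335845740])
print(query)
print(projection)
```

Whether a given server version can actually cover a query on a dotted field is worth confirming with `explain()` rather than assuming.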
Interest Replica Set

A different set of challenges from the User replica set
●  Needs to power typeahead
●  64 million interests
●  Access pattern based on interest popularity
    ○   Lady Gaga is going to get accessed more than Ladybug or JavaScript
The Typeahead
{
    "_id" : ObjectId("4f511a230624967b7d000003"),
    "name" : "Rubiks Cube",
    "search" : "rubiks cube",
    "subsearch" : [
        "r",
        "ru",
        "rub",
        "rubi",
        "rubik",
        "rubiks",
        "rubiks ",
        "rubiks c",
        "rubiks cu",
        "rubiks cub"
    ],
    "popularity" : NumberLong(907)
}
The Typeahead

 ●   Add an array holding the leading prefixes of the
     interest's name
 ●   Add an index on that field
 ●   This allows us to have 10 entries in 1 index
     instead of 10 separate indexes

http://docs.mongodb.org/manual/core/indexes/#index-type-multikey
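The `subsearch` array in the Rubiks Cube document can be generated with a few lines of code. This sketch assumes the rule is "lowercase the name, then take the first 10 prefixes", which reproduces the slide's example; the function names and the behavior for names shorter than 10 characters are assumptions.

```python
from typing import Any, Dict, List


def subsearch_prefixes(name: str, max_len: int = 10) -> List[str]:
    """Return the prefixes of the lowercased name, from 1 up to max_len characters."""
    search = name.lower()
    return [search[:n] for n in range(1, min(max_len, len(search)) + 1)]


def interest_document(name: str, popularity: int) -> Dict[str, Any]:
    """Build a document in the shape shown on the slide (without _id)."""
    return {
        "name": name,
        "search": name.lower(),
        "subsearch": subsearch_prefixes(name),
        "popularity": popularity,
    }


print(interest_document("Rubiks Cube", 907)["subsearch"])
```

Because `subsearch` is an array, a single multikey index on it holds one entry per prefix.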
Typeahead indexes

subsearch_1_popularity_-1
● Specifying -1 for the popularity component of
  the index naturally causes the typeahead to
  show more popular interests first
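To see why the descending popularity component gives the ordering for free, here is a small in-memory imitation of such an index in Python. It is a model of the idea, not MongoDB code: the entries are sorted by prefix ascending and popularity descending, so a lookup is one binary search followed by a short sequential read. The interest names come from the slides, but the popularity numbers are made up.

```python
from bisect import bisect_left
from typing import List, Tuple

IndexEntry = Tuple[str, int, str]  # (prefix, negated popularity, name)


def build_index(interests: List[Tuple[str, int]], max_len: int = 10) -> List[IndexEntry]:
    """Emit one entry per prefix, sorted like subsearch_1_popularity_-1."""
    entries: List[IndexEntry] = []
    for name, popularity in interests:
        search = name.lower()
        for n in range(1, min(max_len, len(search)) + 1):
            # Negating popularity makes an ascending sort put popular names first.
            entries.append((search[:n], -popularity, name))
    entries.sort()
    return entries


def typeahead(index: List[IndexEntry], typed: str, limit: int = 5) -> List[str]:
    """Return up to `limit` names for the typed prefix, most popular first."""
    key = typed.lower()
    i = bisect_left(index, (key,))  # first entry whose prefix is >= key
    names: List[str] = []
    while i < len(index) and index[i][0] == key and len(names) < limit:
        names.append(index[i][2])
        i += 1
    return names


# Hypothetical popularity values.
index = build_index([("Ladybug", 40), ("Lady Gaga", 5000), ("Lady Antebellum", 900)])
print(typeahead(index, "lady"))
```

No separate sort step runs at query time; the results come out in popularity order because that is how the entries are stored.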
Lessons Learned

●   Don't over-index
●   Use covered indexes when possible
●   Use indexes to reduce the size of returned data
●   Keep everything in memory
●   Use a multikey index for typeaheads
●   Use -1 in an index for natural sorting
SNAP Interactive, Inc.
                           Contact Information

●   SNAP Interactive, Inc.
    SNAP-Interactive.com

●   Justin Medoy
    Team Lead / Software Engineer
    JMedoy@snap-interactive.com


●   Mike Sherov
    Lead Developer
    mike@snap-interactive.com
    @mikesherov
●   For more information on our open positions, email
    jobs@snap-interactive.com or check our website at
    www.snap-interactive.com/jobs/job-openings
