Database Sharding
#BarCampGhent2
Tags: scaling, performance, database, PHP, MySQL, memcached, Sphinx
echo “Hello, world!”;




 • Jayme Rotsaert

    core developer @ Netlog

    for 2 years

 • Jurriaan Persyn

    lead web developer @ Netlog

    for 3 years
What is sharding?




        A technique to scale databases
Some requirements @ Netlog ...




•   serve 36+ million unique users

•   4+ billion pageviews a month

•   huge amounts of data (e.g. 100+ million
    friendships on nl.netlog.com)

•   write-heavy app (1.4/1 read-write ratio)

•   typical db up to 3000+ queries/sec (15h-22h)
[Diagram: a single Master serving reads and writes (r/w)]
[Diagram: one Master (w) with two Slaves (r)]
[Diagram: Top Master (w) with two Top Slaves (r); separate Messages and Friends databases (r/w)]
[Diagram: Top Master (w) with two Top Slaves (r); Messages (w) and Friends (w), each with two Slaves (r)]
[Diagram: the same setup, with "1040: Too many connections" at the Top Master]
[Diagram: the same setup, with "1040: Too many connections" errors spreading across masters and slaves]
?
Vertical partitioning?
Master-to-master replication?
Caching?
Sharding!
[Diagram: the Friends table split into ten shards, "Friends %10 = 0" through "Friends %10 = 9", arranged around a central Aggregation step]
More data?
More shards!
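
The modulo split of the Friends table can be sketched in a few lines of PHP. This is only an illustration, not Netlog's code: the names `shardFor`, `addFriendship` and `friendsOfMany` are made up, the shard count of 10 is taken from the "%10" in the diagram, and in-memory arrays stand in for ten separate Friends databases.

```php
<?php
// Hypothetical sketch: route each user's friendships to one of ten shards
// by $userID % 10, then aggregate results from several shards in PHP.

const SHARD_COUNT = 10; // matches the "%10" in the diagram

function shardFor(int $userID): int
{
    return $userID % SHARD_COUNT;
}

// In-memory stand-in for ten separate Friends databases.
$friendShards = array_fill(0, SHARD_COUNT, []);

function addFriendship(array &$shards, int $userID, int $friendID): void
{
    $shards[shardFor($userID)][$userID][] = $friendID;
}

// Aggregation: group the requested users per shard, visit each shard once,
// and merge the partial results in the application.
function friendsOfMany(array $shards, array $userIDs): array
{
    $byShard = [];
    foreach ($userIDs as $userID) {
        $byShard[shardFor($userID)][] = $userID;
    }
    $result = [];
    foreach ($byShard as $shardID => $ids) {
        foreach ($ids as $userID) {
            $result[$userID] = $shards[$shardID][$userID] ?? [];
        }
    }
    return $result;
}

addFriendship($friendShards, 42, 7);
addFriendship($friendShards, 42, 13);
addFriendship($friendShards, 57, 42);

print_r(friendsOfMany($friendShards, [42, 57, 99]));
```

A fixed modulo ties every user to a shard number, so changing the shard count would reassign most users. That is one reason to prefer a lookup over a hard-wired formula when shards must be added or rebalanced.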
Existing solutions?




 •   MySQL NDB storage engine
     (sharded, not dynamic)

 •   memcached from MySQL
     (SQL-functions or storage engine)

 •   Oracle RAC

 •   HiveDB
     (MySQL sharding framework in Java)
Our solution




 • in-house
 • in php
 • middleware between application logic and
     class DB


 • typically carve shards by   $userID
sharddbhost001
    sharddb001: shard0001, shard0002, shard0003, shard0004
    sharddb002: shard0005, shard0006, shard0007, shard0008
Overview of our Sharding implementation




 • Sharding Management
  • “DNS” System (the modulo function)
  • Balancer / Manager
 • Sharded Tables API
  • Database Access Layer
  • Caching Layer
Sharding Management “DNS”




 • “DNS” system translates $userID to the right db
    connection details

   •   $userID to $shardID DNS
       (via SQL/memcache - combination not
       fixed!)

   •   $shardID to $hostname & $databasename
       (generated configuration files)
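
The two-step translation above could look roughly like the following. It is a sketch under stated assumptions: plain arrays stand in for the SQL/memcache lookup and for the generated configuration files, the function names `lookupShardID` and `lookupConnection` are hypothetical, and the user IDs and shard assignments are invented for the example.

```php
<?php
// Hypothetical sketch of the two-step lookup:
// $userID -> $shardID -> $hostname & $databasename.

// Step 1 data: explicit user-to-shard assignments (not a fixed formula,
// so a user can be moved by updating one entry).
$userToShard = [1001 => 1, 1002 => 5, 1003 => 1];

// Step 2 data: what a generated configuration file might contain.
$shardConfig = [
    1 => ['hostname' => 'sharddbhost001', 'databasename' => 'sharddb001'],
    5 => ['hostname' => 'sharddbhost001', 'databasename' => 'sharddb002'],
];

function lookupShardID(array $userToShard, int $userID): int
{
    if (!isset($userToShard[$userID])) {
        throw new OutOfBoundsException("No shard assigned to user $userID");
    }
    return $userToShard[$userID];
}

function lookupConnection(array $shardConfig, int $shardID): array
{
    if (!isset($shardConfig[$shardID])) {
        throw new OutOfBoundsException("Unknown shard $shardID");
    }
    return $shardConfig[$shardID];
}

$shardID = lookupShardID($userToShard, 1002);
$conn = lookupConnection($shardConfig, $shardID);
echo "user 1002 -> shard $shardID -> {$conn['hostname']}/{$conn['databasename']}\n";
```

Keeping the two steps separate means moving a whole shard to another host only touches the configuration, while moving one user only touches the user-to-shard entry.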
Sharded Tables API




 • Example API:
  • An object per $tableName/$userID combination


    • implementation of a class providing basic
         CRUD functions

      • typically a class for accessing database
         records with “a user’s items”
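
As a rough idea of what such a class might look like, here is a minimal sketch. The class name `ShardedTable` and its method names are assumptions, not the actual Netlog API, and an in-memory array stands in for the rows on the user's shard so the example runs without a database.

```php
<?php
// Hypothetical sketch of "an object per $tableName/$userID combination"
// offering basic CRUD functions for one user's items.

class ShardedTable
{
    private string $tableName;
    private int $userID;
    private array $rows = [];    // stand-in for rows stored on the user's shard
    private int $nextItemID = 1; // item IDs are numbered per user

    public function __construct(string $tableName, int $userID)
    {
        $this->tableName = $tableName;
        $this->userID = $userID;
    }

    public function create(array $fields): int
    {
        $itemID = $this->nextItemID++;
        $this->rows[$itemID] = $fields;
        return $itemID;
    }

    public function read(int $itemID): ?array
    {
        return $this->rows[$itemID] ?? null;
    }

    public function update(int $itemID, array $fields): bool
    {
        if (!isset($this->rows[$itemID])) {
            return false;
        }
        $this->rows[$itemID] = array_merge($this->rows[$itemID], $fields);
        return true;
    }

    public function delete(int $itemID): bool
    {
        if (!isset($this->rows[$itemID])) {
            return false;
        }
        unset($this->rows[$itemID]);
        return true;
    }
}

$photos = new ShardedTable('photos', 1002);
$id = $photos->create(['title' => 'Ghent']);
$photos->update($id, ['title' => 'BarCamp Ghent']);
print_r($photos->read($id));
```

Because every call goes through an object bound to one $userID, application code never has to know which shard it is talking to.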
Some implications ...




  • No cross-shard (i.e. cross-user) SQL queries
   • (LEFT) JOIN between sharded tables
        becomes impossibly complicated

    • It’s possible to design (parts of) the application
       so there’s no need for cross-shard queries

  • Denormalize if you need to SELECT on something
     other than $userID


  • Data consistency
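
To make the denormalization and consistency points concrete, here is a small sketch. It assumes a friendship is stored twice, once per user, so that each side can be read from its own shard. The function name `addFriend` and the two arrays (standing in for per-user shards) are hypothetical.

```php
<?php
// Hypothetical sketch: denormalizing a friendship so both sides can be read
// from their own shard. Arrays keyed by $userID stand in for per-user shards.

$friendsByUser = [];  // "friends of $userID", stored on $userID's shard
$friendOfByUser = []; // reverse copy, stored on $friendID's shard

function addFriend(array &$friendsByUser, array &$friendOfByUser, int $userID, int $friendID): void
{
    // Two writes, possibly to two different shards and not covered by a
    // single local transaction: this is where the data-consistency
    // concern comes from.
    $friendsByUser[$userID][] = $friendID;
    $friendOfByUser[$friendID][] = $userID;
}

addFriend($friendsByUser, $friendOfByUser, 1, 2);
addFriend($friendsByUser, $friendOfByUser, 3, 2);

// "Who added user 2?" is now a single-shard read instead of a cross-shard query.
print_r($friendOfByUser[2]);
```

The price of the faster read is the second write: if one of the two writes fails, the copies disagree until something repairs them.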
Some implications ... (2)




  •   Your DBA loves you again

      •   Smaller, thus faster tables

      •   Simpler, thus faster queries

  •   More atomic operations > better caching

  •   More PHP processing

      •   Needs memory

      •   PHP-webservers scale more easily
Some implications ... (3)




  •   $itemID   will only be unique in relation to $userID

  •   Downtime of a single database host affects
      only users on that DB
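
A short sketch of what per-user item IDs imply for application code: a record has to be addressed by the pair ($userID, $itemID). The helper names `globalKey` and `newItemID` are invented for this example, and the per-user counter is an assumption about how IDs might be handed out.

```php
<?php
// Hypothetical sketch: item IDs are numbered per user, so the pair
// ($userID, $itemID) identifies a record, not $itemID alone.

function globalKey(string $tableName, int $userID, int $itemID): string
{
    return "$tableName/$userID/$itemID";
}

$nextItemID = []; // per-user counters (assumed ID scheme)

function newItemID(array &$nextItemID, int $userID): int
{
    $nextItemID[$userID] = ($nextItemID[$userID] ?? 0) + 1;
    return $nextItemID[$userID];
}

$a = newItemID($nextItemID, 1001);
$b = newItemID($nextItemID, 1002); // same $itemID as $a, different user
echo globalKey('photos', 1001, $a), "\n";
echo globalKey('photos', 1002, $b), "\n";
```

URLs, cache keys and foreign references therefore need to carry the $userID alongside the $itemID.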
Sharding Management: Balancing Shards




 • Define ‘load’ percentage for shards (#users),
    databases (#users, #filesize), hosts (#sql
    reads, #sql writes, #cpu load, #users, ...)

 • Balance loads and start move operations
  • Done completely in PHP / transparent / no
       user downtime
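
A balancing pass might be sketched as follows. This is a deliberately simplified guess at the idea, not the actual balancer: it uses only user counts as the load metric, the capacity figure and user numbers are invented, the function names are hypothetical, and the move itself (copying data and switching the lookup) is not shown.

```php
<?php
// Hypothetical sketch of a balancing pass: express each database's load as a
// percentage of an assumed capacity and propose moving a shard from the
// fullest database to the emptiest one.

$capacityUsers = 1000; // assumed per-database capacity, in users

// database name => [shard name => number of users]; the numbers are made up.
$databases = [
    'sharddb001' => ['shard0001' => 400, 'shard0002' => 350, 'shard0003' => 150],
    'sharddb002' => ['shard0005' => 100, 'shard0006' => 200],
];

function loadPercentage(array $shards, int $capacityUsers): float
{
    return 100.0 * array_sum($shards) / $capacityUsers;
}

function proposeMove(array $databases, int $capacityUsers): ?array
{
    $loads = [];
    foreach ($databases as $name => $shards) {
        $loads[$name] = loadPercentage($shards, $capacityUsers);
    }
    asort($loads); // ascending by load, keys preserved
    $names = array_keys($loads);
    $target = $names[0];
    $source = $names[count($names) - 1];
    if ($source === $target) {
        return null; // only one database, nothing to balance
    }
    // Pick the smallest shard on the fullest database to keep the move cheap.
    $shards = $databases[$source];
    asort($shards);
    $shard = array_key_first($shards);
    return ['shard' => $shard, 'from' => $source, 'to' => $target];
}

print_r(proposeMove($databases, $capacityUsers));
```

A real balancer would weigh the other metrics listed above (file size, SQL reads and writes, CPU load) and stop once the loads are close enough.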
What is memcached?




General-purpose distributed memory caching
Using memcached


function isSober($user)
{
	$memcache = new Memcache();
	$memcache->addServer('localhost', 11211); // assumed: a local memcached server
	$cacheKey = 'issober_' . $user->getUserID();
	$result = $memcache->get($cacheKey); // fetch; returns false on a cache miss
	if ($result === false)
	{
		// do some database heavy stuff
		$location = $user->getLocation(); // assumed accessor; $location was undefined on the slide
		$result = (($user->getJobIndustry() == Industry::DEFENSE) &&
			$location->isIn(City::get('NYC'))) ? "hammered" : "sober"; // whatever!

		$memcache->set($cacheKey, $result, 0, 0); // flag 0, expire 0 = unlimited ttl
	}
	return $result;
}

var_dump(isSober(new User("p.decrem"))); // --> string(8) "hammered"
memcached usage for Sharding




 • Typical usage:
  • Each sharded record is cached
         (key: table/userID/itemID)

     • Caches with lists, and caches with counts
         (key: where/order/...-clauses)

 •    Several caching modes:

     •   READ_INSERT_MODE


     •   READ_UPDATE_INSERT_MODE
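
Per-record caching with the table/userID/itemID key layout could be sketched like this. The caching modes named above are not defined on the slide, so the sketch does not try to model them. It only shows a read that fills the cache on a miss and a write that refreshes the cached copy. Arrays stand in for memcached and for the shard database, and all function names are hypothetical.

```php
<?php
// Hypothetical sketch of per-record caching with the key layout
// table/userID/itemID. Arrays stand in for memcached and the shard database.

$cache = [];
$database = ['photos' => [1002 => [1 => ['title' => 'Ghent']]]];
$dbReads = 0; // counts how often the "database" is hit

function recordKey(string $table, int $userID, int $itemID): string
{
    return "$table/$userID/$itemID";
}

function readRecord(array &$cache, array $database, int &$dbReads, string $table, int $userID, int $itemID): ?array
{
    $key = recordKey($table, $userID, $itemID);
    if (array_key_exists($key, $cache)) {
        return $cache[$key]; // cache hit
    }
    $dbReads++;
    $record = $database[$table][$userID][$itemID] ?? null;
    if ($record !== null) {
        $cache[$key] = $record; // fill the cache on a miss
    }
    return $record;
}

function updateRecord(array &$cache, array &$database, string $table, int $userID, int $itemID, array $fields): void
{
    $database[$table][$userID][$itemID] = $fields;
    // Refresh the cached copy so readers do not see the old record.
    $cache[recordKey($table, $userID, $itemID)] = $fields;
}

readRecord($cache, $database, $dbReads, 'photos', 1002, 1);
readRecord($cache, $database, $dbReads, 'photos', 1002, 1);
echo "database reads: $dbReads\n"; // the second call is served from the cache
```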
CacheRevisionNumbers



 • What? Cached version number to use in other
     cache-keys

 • Why? Caching of counts / lists
 • Example: cache key for list of a user’s latest
     photos (simplified):   "USER_PHOTOS" . $userID .
     $cacheRevisionNumber . "ORDERBYDATEADDDESCLIMIT10";


 •   $cacheRevisionNumber is a number, bumped on
     every CUD-action; this clears the caches of all counts
     + lists, which otherwise have unlimited ttl.

 • “number” is current/cached timestamp
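
The revision-number trick can be illustrated with a short sketch. It is an interpretation of the bullets above, not the original implementation: an array stands in for memcached, the function names and the 'USER_PHOTOS_REV' key prefix are invented, and the "+ 1" guard for two changes within the same second is an added assumption.

```php
<?php
// Hypothetical sketch of cache revision numbers. An array stands in for
// memcached; function names are illustrative.

$cache = [];

function revisionKey(int $userID): string
{
    return 'USER_PHOTOS_REV' . $userID;
}

function getRevision(array &$cache, int $userID): int
{
    $key = revisionKey($userID);
    if (!isset($cache[$key])) {
        $cache[$key] = time(); // the "number" starts as the current timestamp
    }
    return $cache[$key];
}

function bumpRevision(array &$cache, int $userID): void
{
    // Called on every create/update/delete. Guarantee a new value even when
    // two changes happen within the same second.
    $cache[revisionKey($userID)] = max(time(), getRevision($cache, $userID) + 1);
}

function latestPhotosKey(array &$cache, int $userID): string
{
    return 'USER_PHOTOS' . $userID . getRevision($cache, $userID) . 'ORDERBYDATEADDDESCLIMIT10';
}

$before = latestPhotosKey($cache, 1002);
$cache[$before] = ['photo 3', 'photo 2', 'photo 1']; // cached list, no expiry needed

bumpRevision($cache, 1002); // e.g. the user uploads a photo
$after = latestPhotosKey($cache, 1002);

var_dump($before === $after);    // bool(false): the old list is no longer reachable
var_dump(isset($cache[$after])); // bool(false): the list is rebuilt on the next read
```

Nothing is deleted explicitly: bumping one number makes every list and count key for that user change at once, and the orphaned entries are left for the cache to evict.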
Sphinx Search



 • Problem:
    How do you give an overview of e.g. latest
    photos from different users? (on different
    shards)

 • Solution:
    Check Jayme’s presentation “Sphinx search
    optimization”, distributed full text search.
    (Use it for more than searching!)
netlog.com/go/developer
jayme@netlog.com - jurriaan@netlog.com
