Scaling the World’s Largest Photo Blogging Community
Farhan “Frank” Mashraqi
Senior MySQL DBA
Fotolog, Inc.
fmashraqi@fotolog.com
Credits:
Warren L. Habib: CTO
Olu King: Senior Systems Administrator
Introduction
• Farhan Mashraqi
- Senior MySQL DBA, Fotolog, Inc.
- Known on PlanetMySQL as Frank Mash
- Author of the upcoming “Pro Ruby on Rails” (Apress)
• Contact
- fmashraqi@fotolog.com
- softwareengineer99@yahoo.com
- Blog:
- http://mysqldatabaseadministration.blogspot.com
- http://mashraqi.com
What is Fotolog?
• Social networking
- Guestbook comments
- Friend / Favorite lists
- Members create “Social Capital”
• “One photo a day”
• Currently the 25th most visited website on the Internet (Alexa)
• History
• http://blog.fotolog.com/
Fotolog (Screenshot of home page)
Fotolog (Screenshot of a fotolog member page)
Fotolog Growth
• 228 million member photos
• 2.47 billion guestbook comments
• 20% of members visit the site daily
• The average user spends 24 minutes a day on the site
• 10 guestbook comments per photo
• On average, 1,000 or more people see a photo
• 7 million members and counting
• “Explosive growth in Europe”
• Italy and Spain among the fastest-growing countries
• Recently broke the record of 500K photos uploaded in a day
• 90 million page views
(Chart: Fotolog vs. Flickr)
Technology
• Sun
• Solaris 10
• MySQL
• Apache
• Java / Hibernate
• PHP
• Memcached
• 3Par
• IBRIX
• StrongMail
MySQL at Fotolog
• 32 servers
Specification of servers
• Four “clusters”
- User
- GB
- PH
- FF
• Non-persistent connections (PHP)
- Connection pooling (Java)
• Mostly MyISAM initially; later mostly converted to InnoDB
• Application-side table partitioning
• Memcached
Image Storage / Delivery
• MySQL is used to store image metadata only
- 3Par (utility storage)
- Thin provisioning
- (dedicate on allocation vs. dedicate on write)
• How fast is storage growing each day?
• Frequently accessed vs. infrequently accessed media
• Third-party CDN: Akamai / Panther
Important Scalability Considerations
Do you really need five-nines availability?
- Budget
- Time to deploy
- Testing
Can we afford:
- A single point of failure (SPOF)?
- Not having read redundancy? (User, PH, GB, FF)
- Not having write redundancy? (User, PH, GB, FF)
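To make the five-nines question concrete, the short sketch below converts a number of "nines" into a yearly downtime budget. The function name and the 365-day year are illustrative assumptions, not something taken from the deck.

```python
# Yearly downtime budget for a given number of "nines" of availability.
# ASSUMPTION: a 365-day year (leap years ignored) for simplicity.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(nines: int) -> float:
    """Return the allowed downtime per year, e.g. nines=5 means 99.999%."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        availability = 100 * (1 - 10 ** (-n))
        print(f"{n} nines ({availability:.3f}%): "
              f"{downtime_minutes_per_year(n):,.1f} minutes/year")
```

Five nines leaves a little over five minutes of downtime per year, which is why the slide weighs it against budget, time to deploy, and testing.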
Partitioning
(Diagram: SHARD 1, SHARD 2, SHARD 3; tables Table_v1, Table_v2, Table_v3, Table_v4)
Partitioning thoughts
(Chart: Load distribution across shards. Y-axis: 0% to 10%. X-axis labels: M A B D Z K T 0 1 2 3 7 K O Q R T V F P 8 9 G S 5 6 E H U X Y L _ A)
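A distribution like the one charted here can be measured with a few lines of code. The sketch below assumes the shard key is the first character of a user name, which the chart labels suggest but the deck does not state; the function name and the sample names are made up for illustration.

```python
from collections import Counter


def load_share_by_first_char(user_names):
    """Return {first_character: share_of_total} for the given names.

    ASSUMPTION: the shard key is the upper-cased first character of the name.
    """
    counts = Counter(name[0].upper() for name in user_names if name)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


if __name__ == "__main__":
    # Made-up sample; real input would come from row counts or query logs.
    sample = ["maria", "marco", "miguel", "ana", "bea", "zoe", "_x", "7up"]
    shares = load_share_by_first_char(sample)
    for ch, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{ch}: {share:.1%}")
```

Keys based on a leading character tend to be skewed because some characters are far more common than others, which is the kind of imbalance a chart like this makes visible.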
Ideal distribution
(Chart: Proposed shard for load distribution. Y-axis: 0% to 12%. X-axis: db4, db18, db19, db22, db23, db24, db25, db28, db30, db32)
GB current
(Diagram: Application Servers read from and write to db4, db18, db22, db23, db24, db25, db26, db27, db28, db30, db32. Annotation: Single Point of Failure)
GB Scalability
(Diagram: Application Servers read from and write to db4, db18, db22, db23, db24, db25, db26, db27, db28, db30, db32. Key ranges shown: 00-08, 09-17, 18-26, 27-35, 36-44, 45-53, 54-62, 63-71, 72-80, 81-89, 90-99. Legend: Slave, Master/DRBD)
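The key ranges on this slide suggest range-based routing over 100 buckets. The sketch below is one way such routing could look. It rests on two assumptions the deck does not confirm: that the bucket is a numeric id modulo 100, and that the eleven ranges pair with the eleven servers in the order listed. All names are illustrative.

```python
import bisect

# Inclusive upper bound of each range on the slide: 00-08, 09-17, ..., 90-99.
RANGE_UPPER_BOUNDS = [8, 17, 26, 35, 44, 53, 62, 71, 80, 89, 99]

# ASSUMPTION: servers pair with ranges in the order the slide lists them.
SERVERS = ["db4", "db18", "db22", "db23", "db24", "db25",
           "db26", "db27", "db28", "db30", "db32"]


def bucket_for(numeric_id: int) -> int:
    """ASSUMPTION: the bucket is the id modulo 100 (not stated in the deck)."""
    return numeric_id % 100


def server_for(numeric_id: int) -> str:
    """Return the server whose range contains the id's bucket."""
    bucket = bucket_for(numeric_id)
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, bucket)
    return SERVERS[index]


if __name__ == "__main__":
    for example_id in (8, 9, 1089, 199):
        print(example_id, "->", server_for(example_id))
```

Keeping the range table in one place means a range can be moved to a different server by editing data rather than routing logic.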
Current Scheme for PH
(Diagram: Application Servers issuing PH queries read from and write to DB1, DB2, DB3, DB7, DB8, DB9, DB10, DB11, DB12, DB13, DB14, DB15, DB16, 29. Labels: Slave; Repl. links between servers; fl_db1 repl.; FF. Repl. Key groups shown: RTX, FSW, 05DHN, AEK, 16JOQUZ, 28IP, _, 39B, 4C, 7GLVY, M)
Proposed Scheme for PH (Write & Read)
(Diagram: Application Servers read from and write to servers 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 29. Key ranges shown: 00-08, 09-17, 18-26, 27-35, 36-44, 45-53, 54-62, 63-71, 72-80, 81-89, 90-99. Label: TO USER CLUSTER)
AUTO-INC table lock contention
(Diagram: ten SELECT threads running concurrently against MySQL. Axis: thread concurrency. Legend: SELECT, INSERT)
SELECTs do very well with increased concurrency.
QPS: 500+
GOOD TIMES
AUTO-INC table lock contention
(Diagram: SELECT threads against MySQL, now with two INSERT threads among them. Axis: thread concurrency. Legend: SELECT, INSERT)
As more SELECTs come in, AUTO-INC lock contention starts causing problems.
WARNING
AUTO-INC table lock contention
(Diagram: INSERT threads now outnumber SELECT threads against MySQL. Axis: thread concurrency. Legend: SELECT, INSERT)
PROBLEM
InnoDB Tablespace Structure (Simplified)
(Diagram: PK / clustered index and secondary index)
• Secondary index: contains the PK (clustered index key)
• 6-byte header: links together consecutive records and is used in row-level locking
• Clustered index contains:
- Fields for all user-defined columns
- 6-byte trx id
- 7-byte roll pointer
- 6-byte row id (if no PK or UNIQUE NOT NULL index is defined)
• Record directory: array of pointers to each field of the record
- 1 byte per pointer if the total length of the fields in the record is under 128 bytes
- 2 bytes per pointer otherwise
• Data part of record
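The byte counts on this slide can be turned into a rough per-record overhead estimate. The sketch below is a simplification under stated assumptions: it follows the layout as the slide describes it (which appears to match InnoDB's original, non-compact row format), and it counts the system fields (trx id, roll pointer, optional row id) as fields that also get a directory pointer. The function name is hypothetical.

```python
def record_overhead_bytes(field_lengths, has_primary_key=True):
    """Estimate non-user bytes per clustered index record.

    field_lengths: byte length of each user-defined column value.
    ASSUMPTION: system fields also get a record directory pointer and
    count toward the 128-byte threshold.
    """
    header = 6                                   # links records, row locking
    system_fields = [6, 7]                       # trx id, roll pointer
    if not has_primary_key:
        system_fields.append(6)                  # hidden row id
    all_fields = list(field_lengths) + system_fields
    pointer_size = 1 if sum(all_fields) < 128 else 2
    directory = pointer_size * len(all_fields)   # one pointer per field
    return header + directory + sum(system_fields)


if __name__ == "__main__":
    # Four small columns, e.g. two 4-byte ints, an 8-byte name, a 4-byte time.
    print(record_overhead_bytes([4, 8, 4, 4]))
    print(record_overhead_bytes([4, 8, 4, 4], has_primary_key=False))
```

For narrow rows the overhead can exceed the user data itself, which is one reason column widths matter at billions of rows.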
InnoDB Index Structure (Simplified)
(Diagram: the PK index / clustered index holds the PK together with the row data in the data page; the secondary index holds the PK)
Old Schema
CREATE TABLE `guestbook_v3` (
  `identifier` bigint(20) unsigned NOT NULL auto_increment,
  `user_name` varchar(16) NOT NULL default '',
  `photo_identifier` bigint(20) unsigned NOT NULL default '0',
  `posted` datetime NOT NULL default '0000-00-00 00:00:00',
  …
  PRIMARY KEY (`identifier`),
  KEY `guestbook_photo_id_posted_idx` (`photo_identifier`,`posted`)
) ENGINE=MyISAM
Reads
(Diagram: data pages)
• Data ordered by identifier (PK)
• Looked up by secondary key
New Schema
CREATE TABLE `guestbook_v4` (
  `identifier` int(9) unsigned NOT NULL auto_increment,
  `user_name` varchar(16) NOT NULL default '',
  `photo_identifier` int(9) unsigned NOT NULL default '0',
  `posted` timestamp NOT NULL default '0000-00-00 00:00:00',
  …
  PRIMARY KEY (`photo_identifier`,`posted`,`identifier`),
  KEY `identifier` (`identifier`)
) ENGINE=InnoDB
1 row in set (7.64 sec)
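Besides the new primary key, the new schema narrows three columns. The sketch below adds up the per-row savings using MySQL's documented storage sizes for these types in versions of that era (DATETIME took 8 bytes before MySQL 5.6.4). Multiplying by the deck's 2.47 billion comments is an illustration only: it ignores index copies, row overhead, and how rows are split across shards.

```python
# Storage size in bytes of the column types that changed between schemas.
# DATETIME is 8 bytes in MySQL versions before 5.6.4.
TYPE_BYTES = {"BIGINT": 8, "INT": 4, "DATETIME": 8, "TIMESTAMP": 4}

OLD_COLUMNS = {"identifier": "BIGINT",
               "photo_identifier": "BIGINT",
               "posted": "DATETIME"}
NEW_COLUMNS = {"identifier": "INT",
               "photo_identifier": "INT",
               "posted": "TIMESTAMP"}


def bytes_per_row(columns):
    """Sum the storage size of the listed columns for one row."""
    return sum(TYPE_BYTES[col_type] for col_type in columns.values())


def bytes_saved(row_count):
    """Bytes saved in these three columns alone for row_count rows."""
    return (bytes_per_row(OLD_COLUMNS) - bytes_per_row(NEW_COLUMNS)) * row_count


if __name__ == "__main__":
    comments = 2_470_000_000  # total guestbook comments cited in the deck
    print(f"old: {bytes_per_row(OLD_COLUMNS)} B/row, "
          f"new: {bytes_per_row(NEW_COLUMNS)} B/row")
    print(f"saved: {bytes_saved(comments) / 1e9:.2f} GB")
```

Note that `int(9)` only sets a display width; an unsigned INT still holds values up to 4,294,967,295, so the narrower type needs enough headroom for each table's id range.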
Pending preads (Optimizing Disk Usage)
(Diagram: data pages)
• Data ordered by the composite primary key, which leads with photo_identifier (FK)
• Looked up by primary key
• Very low read requests per second
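The effect of ordering rows by a key that leads with photo_identifier can be illustrated with a toy model. The sketch below is not a model of InnoDB internals: the page size, photo count, and arrival pattern are hypothetical, and it simply counts how many fixed-size "pages" hold one photo's comments under each row order.

```python
ROWS_PER_PAGE = 100      # hypothetical; real pages hold a varying row count
NUM_PHOTOS = 1000        # hypothetical
COMMENTS_PER_PHOTO = 10  # the deck cites about 10 comments per photo


def build_rows():
    """Comments arrive interleaved across photos.

    Each row is (identifier, photo_identifier, posted).
    """
    rows = []
    for identifier in range(NUM_PHOTOS * COMMENTS_PER_PHOTO):
        photo_id = identifier % NUM_PHOTOS
        posted = identifier  # stand-in for a timestamp
        rows.append((identifier, photo_id, posted))
    return rows


def pages_touched(rows, sort_key, photo_id):
    """Count distinct pages holding one photo's comments under a row order."""
    ordered = sorted(rows, key=sort_key)
    return len({pos // ROWS_PER_PAGE
                for pos, row in enumerate(ordered) if row[1] == photo_id})


if __name__ == "__main__":
    rows = build_rows()
    old = pages_touched(rows, lambda r: r[0], photo_id=42)
    new = pages_touched(rows, lambda r: (r[1], r[2], r[0]), photo_id=42)
    print("ordered by identifier:", old, "pages")
    print("ordered by (photo_identifier, posted, identifier):", new, "pages")
```

In this toy setup, reading one photo's comments touches ten pages when rows are ordered by identifier and one page when they are ordered by the composite key, which matches the slide's point about fewer read requests.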
Pending reads / writes / Proposed
Throughput not as important as number of requests
Pending reads / writes / Proposed
Pending reads
MySQL Performance Challenges
• Finding the source of the problem
• Mostly disk-bound in mature systems
• Is the query cache hurting you?
• Adding RAM helps dodge the bullet
• Disk striping
• Restructuring tables for optimal performance
• LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so
Considerations for future growth
• SQLite?
• File system?
• PostgreSQL?
• Make the application better and optimize tables?
Things to remember
• Know the problem
• Know your application
• Know your storage engine
• Know your requirements
• Know your budget
Questions?
