SlideShare a Scribd company logo
1 of 24
Scaling databases on the cloud

                                                                  D e e p a k A n u p a l l i
                                                                  S e r v e r A r c h i t e c t

                                 C L O U D               C O M P U T I N G - C O M I N G                          O F    A G E

                             A      T R E A T I S E                    O N         R E A L - L I F E        U S E       C A S E S




Copyright (c) 2009, Pramati Technologies Private Limited. Imaginea is a Pramati business. All
trade names and trade marks are owned by their respective owners
                                                                                                11/4/2009     1
We are
 •   An emerging leader in product
     development services offering
     specialized services in Product
     Engineering, Interaction design
     and Test engineering.
 •   US Headquarters in Sunnyvale,
     CA; India development centers in
     Hyderabad and Chennai
 •   A 250+ strong and growing team
 •   A business unit of Pramati
     technologies
 •   Rich Experience in SaaS
     Engineering, Performance
     engineering, Cloud Computing,
     Web2.0, sf.com integrations and
     managing Amazon EC2
     Deployment
 •   Track record of delivering
     significant customer satisfaction
Initiatives in Cloud
• Dekoh:
  http://www.dekoh.com
• SocialTwist:
  http://www.socialtwist.com
• MyPicks Beijing 2008:
  http://apps.new.facebook.com/mypicksbeijing/Home
• Qontext:
  http://www.qontext.com
Application requirements

• High reliability
• Low Latency
• Dynamic Scalability
   – Millions of Users
   – Volumes of data
• Across the tiers
   – Web
   – Application
   – Data
Our biggest challenge

• DB Perf bound by Disk I/O
• Vertical scaling is an option
   – Ex: PlentyOfFish.com: 512GB RAM, 32CPUs
   – Expensive
  – Only possible to an extent on cloud servers
Vertical Scaling: Limitations
  • Not everything will fit in
    memory
  • Lot of reads ~ Lot of
    page faults + disk seeks
  • RAID 6 or RAID 10
    disks
  • 200MBps-1GBps is the
    max speed

         Think Horizontal !
Replication
 • Master-slave replication (MySQL
                                             Writes
   or Oracle RAC)
 • Writes on one Master
                                             Master
 • Reads on many Slaves
 • Application aware
 • Works in read mostly scenario             Writes

 • Adds Slave lag
                                     Slave   Slave    Slave


                                              Reads
Sharding
 • Partition data across masters
 • Writes and Reads are distributed                  Shard Logic
 • Application is modified accordingly
 • Also use replication with fewer slaves
   to minimize slave lag                    Master      Master     Master

 • Choose a partitioning strategy that
   uniformly distributes data

                                            Slave       Slave      Slave
Sharding Schemes
 •   Vertical
                                   shard_id = getShard(“profile”)
 •   Profile DB, friend DB         shard_id = getShard(profileID)
 •   Not uniform
                                   Select * from Profile where id = ?
 •   Range based
 •   ID range, Location or Date
     based
 •   Not uniform                     Corporate           Corporate

 •   Key or Hash based
 •   ID hash
 •   Fixed masters
                                  Tweets         Posts
 •   Directory
 •   Mapping of ID to Shard
 •   Single point of failure
Sharding Complexities
 •   No Joins
 •   De-normalize the data
 •   Data Integrity
 •   Application should enforce integrity
 •   Re-shard
 •   Changing the sharding scheme requires re-partitioning
     the entire data
De-normalization
 • Recent 10 messages to a recipient
 • Schema                                   Messages    Recipients
 • Messages Table stores message info
                                            timestamp
 • Recipients Table stores
 • Requires Join on Messages & Recipients
   table
 • De-normalize                             Messages    Recipients

 • Store timestamp in Recipients table as
                                            timestamp   timestamp
   well
Relationships

• When data is partitioned into shards,
  foreign keys become obsolete
• De-normalization avoids having
  relationships                                      Application
• If data can’t be de-normalized further,
  use memcached
• But, this requires change in SQL queries      MemCached


                                             Shard    Shard    Shard
                                               1        2        3
Cloud Databases/Data stores

•   Amazon SimpleDB
•   Google BigTable
•   Apache HBase
•   Facebook/Apache Hive
•   CouchDB
•   Cassandra
•   Many more…
Amazon SimpleDB
•   Schema-less distributed key-value store
•   Highly reliable and scalable
•   Automatic indexing of columns
•   Querying with SQL-like syntax
•   Supports multiple values for key/attribute
•   Value for Money
Problems Addressed
• High Availability
   – multiple nodes forming a ring
• Partitioning
   – Consistent hashing
• Replication
   – Replicated to multiple nodes
• Eventual Consistency
   – Asynchronous replication of data using vector clocks
SimpleDB adoption

•   No Joins
•   No transactional support
•   String is the only data type
•   No aggregator functions
•   No full-text searches
•   Limits enforced on size of results, predicates, data etc.
Google BigTable
•   Distributed Key-value store
•   Runs on top of Google File System (GFS)
•   Timestamp versioned data
•   Automatic indexing of columns
BigTable adoption
• Google Search, Maps, Earth, Orkut, Youtube,
  Reader, etc.
• Google App Engine(GAE) uses BigTable as its
  datastore
• DataNucleus supports JPA for BigTable
• Limited transaction support
• Eventual consistency
Hive
 • Hive is a data warehouse
 • Runs on top of Hadoop Distributed
   File system (HDFS)
 • Supports SQL-like syntax
 • User defined types and functions
 • Extensibility with Map-Reduce
Hive adoption
 • Facebook uses Hive to analyze historical
   data of users and content
 • Doesn’t support indexing of columns
 • Brute force mechanism to compute
   analytics
CouchDB
•   CouchDB is a document-oriented datastore
•   Schema-free
•   Accessible through RESTful JSON API
•   Distributed with incremental replication
•   Querying through Javascript
Is there a solution for all?


• Different data-stores address different problem spaces
• Identify what best suites your app
Thank You
   deepak@pramati.com



http://hysea.in
C L O U D               C O M P U T I N G - C O M I N G                                      O F      A G E

A     T R E A T I S E                    O N        R E A L - L I F E                       U S E     C A S E S



Scaling databases on the cloud



Copyright © 2009, Imaginea Inc. Not to be distributed or communicated without permission.           11/4/2009   24

More Related Content

What's hot

Geek Sync | Designing Data Intensive Cloud Native Applications
Geek Sync | Designing Data Intensive Cloud Native ApplicationsGeek Sync | Designing Data Intensive Cloud Native Applications
Geek Sync | Designing Data Intensive Cloud Native ApplicationsIDERA Software
 
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...Cloudera, Inc.
 
2015 GHC Presentation - High Availability and High Frequency Big Data Analytics
2015 GHC Presentation - High Availability and High Frequency Big Data Analytics2015 GHC Presentation - High Availability and High Frequency Big Data Analytics
2015 GHC Presentation - High Availability and High Frequency Big Data AnalyticsEsther Kundin
 
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...Cloudera, Inc.
 
That ORM is Lying to You
That ORM is Lying to YouThat ORM is Lying to You
That ORM is Lying to YouRonen Botzer
 
MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 StepsScott Cinnamond
 
NoSQL and The Big Data Hullabaloo
NoSQL and The Big Data HullabalooNoSQL and The Big Data Hullabaloo
NoSQL and The Big Data HullabalooAndrew Brust
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop ToolsXplenty
 
Big data Intro by Kaushik Dutta
Big data Intro by Kaushik DuttaBig data Intro by Kaushik Dutta
Big data Intro by Kaushik DuttaKaushik Dutta
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaData Science London
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big DataAndrew Brust
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupCaserta
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014larsgeorge
 
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems Cloudera, Inc.
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Stefan Lipp
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopCloudera, Inc.
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Andrew Brust
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברגTaldor Group
 

What's hot (20)

Geek Sync | Designing Data Intensive Cloud Native Applications
Geek Sync | Designing Data Intensive Cloud Native ApplicationsGeek Sync | Designing Data Intensive Cloud Native Applications
Geek Sync | Designing Data Intensive Cloud Native Applications
 
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
Hadoop World 2011: Building Realtime Big Data Services at Facebook with Hadoo...
 
2015 GHC Presentation - High Availability and High Frequency Big Data Analytics
2015 GHC Presentation - High Availability and High Frequency Big Data Analytics2015 GHC Presentation - High Availability and High Frequency Big Data Analytics
2015 GHC Presentation - High Availability and High Frequency Big Data Analytics
 
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
HBaseCon 2012 | Getting Real about Interactive Big Data Management with Lily ...
 
That ORM is Lying to You
That ORM is Lying to YouThat ORM is Lying to You
That ORM is Lying to You
 
MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 Steps
 
NoSQL and The Big Data Hullabaloo
NoSQL and The Big Data HullabalooNoSQL and The Big Data Hullabaloo
NoSQL and The Big Data Hullabaloo
 
12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools12 SQL On-Hadoop Tools
12 SQL On-Hadoop Tools
 
Big data Intro by Kaushik Dutta
Big data Intro by Kaushik DuttaBig data Intro by Kaushik Dutta
Big data Intro by Kaushik Dutta
 
Real-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera ImpalaReal-Time Queries in Hadoop w/ Cloudera Impala
Real-Time Queries in Hadoop w/ Cloudera Impala
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014HBase Status Report - Hadoop Summit Europe 2014
HBase Status Report - Hadoop Summit Europe 2014
 
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
HBaseCon 2013: Real-Time Model Scoring in Recommender Systems
 
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
Cloudera Big Data Integration Speedpitch at TDWI Munich June 2017
 
Impala: Real-time Queries in Hadoop
Impala: Real-time Queries in HadoopImpala: Real-time Queries in Hadoop
Impala: Real-time Queries in Hadoop
 
Big Data Strategy for the Relational World
Big Data Strategy for the Relational World Big Data Strategy for the Relational World
Big Data Strategy for the Relational World
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברג
 
NoSQL
NoSQLNoSQL
NoSQL
 

Viewers also liked

Vruddha Trihi Samruddha Bestseller For Sexy Aging Dr. Shriniwas Kashalikar
Vruddha Trihi Samruddha Bestseller For Sexy Aging Dr. Shriniwas KashalikarVruddha Trihi Samruddha Bestseller For Sexy Aging Dr. Shriniwas Kashalikar
Vruddha Trihi Samruddha Bestseller For Sexy Aging Dr. Shriniwas KashalikarAbhishek Yelgalwar
 
A G I N G G R A C E F U L L Y A N D V I C T O R I U O S L Y D R S H R I ...
A G I N G  G R A C E F U L L Y  A N D  V I C T O R I U O S L Y  D R  S H R I ...A G I N G  G R A C E F U L L Y  A N D  V I C T O R I U O S L Y  D R  S H R I ...
A G I N G G R A C E F U L L Y A N D V I C T O R I U O S L Y D R S H R I ...Abhishek Yelgalwar
 
D E M O C R A C Y & S T R E S S M A N A G E M E N T D R S H R I N I W A S...
D E M O C R A C Y &  S T R E S S  M A N A G E M E N T  D R  S H R I N I W A S...D E M O C R A C Y &  S T R E S S  M A N A G E M E N T  D R  S H R I N I W A S...
D E M O C R A C Y & S T R E S S M A N A G E M E N T D R S H R I N I W A S...Abhishek Yelgalwar
 
Imaginea_Product Engineering_Services
Imaginea_Product Engineering_ServicesImaginea_Product Engineering_Services
Imaginea_Product Engineering_ServicesImaginea
 
W H Y H O L I S T I C M E D I C I N E D R S H R I N I W A S K A S H A L ...
W H Y  H O L I S T I C  M E D I C I N E  D R  S H R I N I W A S  K A S H A L ...W H Y  H O L I S T I C  M E D I C I N E  D R  S H R I N I W A S  K A S H A L ...
W H Y H O L I S T I C M E D I C I N E D R S H R I N I W A S K A S H A L ...ghanyog
 

Viewers also liked (6)

Vruddha Trihi Samruddha Bestseller For Sexy Aging Dr. Shriniwas Kashalikar
Vruddha Trihi Samruddha Bestseller For Sexy Aging Dr. Shriniwas KashalikarVruddha Trihi Samruddha Bestseller For Sexy Aging Dr. Shriniwas Kashalikar
Vruddha Trihi Samruddha Bestseller For Sexy Aging Dr. Shriniwas Kashalikar
 
A G I N G G R A C E F U L L Y A N D V I C T O R I U O S L Y D R S H R I ...
A G I N G  G R A C E F U L L Y  A N D  V I C T O R I U O S L Y  D R  S H R I ...A G I N G  G R A C E F U L L Y  A N D  V I C T O R I U O S L Y  D R  S H R I ...
A G I N G G R A C E F U L L Y A N D V I C T O R I U O S L Y D R S H R I ...
 
D E M O C R A C Y & S T R E S S M A N A G E M E N T D R S H R I N I W A S...
D E M O C R A C Y &  S T R E S S  M A N A G E M E N T  D R  S H R I N I W A S...D E M O C R A C Y &  S T R E S S  M A N A G E M E N T  D R  S H R I N I W A S...
D E M O C R A C Y & S T R E S S M A N A G E M E N T D R S H R I N I W A S...
 
Imaginea_Product Engineering_Services
Imaginea_Product Engineering_ServicesImaginea_Product Engineering_Services
Imaginea_Product Engineering_Services
 
P R A L H A D S A I D D R
P R A L H A D  S A I D  D RP R A L H A D  S A I D  D R
P R A L H A D S A I D D R
 
W H Y H O L I S T I C M E D I C I N E D R S H R I N I W A S K A S H A L ...
W H Y  H O L I S T I C  M E D I C I N E  D R  S H R I N I W A S  K A S H A L ...W H Y  H O L I S T I C  M E D I C I N E  D R  S H R I N I W A S  K A S H A L ...
W H Y H O L I S T I C M E D I C I N E D R S H R I N I W A S K A S H A L ...
 

Similar to Scaling Databases On The Cloud

Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLRichard Schneeman
 
Which Database is Right for My Workload?
Which Database is Right for My Workload?Which Database is Right for My Workload?
Which Database is Right for My Workload?Amazon Web Services
 
Which Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SFWhich Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SFAmazon Web Services
 
Which Database is Right for My Workload?: Database Week San Francisco
Which Database is Right for My Workload?: Database Week San FranciscoWhich Database is Right for My Workload?: Database Week San Francisco
Which Database is Right for My Workload?: Database Week San FranciscoAmazon Web Services
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...Institute of Contemporary Sciences
 
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347Manik Surtani
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Hadoop Data Modeling
Hadoop Data ModelingHadoop Data Modeling
Hadoop Data ModelingAdam Doyle
 
AWS Well Architected-Info Session WeCloudData
AWS Well Architected-Info Session WeCloudDataAWS Well Architected-Info Session WeCloudData
AWS Well Architected-Info Session WeCloudDataWeCloudData
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas Bonér
 
DataFrames: The Extended Cut
DataFrames: The Extended CutDataFrames: The Extended Cut
DataFrames: The Extended CutWes McKinney
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...Simon Ambridge
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Spark Summit
 
The Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceThe Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceAbdelmonaim Remani
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud applicationNoam Sheffer
 

Similar to Scaling Databases On The Cloud (20)

Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Scaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQLScaling the Web: Databases & NoSQL
Scaling the Web: Databases & NoSQL
 
Which Database is Right for My Workload?
Which Database is Right for My Workload?Which Database is Right for My Workload?
Which Database is Right for My Workload?
 
Which Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SFWhich Database is Right for My Workload: Database Week SF
Which Database is Right for My Workload: Database Week SF
 
Which Database is Right for My Workload?: Database Week San Francisco
Which Database is Right for My Workload?: Database Week San FranciscoWhich Database is Right for My Workload?: Database Week San Francisco
Which Database is Right for My Workload?: Database Week San Francisco
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 How to use Big Data and Data Lake concept in business using Hadoop and Spark... How to use Big Data and Data Lake concept in business using Hadoop and Spark...
How to use Big Data and Data Lake concept in business using Hadoop and Spark...
 
NoSQL-Overview
NoSQL-OverviewNoSQL-Overview
NoSQL-Overview
 
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
Infinispan, Data Grids, NoSQL, Cloud Storage and JSR 347
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Hadoop Data Modeling
Hadoop Data ModelingHadoop Data Modeling
Hadoop Data Modeling
 
AWS Well Architected-Info Session WeCloudData
AWS Well Architected-Info Session WeCloudDataAWS Well Architected-Info Session WeCloudData
AWS Well Architected-Info Session WeCloudData
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
DataFrames: The Extended Cut
DataFrames: The Extended CutDataFrames: The Extended Cut
DataFrames: The Extended Cut
 
Sa introduction to big data pipelining with cassandra & spark west mins...
Sa introduction to big data pipelining with cassandra & spark   west mins...Sa introduction to big data pipelining with cassandra & spark   west mins...
Sa introduction to big data pipelining with cassandra & spark west mins...
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
 
The Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot PersistenceThe Rise of NoSQL and Polyglot Persistence
The Rise of NoSQL and Polyglot Persistence
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud application
 

More from Imaginea

Web application penetration testing
Web application penetration testingWeb application penetration testing
Web application penetration testingImaginea
 
Network penetration testing
Network penetration testingNetwork penetration testing
Network penetration testingImaginea
 
Require JS
Require JSRequire JS
Require JSImaginea
 
Scala and lift
Scala and liftScala and lift
Scala and liftImaginea
 
Imaginea Service Sheet - Performance Engineering
Imaginea Service Sheet - Performance EngineeringImaginea Service Sheet - Performance Engineering
Imaginea Service Sheet - Performance EngineeringImaginea
 
Imaginea Service Sheet - Interaction Design
Imaginea Service Sheet - Interaction DesignImaginea Service Sheet - Interaction Design
Imaginea Service Sheet - Interaction DesignImaginea
 
Imaginea - SugarCRM iPhone App - User Guide
Imaginea - SugarCRM iPhone App - User GuideImaginea - SugarCRM iPhone App - User Guide
Imaginea - SugarCRM iPhone App - User GuideImaginea
 
Offline Enterprise and Web Apps: Dekoh Approach
Offline Enterprise and Web Apps: Dekoh ApproachOffline Enterprise and Web Apps: Dekoh Approach
Offline Enterprise and Web Apps: Dekoh ApproachImaginea
 
Imaginea Scales Application using Amazon EC2
Imaginea Scales Application using Amazon EC2Imaginea Scales Application using Amazon EC2
Imaginea Scales Application using Amazon EC2Imaginea
 
Whitepaper Cloud Egovernance Imaginea
Whitepaper Cloud Egovernance ImagineaWhitepaper Cloud Egovernance Imaginea
Whitepaper Cloud Egovernance ImagineaImaginea
 
Imaginea - Ideas to Life - About Us
Imaginea - Ideas to Life - About UsImaginea - Ideas to Life - About Us
Imaginea - Ideas to Life - About UsImaginea
 
Imaginea_CloudComputing_Services
Imaginea_CloudComputing_ServicesImaginea_CloudComputing_Services
Imaginea_CloudComputing_ServicesImaginea
 
Imaginea Cloud Offerings
Imaginea Cloud OfferingsImaginea Cloud Offerings
Imaginea Cloud OfferingsImaginea
 
Soa Offerings
Soa OfferingsSoa Offerings
Soa OfferingsImaginea
 
Sharing on Dekoh - Our RIA Desktop Platform
Sharing on Dekoh - Our RIA Desktop PlatformSharing on Dekoh - Our RIA Desktop Platform
Sharing on Dekoh - Our RIA Desktop PlatformImaginea
 
Scaing databases on the cloud
Scaing databases on the cloudScaing databases on the cloud
Scaing databases on the cloudImaginea
 
Product QA - A test engineering perspective
Product QA - A test engineering perspectiveProduct QA - A test engineering perspective
Product QA - A test engineering perspectiveImaginea
 
Facebook Olympics
Facebook OlympicsFacebook Olympics
Facebook OlympicsImaginea
 
Process Guidelines V2
Process Guidelines V2Process Guidelines V2
Process Guidelines V2Imaginea
 
Migrating to Cloud - A Step by Step
Migrating to Cloud - A Step by Step Migrating to Cloud - A Step by Step
Migrating to Cloud - A Step by Step Imaginea
 

More from Imaginea (20)

Web application penetration testing
Web application penetration testingWeb application penetration testing
Web application penetration testing
 
Network penetration testing
Network penetration testingNetwork penetration testing
Network penetration testing
 
Require JS
Require JSRequire JS
Require JS
 
Scala and lift
Scala and liftScala and lift
Scala and lift
 
Imaginea Service Sheet - Performance Engineering
Imaginea Service Sheet - Performance EngineeringImaginea Service Sheet - Performance Engineering
Imaginea Service Sheet - Performance Engineering
 
Imaginea Service Sheet - Interaction Design
Imaginea Service Sheet - Interaction DesignImaginea Service Sheet - Interaction Design
Imaginea Service Sheet - Interaction Design
 
Imaginea - SugarCRM iPhone App - User Guide
Imaginea - SugarCRM iPhone App - User GuideImaginea - SugarCRM iPhone App - User Guide
Imaginea - SugarCRM iPhone App - User Guide
 
Offline Enterprise and Web Apps: Dekoh Approach
Offline Enterprise and Web Apps: Dekoh ApproachOffline Enterprise and Web Apps: Dekoh Approach
Offline Enterprise and Web Apps: Dekoh Approach
 
Imaginea Scales Application using Amazon EC2
Imaginea Scales Application using Amazon EC2Imaginea Scales Application using Amazon EC2
Imaginea Scales Application using Amazon EC2
 
Whitepaper Cloud Egovernance Imaginea
Whitepaper Cloud Egovernance ImagineaWhitepaper Cloud Egovernance Imaginea
Whitepaper Cloud Egovernance Imaginea
 
Imaginea - Ideas to Life - About Us
Imaginea - Ideas to Life - About UsImaginea - Ideas to Life - About Us
Imaginea - Ideas to Life - About Us
 
Imaginea_CloudComputing_Services
Imaginea_CloudComputing_ServicesImaginea_CloudComputing_Services
Imaginea_CloudComputing_Services
 
Imaginea Cloud Offerings
Imaginea Cloud OfferingsImaginea Cloud Offerings
Imaginea Cloud Offerings
 
Soa Offerings
Soa OfferingsSoa Offerings
Soa Offerings
 
Sharing on Dekoh - Our RIA Desktop Platform
Sharing on Dekoh - Our RIA Desktop PlatformSharing on Dekoh - Our RIA Desktop Platform
Sharing on Dekoh - Our RIA Desktop Platform
 
Scaing databases on the cloud
Scaing databases on the cloudScaing databases on the cloud
Scaing databases on the cloud
 
Product QA - A test engineering perspective
Product QA - A test engineering perspectiveProduct QA - A test engineering perspective
Product QA - A test engineering perspective
 
Facebook Olympics
Facebook OlympicsFacebook Olympics
Facebook Olympics
 
Process Guidelines V2
Process Guidelines V2Process Guidelines V2
Process Guidelines V2
 
Migrating to Cloud - A Step by Step
Migrating to Cloud - A Step by Step Migrating to Cloud - A Step by Step
Migrating to Cloud - A Step by Step
 

Recently uploaded

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 

Recently uploaded (20)

Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 

Scaling Databases On The Cloud

  • 1. Scaling databases on the cloud D e e p a k A n u p a l l i S e r v e r A r c h i t e c t C L O U D C O M P U T I N G - C O M I N G O F A G E A T R E A T I S E O N R E A L - L I F E U S E C A S E S Copyright (c) 2009, Pramati Technologies Private Limited. Imaginea is a Pramati business. All trade names and trade marks are owned by their respective owners 11/4/2009 1
  • 2. We are • An emerging leader in product development services offering specialized services in Product Engineering, Interaction design and Test engineering. • US Headquarters in Sunnyvale, CA; India development centers in Hyderabad and Chennai • A 250+ strong and growing team • A business unit of Pramati technologies • Rich Experience in SaaS Engineering, Performance engineering, Cloud Computing, Web2.0, sf.com integrations and managing Amazon EC2 Deployment • Track record of delivering significant customer satisfaction
  • 3. Initiatives in Cloud • Dekoh: http://www.dekoh.com • SocialTwist: http://www.socialtwist.com • MyPicks Beijing 2008: http://apps.new.facebook.com/mypicksbeijing/Home • Qontext: http://www.qontext.com
  • 4. Application requirements • High reliability • Low Latency • Dynamic Scalability – Millions of Users – Volumes of data • Across the tiers – Web – Application – Data
  • 5. Our biggest challenge • DB Perf bound by Disk I/O • Vertical scaling is an option – Ex: PlentyOfFish.com: 512GB RAM, 32CPUs – Expensive – Only possible to an extent on cloud servers
  • 6. Vertical Scaling: Limitations • Not everything will fit in memory • Lot of reads ~ Lot of page faults + disk seeks • RAID 6 or RAID 10 disks • 200MBps-1GBps is the max speed Think Horizontal !
  • 7. Replication • Master-slave replication (MySQL Writes or Oracle RAC) • Writes on one Master Master • Reads on many Slaves • Application aware • Works in read mostly scenario Writes • Adds Slave lag Slave Slave Slave Reads
  • 8. Sharding • Partition data across masters • Writes and Reads are distributed Shard Logic • Application is modified accordingly • Also use replication with fewer slaves to minimize slave lag Master Master Master • Choose a partitioning strategy that uniformly distributes data Slave Slave Slave
  • 9. Sharding Schemes • Vertical shard_id = getShard(“profile”) • Profile DB, friend DB shard_id = getShard(profileID) • Not uniform Select * from Profile where id = ? • Range based • ID range, Location or Date based • Not uniform Corporate Corporate • Key or Hash based • ID hash • Fixed masters Tweets Posts • Directory • Mapping of ID to Shard • Single point of failure
  • 10. Sharding Complexities • No Joins • De-normalize the data • Data Integrity • Application should enforce integrity • Re-shard • Changing the sharding scheme requires re-partitioning the entire data
  • 11. De-normalization • Recent 10 messages to a recipient • Schema Messages Recipients • Messages Table stores message info timestamp • Recipients Table stores • Requires Join on Messages & Recipients table • De-normalize Messages Recipients • Store timestamp in Recipients table as timestamp timestamp well
  • 12. Relationships • When data is partitioned into shards, foreign keys become obsolete • De-normalization avoids having relationships Application • If data can’t be de-normalized further, use memcached • But, this requires change in SQL queries MemCached Shard Shard Shard 1 2 3
  • 13. Cloud Databases/Data stores • Amazon SimpleDB • Google BigTable • Apache HBase • Facebook/Apache Hive • CouchDB • Cassandra • Many more…
  • 14. Amazon SimpleDB • Schema-less distributed key-value store • Highly reliable and scalable • Automatic indexing of columns • Querying with SQL-like syntax • Supports multiple values for key/attribute • Value for Money
  • 15. Problems Addressed • High Availability – multiple nodes forming a ring • Partitioning – Consistent hashing • Replication – Replicated to multiple nodes • Eventual Consistency – Asynchronous replication of data using vector clocks
  • 16. SimpleDB adoption • No Joins • No transactional support • String is the only data type • No aggregator functions • No full-text searches • Limits enforced on size of results, predicates, data etc.
  • 17. Google BigTable • Distributed Key-value store • Runs on top of Google File System (GFS) • Timestamp versioned data • Automatic indexing of columns
  • 18. BigTable adoption • Google Search, Maps, Earth, Orkut, Youtube, Reader, etc. • Google App Engine(GAE) uses BigTable as its datastore • DataNucleus supports JPA for BigTable • Limited transaction support • Eventual consistency
  • 19. Hive • Hive is a data warehouse • Runs on top of Hadoop Distributed File system (HDFS) • Supports SQL-like syntax • User defined types and functions • Extensibility with Map-Reduce
  • 20. Hive adoption • Facebook uses Hive to analyze historical data of users and content • Doesn’t support indexing of columns • Brute force mechanism to compute analytics
  • 21. CouchDB • CouchDB is a document-oriented datastore • Schema-free • Accessible through RESTful JSON API • Distributed with incremental replication • Querying through Javascript
  • 22. Is there a solution for all? • Different data-stores address different problem spaces • Identify what best suites your app
  • 23. Thank You deepak@pramati.com http://hysea.in
  • 24. C L O U D C O M P U T I N G - C O M I N G O F A G E A T R E A T I S E O N R E A L - L I F E U S E C A S E S Scaling databases on the cloud Copyright © 2009, Imaginea Inc. Not to be distributed or communicated without permission. 11/4/2009 24