Petabytes and Nanoseconds 
Distributed Data Storage and the CAP Theorem 
FIN talk 
Robert Greiner 
Nathan Murray 
August 21, 2014
CHAPTER 
The Problems 
Your phone can add two numbers in the same time it takes light to travel one foot 
All high frequency trading servers are connected to the NASDAQ network with the same length of cable, so that no party has a speed advantage
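The one-foot figure can be checked with a little arithmetic. This is a minimal sketch using the defined speed of light in a vacuum and the international foot; the function and variable names are illustrative, and signals in copper or fibre travel slower than this.

```python
# Time for light to travel one foot in a vacuum.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METERS_PER_FOOT = 0.3048              # exact, international foot


def light_travel_time_ns(distance_m: float) -> float:
    """Return the time in nanoseconds for light to cover distance_m in a vacuum."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S * 1e9


one_foot_ns = light_travel_time_ns(METERS_PER_FOOT)
print(f"{one_foot_ns:.3f} ns per foot")
```

The result is roughly one nanosecond per foot, which is why cable length becomes a fairness issue once trades are decided at that time scale.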
A Common Scenario 
Web 
Application 
RDBMS 
+ =
The Solution: Scale All the Things!!1
Why should we scale? 
Throughput 
Latency 
Storage 
Reliability
The Solution? 
Add a load balancer 
Add more web servers 
Tune the DB: indexes, stored procedures, etc.
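The load balancer step can be sketched as a simple round-robin rotation over the web servers. This is an illustrative toy, not a production balancer; the class name and host names are made up for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in rotation (hypothetical helper)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def next_server(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])  # made-up host names
assignments = [balancer.next_server() for _ in range(5)]
print(assignments)
```

Every web server in the rotation still talks to the same database, so adding servers this way moves the bottleneck rather than removing it.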
There’s a new bottleneck 
Generally an RDBMS can become a bottleneck around 10K transactions per second
Next Step… Distribute Your Data 
Each web server can talk to any data storage node 
Nodes distribute queries and replicate data – lots more complexity!
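One way to picture how nodes distribute and replicate data is to hash each key to a primary node and copy it to the next nodes in the list. This is a minimal sketch under that assumption; the function name, node names, and replication factor are hypothetical, and real systems often use consistent hashing instead so that adding a node moves fewer keys.

```python
import hashlib


def replicas_for(key: str, nodes: list[str], replication_factor: int = 2) -> list[str]:
    """Pick the nodes that store `key`: a primary chosen by hash, plus its successors."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # made-up node names
for key in ("customer:42", "quote:2014-08-21"):
    print(key, "->", replicas_for(key, nodes))
```

Because placement depends only on the key and the node list, any web server can work out where a key lives without asking a coordinator.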
Cluster = Additional Complexity
Enter the CAP Theorem! 
This guy created the CAP Theorem 
This guy’s VP invented the internet
CAP Theorem: Defined 
Within a distributed system, you can only make two of the following three guarantees across a write/read pair
Guarantee 1: Consistency 
If a value is written, and then fetched, I will always get back the new value 
Note: not the same as the C in ACID! 
Guarantee 2: Availability 
If a value is written, a success message should always be returned. If a subsequent read returns a stale value, or something reasonable, it’s OK. 
Note: not the same as the A in HA!
Guarantee 3: Partition Tolerance 
The system will continue to function when network partitions occur – OOP != NP. 
Note: nothing to do with BAC!
CAP Triangle 
The CAP Theorem is explained as a triangle 
C, A or P: Pick two 
This is true in practice, except…
When choosing a distributed system… 
vs.
… You Can’t Sacrifice Partition Tolerance! 
NOT Distributed 
(a.k.a. NOT Partition Tolerant) 
Available 
AND 
Consistent 
Distributed 
(a.k.a. Partition Tolerant) 
Available 
OR 
Consistent 
CP vs. AP 
CP: Synchronous. Waits until the partition heals or the request times out. 
AP: Asynchronous. Always returns a reasonable response.
At a bank, you get a deposit receipt after the work is complete 
At a coffee shop, you get a receipt before the work is complete
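The difference between the two choices can be sketched with a toy two-replica store. This is an illustrative model only: the class names and the `partitioned` flag are invented for the example, and the CP branch refuses the write immediately as a stand-in for waiting until a timeout.

```python
class Replica:
    """Holds a single value (hypothetical helper)."""

    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Toy store with two replicas; `mode` is "CP" or "AP"."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False  # True while the replicas cannot reach each other

    def write(self, value) -> bool:
        """Return True if the write was accepted."""
        if self.partitioned:
            if self.mode == "CP":
                return False  # refuse rather than let the replicas diverge
            self.replicas[0].value = value  # AP: accept locally, replicate later
            return True
        for replica in self.replicas:
            replica.value = value
        return True

    def read(self, replica_index: int):
        return self.replicas[replica_index].value

    def heal(self):
        """End the partition and bring replica 1 up to date."""
        self.partitioned = False
        self.replicas[1].value = self.replicas[0].value


for mode in ("CP", "AP"):
    store = TwoNodeStore(mode)
    store.write("v1")
    store.partitioned = True
    accepted = store.write("v2")
    print(mode, "write accepted:", accepted, "| replica 1 reads:", store.read(1))
```

In CP mode the second write is rejected and both replicas still agree on "v1". In AP mode the write succeeds, but a read from replica 1 returns the stale "v1" until `heal()` is called.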
CHAPTER 
When do companies care?
Companies care about internet scale
Distributed Storage Past 
2004 
Google’s MapReduce paper published 
2006 
Google’s Bigtable paper published 
2007 
Amazon’s Dynamo paper published 
2008 
Yahoo runs search on Hadoop 
2008 
Facebook open sources Cassandra 
2008 
Bitcoin paper published 
2009 
Yahoo open sources Hadoop 
2010 
Azure Table Storage released 
2012 
Google’s Spanner and F1 papers 
2012 
Amazon releases DynamoDB inside AWS 
2014 
Google’s Mesa paper published 
2015 
????
Looking forward 
• Open source implementations of more sophisticated storage systems 
• Managed services with more advanced capabilities 
• Google Cloud versions of F1, Spanner, or Mesa? 
• NoSQL + SQL 
• Distributed data storage in untrusted environments
CHAPTER 
How does this affect me?
Even our most “legacy” clients are already starting to care about internet scale: 
Scenario 
Client = Energy Retailer (Independent Sales Force) 
Sales Agent captures info about potential customer 
Price generated on-demand based on daily rate curve 
Quote no longer valid at midnight 
Each night, rates are updated based on new rate-curve 
Used to take 4 hours 
Now takes > 24 hours (due to increased demand)
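A rough sizing exercise shows what splitting the nightly job across workers would have to achieve. This is a sketch under loud assumptions: the 4-hour target window is hypothetical (it simply reuses the old runtime), 24 hours is only the lower bound given above, and scaling is taken to be perfectly linear, which real workloads rarely manage.

```python
import math


def workers_needed(serial_hours: float, window_hours: float) -> int:
    """Workers required to finish within the window, assuming perfectly linear scaling."""
    if serial_hours <= 0 or window_hours <= 0:
        raise ValueError("hours must be positive")
    return math.ceil(serial_hours / window_hours)


# Assumed inputs: 24 h of serial work, hypothetical 4 h nightly window.
print(workers_needed(serial_hours=24, window_hours=4))  # 6
```

Any coordination overhead or shared bottleneck, such as a single database, pushes the real number higher than this idealized figure.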
Current State
Solution Strategy 
Assess 
• Analyze business performance needs 
• Select non-performing work streams 
• Filter – (Could/Should) 
• Prioritize 
• Performance Baseline / Load Test 
Strategize 
• Identify Bottlenecks (CPU/RAM/Network) 
• Optimization strategy 
• Technology Selection 
Implement 
• POC 
• Load Test 
• Optimize 
• Build
Optimize Code 
Scale Up 
Scale Out 
Managed Service
Optimize Code: Level 1 
Least organizational impact 
No architecture changes required 
Use existing development processes 
Risky – Code may be fine 
Expensive – Dev Resources 
Time Consuming – Dev + Deploy
Scale Up: Level 2 
Easiest solution 
Utilize existing infrastructure 
Little/no architecture changes 
Low probability of network partitions 
May not solve the problem long-term 
Hardware limitations 
Non-linear improvement (2x RAM != 2x Performance) 
C/A
Scale Out: Level 3 
Highest throughput 
Improved system up-time 
No single point of failure 
Linear performance increases 
Use commodity hardware – hard to scale up CPU 
Increased infrastructure / system complexity 
Increased probability of network partitions 
Automation complexity 
A/C
Managed Service: Level 4 
Low barrier to entry 
No additional hardware investment required 
Treat as extension of existing data center 
Appliance configuration 
Globally redundant (cloud) 
Most organizational change 
Less control and customization 
Built-in redundancy and innovation 
C/A 
A/C
Optimize Code (Level 1) 
• Least organizational impact 
• No architecture changes required 
• Use existing development processes 
• Risky – Code may be fine 
• Expensive – Dev Resources 
• Time Consuming – Dev + Deploy 
Scale Up (Level 2) 
• Easiest solution 
• Utilize existing infrastructure 
• Little/no architecture changes 
• Reduce probability of network partitions 
• May not solve the problem long-term 
• Hardware limitations 
• Non-linear improvement 
Scale Out (Level 3) 
• Highest throughput 
• Improved system up-time 
• No single point of failure 
• Linear performance increases 
• Use commodity hardware 
• Increased infrastructure / system complexity 
• Increased probability of network partitions 
• Automation complexity 
Managed Service (Level 4) 
• Low barrier to entry 
• No additional hardware investment required 
• Treat as extension of existing data center 
• Appliance configuration 
• Globally redundant (cloud) 
• Most organizational change 
• Less control and customization 
• High innovation 
Pick One (Or More!)
First Attempt
Good Enough?
Taking It to the Next Level
The Best Solution?
What Would YOU Do?
Fin’ 
robert.greiner@parivedasolutions.com 
nathan.murray@parivedasolutions.com

 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Petabytes and Nanoseconds

  • 1. Petabytes and Nanoseconds: Distributed Data Storage and the CAP Theorem. FIN talk. Robert Greiner, Nathan Murray. August 21, 2014
  • 2. CHAPTER: The Problems. Your phone can add two numbers in the same time it takes light to travel one foot. All high frequency trading servers are connected to the NASDAQ network with the same length of cable, so that no party has a speed advantage.
  • 3. A Common Scenario: Web Application + RDBMS
  • 4. The Solution: Scale All the Things!!1
  • 5. Why should we scale? Throughput, latency, storage, reliability.
  • 6. The Solution? Add a load balancer. Add more web servers. Tune the DB: indexes, stored procedures (SPs), etc.
  • 7. There’s a new bottleneck. Generally, an RDBMS can become a bottleneck at around 10K transactions per second.
  • 8. Next Step… Distribute Your Data. Each web server can talk to any data storage node. Nodes distribute queries and replicate data, which adds a lot more complexity!
  • 9. Cluster = Additional Complexity
  • 10. Enter the CAP Theorem! This guy created the CAP Theorem. This guy’s VP invented the internet.
  • 11. CAP Theorem: Defined. Within a distributed system, you can only make two of the following three guarantees across a write/read pair:
  • 12. Guarantee 1: Consistency. If a value is written and then fetched, I will always get back the new value. Note: not the same as the C in ACID!
  • 13. Guarantee 2: Availability. If a value is written, a success message should always be returned. If a subsequent read returns a stale value, or something reasonable, it’s OK. Note: not the same as the A in HA!
  • 14. Guarantee 3: Partition Tolerance. The system will continue to function when network partitions occur – OOP != NP. Note: nothing to do with BAC!
  • 15. CAP Triangle. The CAP Theorem is explained as a triangle. C, A or P: pick two. This is true in practice, except…
  • 16. When choosing a distributed system… vs.
  • 17. … You Can’t Sacrifice Partition Tolerance! NOT Distributed (a.k.a. NOT Partition Tolerant): Available AND Consistent. Distributed (a.k.a. Partition Tolerant): Available OR Consistent.
  • 18. CP vs. AP. CP: Synchronous. Waits until partition heals or times out. AP: Asynchronous. Always returns a reasonable response.
  • 19. CP vs. AP. CP: Synchronous. Waits until partition heals or times out. At a bank, you get a deposit receipt after the work is complete. AP: Asynchronous. Always returns a reasonable response. At a coffee shop, you get a receipt before the work is complete.
  • 21. Companies care about internet scale
  • 22. Distributed Storage Past. 2004: Google’s MapReduce paper published; 2006: Google’s Bigtable paper published; 2007: Amazon’s Dynamo paper published; 2008: Yahoo runs search on Hadoop; 2008: Facebook open sources Cassandra; 2008: Bitcoin paper published; 2009: Yahoo open sources Hadoop; 2010: Azure Table Storage released; 2012: Google’s Spanner and F1 papers; 2013: Amazon releases DynamoDB inside AWS; 2014: Google’s Mesa paper published; 2015: ????
  • 23. Looking forward • Open source implementations of more sophisticated storage systems • Managed services with more advanced capabilities • Google Cloud versions of F1, Spanner, or Mesa? • NoSQL + SQL • Distributed data storage in untrusted environments
  • 24. CHAPTER: How does this affect me?
  • 25. Even our most “legacy” clients are already starting to care about internet scale:
  • 27. Scenario Client = Energy Retailer (Independent Sales Force) Sales Agent captures info about potential customer Price generated on-demand based on daily rate curve Quote no longer valid at midnight Each night, rates are updated based on new rate-curve Used to take 4hours Now takes > 24hours (Due to increased demand)
  • 29. Solution Strategy. Assess: • Analyze business performance needs • Select non-performing work streams • Filter (Could/Should) • Prioritize • Performance Baseline / Load Test. Strategize: • Identify Bottlenecks (CPU/RAM/Network) • Optimization strategy • Technology Selection. Implement: • POC • Load Test • Optimize • Build.
  • 30. Optimize Code, Scale Up, Scale Out, Managed Service
  • 31. Optimize Code (Level 1). Least organizational impact. No architecture changes required. Use existing development processes. Risky: code may be fine. Expensive: dev resources. Time consuming: dev + deploy.
  • 32. Scale Up (Level 2). Easiest solution. Utilize existing infrastructure. Little/no architecture changes. Low probability of network partitions. May not solve the problem long-term. Hardware limitations. Non-linear improvement (2x RAM != 2x performance). C/A
  • 33. Scale Out (Level 3). Highest throughput. Improved system up-time. No single point of failure. Linear performance increases. Use commodity hardware – hard to scale up CPU. Increased infrastructure / system complexity. Increased probability of network partitions. Automation complexity. A/C
  • 34. Managed Service (Level 4). Low barrier to entry. No additional hardware investment required. Treat as extension of existing data center. Appliance configuration. Globally redundant (cloud). Most organizational change. Less control and customization. Built-in redundancy and innovation. C/A, A/C
  • 35. Optimize Code (Level 1) • Least organizational impact • No architecture changes required • Use existing development processes • Risky: code may be fine • Expensive: dev resources • Time consuming: dev + deploy. Scale Up (Level 2) • Easiest solution • Utilize existing infrastructure • Little/no architecture changes • Reduce probability of network partitions • May not solve the problem long-term • Hardware limitations • Non-linear improvement. Scale Out (Level 3) • Highest throughput • Improved system up-time • No single point of failure • Linear performance inc. • Use commodity hardware • Increased infrastructure / system complexity • Increased probability of network partitions • Automation complexity. Managed Service (Level 4) • Low barrier to entry • No additional hardware investment required • Treat as extension of existing data center • Appliance configuration • Globally redundant (cloud) • Most organizational change • Less control and customization • High innovation. Pick One (Or More!)
  • 38. Taking It to the Next Level
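
The CP vs. AP trade-off from slides 18 and 19 can be illustrated with a toy model. The sketch below is not from the talk: `TwoNodeStore`, `Replica`, `PartitionError`, and the `partitioned` flag are all hypothetical names, and the "network" is just a boolean. It assumes writes arrive at one replica and reads at the other, and that a CP store fails immediately instead of waiting for a timeout.

```python
class PartitionError(Exception):
    """Raised by the CP store when it cannot reach its peer (stands in for a timeout)."""


class Replica:
    """One storage node holding a simple key-value map."""

    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Hypothetical two-replica store. `mode` is "CP" or "AP"."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica()          # writes arrive here
        self.b = Replica()          # reads are served from here
        self.partitioned = False    # True simulates a network partition
        self.pending = []           # AP only: writes not yet copied to b

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise PartitionError("cannot reach replica b")
            # Availability first: accept locally, replicate after the partition heals.
            self.a.data[key] = value
            self.pending.append((key, value))
            return "accepted"
        # No partition: replicate synchronously to both nodes.
        self.a.data[key] = value
        self.b.data[key] = value
        return "committed"

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Cannot prove this replica holds the latest value.
            raise PartitionError("cannot confirm latest value")
        # AP (or a healthy network): answer with whatever this replica has.
        return self.b.data.get(key)

    def heal(self):
        """End the partition and replay any queued AP writes onto replica b."""
        self.partitioned = False
        for key, value in self.pending:
            self.b.data[key] = value
        self.pending.clear()
```

In AP mode, a write of 100, then a partition, then a write of 50 leaves `read` returning the stale 100 until `heal()` is called. That is the coffee-shop receipt: the write is acknowledged before the work is done. In CP mode the second write raises `PartitionError`, which is the bank refusing a receipt until the deposit has actually gone through.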