SlideShare a Scribd company logo
Harish Ganesan
CTO
8KMiles
2013
P1) This Presentation is
P2) Strongly Inspired by “Guy Ritchie”
Movies
P3) Disclaimer : All images are downloaded from
internet. If you find any of the content / images violating
copyright, please let me know and I will act upon it
immediately
AGENDA
• Case
• Challenge
• Solution
• Learning's
• About us
Case
Cigarette smoking is injurious to health
• Mobile Advertising company, USA
• Forbes 1000 clientele
• TB’s of unstructured data -> Big Data
Problem
Lock
• Hourly ~1 TB
• CDN Logs
• Text Files
• XML Files
• Geo data files
• Server logs
• DB records
STOCK
• Reduce the cost leakage
• How to Save $$$ ?
Challenges
• Daily (was OK), Monthly (Pain) and Historical
analysis ( almost dead )
• How do we Transfer, Store, Analyze and Share ?
• How to optimize costs at this scale ?
Solution
Cigarette smoking is injurious to health
• Use AWS Cloud for hosting Analytics module
• Amazon EMR for unstructured Log Analysis
• Automation using Scripts, Java code and other
tools
Social / 3rd
Party
Feeds/Cloud
Logs
Stage 1: Data Transfer
• Tsunami UDP
• ~1TB un compressed logs
every hour
• High bandwidth EC2’s for
Tsunami UDP
• Other Popular Options :
• Aspera
• AWS Import/Export
• WAN optimization
• AWS Direct Connect
Amazon S3
Logs
Stage 2: Storage
• Amazon Web Services Building Block
– S3
• Scalable Object Store
• Inherently Fault Tolerant
• ~2 TB of compressed logs every day
• S3 RR option for intermediate
outputs
• Amazon Glacier for archivalSocial / 3rd
Party
Feeds/Cloud
Amazon S3
Elastic
MapReduce
Logs
Stage 3: Analyze
• Elastic MapReduce
Service of Amazon
• Minimal Setup time
• Log Analysis
• ~2000 mappers /
750 reducers @
peak
• ~250 m1.xlarge
task nodes (1000
cores, 3750 GB
RAM) @ peakSocial / 3rd
Party
Feeds/Cloud
• Amazon EMR is great
• But adding Spot EC2 is super cool
Wait !!!
What is Amazon Spot ?
13
• Time-flexible, interruption-tolerant tasks
• Bid Price & Spot Price
• M1.xlarge Price Comparison
• $0.480 per Hour – On Demand
• $0.052 per Hour - Spot
• You will never pay more than your
maximum bid price per hour
•Spot Instance may be interrupted
• If interrupted you will not be charged for
any partial hour of usage. (*Free)
Spot Bidding Strategies
14
•Just above Spot Price
•Between Spot Price & On Demand
Price
•On Demand Price
•Above On Demand Price
Spot Price Variations - AZ
Amazon EMR with Spot Instance
Project Master
Instance
Group
Core Instance
Group
Task Instance
Group
Long-running
clusters
on-demand on-demand Spot
Cost-driven
workloads
spot spot Spot
Data-critical
workloads
on-demand on-demand Spot
Application
testing
spot Spot Spot
Amazon S3
Elastic
MapReduce
Social /
3rd Party
Feeds
Logs
Stage 4: Custom EMR Manager
• We created a Custom EMR
Manager
• Choose spot based on:
• Past price trend intelligence
• Choose AZ based on Current
Market Prices
• Choose between Large vs
Extra Large
• Spot Pricing Strategy :
• Set Spot Price = On Demand
Price
• Over board <20% of On
Demand Price at times
• Dynamic Sizing the Core / Task
nodes
• Dynamic EMR Cluster creationCustom EMR
Manager
Some Spot Use Cases
18
• Analytics & Big Data
• Scientific computing
• Web crawling
• Financial model and Analysis
• Testing
• Image & Media Encoding
66 % savings
50 % savings
57 % savings
Learning
• Spot + On demand EC2 is a deadly combination for cost savings
• Every millisecond matters in MR – Tune your code
• Merge Files – Bigger ones are better for processing
More Learning …
• Custom Job Manager was designed by us
• 1 File Per Mapper was better for our case in AWS
• Understand the performance constraints of AWS and
work with it
• Compress data : Both storage and transit(.LZO & Snappy)
Continues…
• Keep configuration data in local memory or Amazon
DynamoDB
• Reducers split files suitable for next job mappers
• Elasticity – Increase/Decrease Task nodes
• Elasticity – Create new EMR Clusters matching the Logs
(Core + Task)
Value
• ~56% cost savings from pure On-Demand model for Core+
Task Nodes
• Automation vastly reduced Labor cost ( initial + on going)
• Customer CXO’s were happy
• AWS Premium Partner
• Solution Experts in
• Cloud Computing
• Big Data
• Identity Management
About US
Shoot your ?
Harish@8kmiles.com
http://harish11g.blogspot.com
@harish11g
harishganesan

More Related Content

What's hot

AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?Amazon Web Services
 
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
Amazon Web Services
 
AWS Summit London 2014 | Deployment Done Right (300)
AWS Summit London 2014 | Deployment Done Right (300)AWS Summit London 2014 | Deployment Done Right (300)
AWS Summit London 2014 | Deployment Done Right (300)
Amazon Web Services
 
AWS Summit London 2014 | From One to Many - Evolving VPC Design (400)
AWS Summit London 2014 | From One to Many - Evolving VPC Design (400)AWS Summit London 2014 | From One to Many - Evolving VPC Design (400)
AWS Summit London 2014 | From One to Many - Evolving VPC Design (400)
Amazon Web Services
 
(AWS) Auto Scaling : Evening Session by Amazon and IntelliGrape Software
(AWS) Auto Scaling : Evening Session by Amazon and IntelliGrape Software(AWS) Auto Scaling : Evening Session by Amazon and IntelliGrape Software
(AWS) Auto Scaling : Evening Session by Amazon and IntelliGrape Software
TO THE NEW | Technology
 
Aws 201:Advanced Breakout Track on HA and DR
Aws 201:Advanced Breakout Track on HA and DRAws 201:Advanced Breakout Track on HA and DR
Aws 201:Advanced Breakout Track on HA and DRHarish Ganesan
 
AWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWSAWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWSAmazon Web Services
 
Aws Autoscaling
Aws AutoscalingAws Autoscaling
Aws Autoscaling
Kimberly Macias
 
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
Amazon Web Services
 
Auto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and RedisAuto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and Redis
Yi Hsuan (Jeddie) Chuang
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Amazon Web Services
 
Auto Scaling with Amazon Web Services
Auto Scaling with Amazon Web ServicesAuto Scaling with Amazon Web Services
Auto Scaling with Amazon Web Services
Yi Hsuan (Jeddie) Chuang
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
Amazon Web Services
 
(CMP201) All You Need To Know About Auto Scaling
(CMP201) All You Need To Know About Auto Scaling(CMP201) All You Need To Know About Auto Scaling
(CMP201) All You Need To Know About Auto Scaling
Amazon Web Services
 
How Does Amazon EC2 Auto Scaling Work
How Does Amazon EC2 Auto Scaling WorkHow Does Amazon EC2 Auto Scaling Work
How Does Amazon EC2 Auto Scaling Work
Intelligentia IT Systems Pvt. Ltd.
 
Getting Started with Amazon Aurora
Getting Started with Amazon AuroraGetting Started with Amazon Aurora
Getting Started with Amazon Aurora
Amazon Web Services
 
All you need to know about Auto scaling - Pop-up Loft
All you need to know about Auto scaling - Pop-up LoftAll you need to know about Auto scaling - Pop-up Loft
All you need to know about Auto scaling - Pop-up Loft
Amazon Web Services
 
This One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsThis One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You Thousands
Amazon Web Services
 
Introduction to AWS Batch
Introduction to AWS BatchIntroduction to AWS Batch
Introduction to AWS Batch
Amazon Web Services
 
Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 SpotIntroduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot
Amazon Web Services
 

What's hot (20)

AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
AWS Summit London 2014 | Partners & Solutions Track | What's New at AWS?
 
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
AWS Summit London 2014 | Improving Availability and Lowering Costs (300)
 
AWS Summit London 2014 | Deployment Done Right (300)
AWS Summit London 2014 | Deployment Done Right (300)AWS Summit London 2014 | Deployment Done Right (300)
AWS Summit London 2014 | Deployment Done Right (300)
 
AWS Summit London 2014 | From One to Many - Evolving VPC Design (400)
AWS Summit London 2014 | From One to Many - Evolving VPC Design (400)AWS Summit London 2014 | From One to Many - Evolving VPC Design (400)
AWS Summit London 2014 | From One to Many - Evolving VPC Design (400)
 
(AWS) Auto Scaling : Evening Session by Amazon and IntelliGrape Software
(AWS) Auto Scaling : Evening Session by Amazon and IntelliGrape Software(AWS) Auto Scaling : Evening Session by Amazon and IntelliGrape Software
(AWS) Auto Scaling : Evening Session by Amazon and IntelliGrape Software
 
Aws 201:Advanced Breakout Track on HA and DR
Aws 201:Advanced Breakout Track on HA and DRAws 201:Advanced Breakout Track on HA and DR
Aws 201:Advanced Breakout Track on HA and DR
 
AWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWSAWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWS
 
Aws Autoscaling
Aws AutoscalingAws Autoscaling
Aws Autoscaling
 
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
AWS Summit London 2014 | Scaling on AWS for the First 10 Million Users (200)
 
Auto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and RedisAuto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and Redis
 
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity PlanDay 5 - AWS Autoscaling Master Class - The New Capacity Plan
Day 5 - AWS Autoscaling Master Class - The New Capacity Plan
 
Auto Scaling with Amazon Web Services
Auto Scaling with Amazon Web ServicesAuto Scaling with Amazon Web Services
Auto Scaling with Amazon Web Services
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
(CMP201) All You Need To Know About Auto Scaling
(CMP201) All You Need To Know About Auto Scaling(CMP201) All You Need To Know About Auto Scaling
(CMP201) All You Need To Know About Auto Scaling
 
How Does Amazon EC2 Auto Scaling Work
How Does Amazon EC2 Auto Scaling WorkHow Does Amazon EC2 Auto Scaling Work
How Does Amazon EC2 Auto Scaling Work
 
Getting Started with Amazon Aurora
Getting Started with Amazon AuroraGetting Started with Amazon Aurora
Getting Started with Amazon Aurora
 
All you need to know about Auto scaling - Pop-up Loft
All you need to know about Auto scaling - Pop-up LoftAll you need to know about Auto scaling - Pop-up Loft
All you need to know about Auto scaling - Pop-up Loft
 
This One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsThis One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You Thousands
 
Introduction to AWS Batch
Introduction to AWS BatchIntroduction to AWS Batch
Introduction to AWS Batch
 
Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 SpotIntroduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot
 

Similar to Cloud Connect 2013- Lock Stock and x Smoking EC2's

Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rasmus Ekman
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
Amazon Web Services
 
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayChoosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Amazon Web Services Korea
 
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivBig Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Amazon Web Services
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
Amazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
Amazon Web Services
 
AWS Meet-up Atlanta: AWS Economics
AWS Meet-up Atlanta: AWS EconomicsAWS Meet-up Atlanta: AWS Economics
AWS Meet-up Atlanta: AWS Economics
Aaron Klein
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
Amazon Web Services
 
Building a data warehouse with Amazon Redshift … and a quick look at Amazon ...
Building a data warehouse  with Amazon Redshift … and a quick look at Amazon ...Building a data warehouse  with Amazon Redshift … and a quick look at Amazon ...
Building a data warehouse with Amazon Redshift … and a quick look at Amazon ...
Julien SIMON
 
AWS Cost Optimization
AWS Cost OptimizationAWS Cost Optimization
AWS Cost OptimizationMiles Ward
 
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
Amazon Web Services
 
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
Amazon Web Services
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database Services
Amazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Amazon Web Services
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
Amazon Web Services
 
AWS APAC Webinar Series: How to Reduce Your Spend on AWS
AWS APAC Webinar Series: How to Reduce Your Spend on AWSAWS APAC Webinar Series: How to Reduce Your Spend on AWS
AWS APAC Webinar Series: How to Reduce Your Spend on AWS
Amazon Web Services
 
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
Amazon Web Services
 
How to Reduce your Spend on AWS
How to Reduce your Spend on AWSHow to Reduce your Spend on AWS
How to Reduce your Spend on AWS
Joseph K. Ziegler
 

Similar to Cloud Connect 2013- Lock Stock and x Smoking EC2's (20)

Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)Rethinking the database for the cloud (iJAWS)
Rethinking the database for the cloud (iJAWS)
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB DayChoosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
Choosing the Right Database Service (김상필, 유타카 호시노) - AWS DB Day
 
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel AvivBig Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
Big Data and Architectural Patterns on AWS - Pop-up Loft Tel Aviv
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
AWS Meet-up Atlanta: AWS Economics
AWS Meet-up Atlanta: AWS EconomicsAWS Meet-up Atlanta: AWS Economics
AWS Meet-up Atlanta: AWS Economics
 
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS(BDT310) Big Data Architectural Patterns and Best Practices on AWS
(BDT310) Big Data Architectural Patterns and Best Practices on AWS
 
Building a data warehouse with Amazon Redshift … and a quick look at Amazon ...
Building a data warehouse  with Amazon Redshift … and a quick look at Amazon ...Building a data warehouse  with Amazon Redshift … and a quick look at Amazon ...
Building a data warehouse with Amazon Redshift … and a quick look at Amazon ...
 
AWS Cost Optimization
AWS Cost OptimizationAWS Cost Optimization
AWS Cost Optimization
 
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A...
 
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
AWS November Webinar Series - Architectural Patterns & Best Practices for Big...
 
Introduction to Database Services
Introduction to Database ServicesIntroduction to Database Services
Introduction to Database Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
AWS APAC Webinar Series: How to Reduce Your Spend on AWS
AWS APAC Webinar Series: How to Reduce Your Spend on AWSAWS APAC Webinar Series: How to Reduce Your Spend on AWS
AWS APAC Webinar Series: How to Reduce Your Spend on AWS
 
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
AWS re:Invent 2016: Disrupting Big Data with Cost-effective Compute (CMP302)
 
How to Reduce your Spend on AWS
How to Reduce your Spend on AWSHow to Reduce your Spend on AWS
How to Reduce your Spend on AWS
 

Recently uploaded

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Newntide latest company Introduction.pdf
Newntide latest company Introduction.pdfNewntide latest company Introduction.pdf
Newntide latest company Introduction.pdf
LucyLuo36
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 

Recently uploaded (20)

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Newntide latest company Introduction.pdf
Newntide latest company Introduction.pdfNewntide latest company Introduction.pdf
Newntide latest company Introduction.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 

Cloud Connect 2013- Lock Stock and x Smoking EC2's

  • 2. P1) This Presentation is P2) Strongly Inspired by “Guy Ritchie” Movies P3) Disclaimer : All images are downloaded from internet. If you find any of the content / images violating copyright, please let me know and I will act upon it immediately
  • 3. AGENDA • Case • Challenge • Solution • Learning's • About us
  • 4. Case Cigarette smoking is injurious to health • Mobile Advertising company, USA • Forbes 1000 clientele • TB’s of unstructured data -> Big Data Problem
  • 5. Lock • Hourly ~1 TB • CDN Logs • Text Files • XML Files • Geo data files • Server logs • DB records
  • 6. STOCK • Reduce the cost leakage • How to Save $$$ ?
  • 7. Challenges • Daily (was OK), Monthly (Pain) and Historical analysis ( almost dead ) • How do we Transfer, Store, Analyze and Share ? • How to optimize costs at this scale ?
  • 8. Solution Cigarette smoking is injurious to health • Use AWS Cloud for hosting Analytics module • Amazon EMR for unstructured Log Analysis • Automation using Scripts, Java code and other tools
  • 9. Social / 3rd Party Feeds/Cloud Logs Stage 1: Data Transfer • Tsunami UDP • ~1TB un compressed logs every hour • High bandwidth EC2’s for Tsunami UDP • Other Popular Options : • Aspera • AWS Import/Export • WAN optimization • AWS Direct Connect
  • 10. Amazon S3 Logs Stage 2: Storage • Amazon Web Services Building Block – S3 • Scalable Object Store • Inherently Fault Tolerant • ~2 TB of compressed logs every day • S3 RR option for intermediate outputs • Amazon Glacier for archivalSocial / 3rd Party Feeds/Cloud
  • 11. Amazon S3 Elastic MapReduce Logs Stage 3: Analyze • Elastic MapReduce Service of Amazon • Minimal Setup time • Log Analysis • ~2000 mappers / 750 reducers @ peak • ~250 m1.xlarge task nodes (1000 cores, 3750 GB RAM) @ peakSocial / 3rd Party Feeds/Cloud
  • 12. • Amazon EMR is great • But adding Spot EC2 is super cool Wait !!!
  • 13. What is Amazon Spot ? 13 • Time-flexible, interruption-tolerant tasks • Bid Price & Spot Price • M1.xlarge Price Comparison • $0.480 per Hour – On Demand • $0.052 per Hour - Spot • You will never pay more than your maximum bid price per hour •Spot Instance may be interrupted • If interrupted you will not be charged for any partial hour of usage. (*Free)
  • 14. Spot Bidding Strategies 14 •Just above Spot Price •Between Spot Price & On Demand Price •On Demand Price •Above On Demand Price
  • 16. Amazon EMR with Spot Instance Project Master Instance Group Core Instance Group Task Instance Group Long-running clusters on-demand on-demand Spot Cost-driven workloads spot spot Spot Data-critical workloads on-demand on-demand Spot Application testing spot Spot Spot
  • 17. Amazon S3 Elastic MapReduce Social / 3rd Party Feeds Logs Stage 4: Custom EMR Manager • We created a Custom EMR Manager • Choose spot based on: • Past price trend intelligence • Choose AZ based on Current Market Prices • Choose between Large vs Extra Large • Spot Pricing Strategy : • Set Spot Price = On Demand Price • Over board <20% of On Demand Price at times • Dynamic Sizing the Core / Task nodes • Dynamic EMR Cluster creationCustom EMR Manager
  • 18. Some Spot Use Cases 18 • Analytics & Big Data • Scientific computing • Web crawling • Financial model and Analysis • Testing • Image & Media Encoding 66 % savings 50 % savings 57 % savings
  • 19. Learning • Spot + On demand EC2 is a deadly combination for cost savings • Every millisecond matters in MR – Tune your code • Merge Files – Bigger ones are better for processing
  • 20. More Learning … • Custom Job Manager was designed by us • 1 File Per Mapper was better for our case in AWS • Understand the performance constraints of AWS and work with it • Compress data : Both storage and transit(.LZO & Snappy)
  • 21. Continues… • Keep configuration data in local memory or Amazon DynamoDB • Reducers split files suitable for next job mappers • Elasticity – Increase/Decrease Task nodes • Elasticity – Create new EMR Clusters matching the Logs (Core + Task)
  • 22. Value • ~56% cost savings from pure On-Demand model for Core+ Task Nodes • Automation vastly reduced Labor cost ( initial + on going) • Customer CXO’s were happy
  • 23. • AWS Premium Partner • Solution Experts in • Cloud Computing • Big Data • Identity Management About US