SlideShare a Scribd company logo
1 of 13
Download to read offline
How Klout migrated
from CDH3 to CDH4
…and survived to tell about it
Large Scale Production Engineering Meetup
September 19, 2013
Ian Kallen
Lead Engineer, Klout
© 2013 Klout
About Klout
● recognizing & rewarding online influence
● major social network activity signals
● Facebook, Twitter, Google+, LinkedIn, 4sq
● billions data points consumed & processed
● pipelines update scores & topics
● hive & oozie driven jobs & workflow
© 2013 Klout
By The Numbers
● 2 TB data intake, 200 TB processed daily
● jobs clusters x 2 (dev/staging + production)
● hbase x 6 (dev/staging + production x 5)
● hbase: 350M req/day, 17K req/sec peak
● jobs, hbase & zookeeper
total =~ 350 hosts
© 2013 Klout
● pipelines instable, slow on cdh3 (v0.20.2)
● HBase performance predictability
● old hive version limited pipeline developers
● cdh3 EOL’d 6/2013
● cdh4 (v2.0.x) supports NN H/A, impala
● more shiney things
Motivations
© 2013 Klout
The Environment
● data center hosted
● I/O subsystems are under our control
● network latencies are under our control
● FAQ: Why not AWS?
● saved millions of dollars last year
● that's a lot of beer money.
● elasticity need is low, but...
© 2013 Klout
● this is super easy on AWS
● bring up a replacement cluster
● double-write or migrate data to replacement
● tear down old cluster
● have a celebratory drink
● if you have any beer
money left
Cloud Envy
© 2013 Klout
● nagios, pager duty for monitoring
● monit for process watchdogging
● jmx+, graphite, gdash+ for metrics
● ubuntu boot images for provisioning
● puppet for configuration management
● … no Cloudera Manager
Ops Infra
© 2013 Klout
● no replacement infra to migrate to
○ so upgrades must be done in place
● Cloudera's prefers Cloudera Manager
○ so we were on our own to devise a plan
● Cloudera helped vet our plan (thanks!)
● confidence building on dev/staging clusters
● lots of rehearsals on VM's, bug reports
Making Plans
© 2013 Klout
● detailed checklists, kanban board
● small test clusters, the dev clusters
● planned SLA miss for prod cluster upgrade
● lined up phone consult availability
w/Cloudera
○ we needed it about 10 hours into prod jobs cluster
● nobody died
Execution
© 2013 Klout
● jobs run faster (speculative execution?)
● pipelines are faster
● metrics exposed are improved
● HBase clusters lose block locality in transit
○ fixable
● no animals were harmed
in this production
Aftermath
© 2013 Klout
● we had many post-mortems along the way
● lots of engineering time & attention
● sweating the details paid off
● mostly because we’re “power users” of hive
● lessons learned:
○ re-align clusters
○ improve use of vendor tools where possible
■ e.g. Cloudera Manager
Retrospect
© 2013 Klout
● dev/staging + prod clusters x 2
● better use of HDFS paths & job scheduling
● consolidating zookeeper ensembles
● implementing NameNode H/A
● evaluating Cloudera Manager
● evaluating Impala (maybe)
Onward
© 2013 Klout
Klout is hiring awesome people passionate about
optimizing for innovation & stability, crunching big data &
robust systems
If you are a great Hadoop DevOps Engineer
Join Us!
ian@klout.com
Thanks!
Gratuitous Recruiting Slide
© 2013 Klout

More Related Content

What's hot

Introduction to Serverless and Google Cloud Functions
Introduction to Serverless and Google Cloud FunctionsIntroduction to Serverless and Google Cloud Functions
Introduction to Serverless and Google Cloud FunctionsMalepati Bala Siva Sai Akhil
 
From business requirements to working pipelines with apache airflow
From business requirements to working pipelines with apache airflowFrom business requirements to working pipelines with apache airflow
From business requirements to working pipelines with apache airflowDerrick Qin
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentationIlias Okacha
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowPyData
 
Apache Airflow at Dailymotion
Apache Airflow at DailymotionApache Airflow at Dailymotion
Apache Airflow at DailymotionGermain Tanguy
 
Kubernetes in Production: Lessons Learnt
Kubernetes in Production: Lessons LearntKubernetes in Production: Lessons Learnt
Kubernetes in Production: Lessons LearntArunvel Sriram
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflowmutt_data
 
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementBurasakorn Sabyeying
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engineWalter Liu
 
From AWS Data Pipeline to Airflow - managing data pipelines in Nielsen Market...
From AWS Data Pipeline to Airflow - managing data pipelines in Nielsen Market...From AWS Data Pipeline to Airflow - managing data pipelines in Nielsen Market...
From AWS Data Pipeline to Airflow - managing data pipelines in Nielsen Market...Itai Yaffe
 
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...InfluxData
 
Nyc kubernetes Meetup - Kubeflow Lightning talk
Nyc kubernetes Meetup - Kubeflow Lightning talkNyc kubernetes Meetup - Kubeflow Lightning talk
Nyc kubernetes Meetup - Kubeflow Lightning talkAdhita Selvaraj
 
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...Kaxil Naik
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowYohei Onishi
 

What's hot (20)

Apache Airflow overview
Apache Airflow overviewApache Airflow overview
Apache Airflow overview
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
Introduction to Serverless and Google Cloud Functions
Introduction to Serverless and Google Cloud FunctionsIntroduction to Serverless and Google Cloud Functions
Introduction to Serverless and Google Cloud Functions
 
From business requirements to working pipelines with apache airflow
From business requirements to working pipelines with apache airflowFrom business requirements to working pipelines with apache airflow
From business requirements to working pipelines with apache airflow
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
 
Apache Airflow at Dailymotion
Apache Airflow at DailymotionApache Airflow at Dailymotion
Apache Airflow at Dailymotion
 
Kubernetes in Production: Lessons Learnt
Kubernetes in Production: Lessons LearntKubernetes in Production: Lessons Learnt
Kubernetes in Production: Lessons Learnt
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
 
Airflow at WePay
Airflow at WePayAirflow at WePay
Airflow at WePay
 
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
From AWS Data Pipeline to Airflow - managing data pipelines in Nielsen Market...
From AWS Data Pipeline to Airflow - managing data pipelines in Nielsen Market...From AWS Data Pipeline to Airflow - managing data pipelines in Nielsen Market...
From AWS Data Pipeline to Airflow - managing data pipelines in Nielsen Market...
 
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
Ryan Betts [InfluxData] | InfluxDB Platform Performance | InfluxDays Virtual ...
 
Nyc kubernetes Meetup - Kubeflow Lightning talk
Nyc kubernetes Meetup - Kubeflow Lightning talkNyc kubernetes Meetup - Kubeflow Lightning talk
Nyc kubernetes Meetup - Kubeflow Lightning talk
 
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
Contributing to Apache Airflow | Journey to becoming Airflow's leading contri...
 
Building an analytics workflow using Apache Airflow
Building an analytics workflow using Apache AirflowBuilding an analytics workflow using Apache Airflow
Building an analytics workflow using Apache Airflow
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 

Similar to How Klout migrated from CDH3 to CDH4 …and survived to tell about it

Mulesoft Meetup Milano #9 - Batch Processing and CI/CD
Mulesoft Meetup Milano #9 - Batch Processing and CI/CDMulesoft Meetup Milano #9 - Batch Processing and CI/CD
Mulesoft Meetup Milano #9 - Batch Processing and CI/CDGonzalo Marcos Ansoain
 
Clearing Airflow Obstructions
Clearing Airflow ObstructionsClearing Airflow Obstructions
Clearing Airflow ObstructionsTatiana Al-Chueyr
 
Getting more into GCP.pdf
Getting more into GCP.pdfGetting more into GCP.pdf
Getting more into GCP.pdfKnoldus Inc.
 
Building a Distributed & Automated Open Source Program at Netflix
Building a Distributed & Automated Open Source Program at NetflixBuilding a Distributed & Automated Open Source Program at Netflix
Building a Distributed & Automated Open Source Program at NetflixAll Things Open
 
Netflix Open Source: Building a Distributed and Automated Open Source Program
Netflix Open Source:  Building a Distributed and Automated Open Source ProgramNetflix Open Source:  Building a Distributed and Automated Open Source Program
Netflix Open Source: Building a Distributed and Automated Open Source Programaspyker
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIURohit Jnagal
 
AirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadAirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadKarthik Murugesan
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...Databricks
 
Webinar: Building a multi-cloud Kubernetes storage on GitLab
Webinar: Building a multi-cloud Kubernetes storage on GitLabWebinar: Building a multi-cloud Kubernetes storage on GitLab
Webinar: Building a multi-cloud Kubernetes storage on GitLabMayaData Inc
 
Head in the clouds @ bol.com
Head in the clouds @ bol.comHead in the clouds @ bol.com
Head in the clouds @ bol.comMaarten Dirkse
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupKaxil Naik
 
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Noam Elfanbaum
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3LibbySchulze
 
Monitoring with Clickhouse
Monitoring with ClickhouseMonitoring with Clickhouse
Monitoring with Clickhouseunicast
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish styleLars Albertsson
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2Kaxil Naik
 
Starting a Drupal 8 Project? Let’s do a Technical Discovery - DrupalConAsia 2...
Starting a Drupal 8 Project? Let’s do a Technical Discovery - DrupalConAsia 2...Starting a Drupal 8 Project? Let’s do a Technical Discovery - DrupalConAsia 2...
Starting a Drupal 8 Project? Let’s do a Technical Discovery - DrupalConAsia 2...Ravindra Singh
 

Similar to How Klout migrated from CDH3 to CDH4 …and survived to tell about it (20)

Mulesoft Meetup Milano #9 - Batch Processing and CI/CD
Mulesoft Meetup Milano #9 - Batch Processing and CI/CDMulesoft Meetup Milano #9 - Batch Processing and CI/CD
Mulesoft Meetup Milano #9 - Batch Processing and CI/CD
 
Clearing Airflow Obstructions
Clearing Airflow ObstructionsClearing Airflow Obstructions
Clearing Airflow Obstructions
 
Getting more into GCP.pdf
Getting more into GCP.pdfGetting more into GCP.pdf
Getting more into GCP.pdf
 
Building a Distributed & Automated Open Source Program at Netflix
Building a Distributed & Automated Open Source Program at NetflixBuilding a Distributed & Automated Open Source Program at Netflix
Building a Distributed & Automated Open Source Program at Netflix
 
Netflix Open Source: Building a Distributed and Automated Open Source Program
Netflix Open Source:  Building a Distributed and Automated Open Source ProgramNetflix Open Source:  Building a Distributed and Automated Open Source Program
Netflix Open Source: Building a Distributed and Automated Open Source Program
 
Task migration using CRIU
Task migration using CRIUTask migration using CRIU
Task migration using CRIU
 
AirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadAirBNB's ML platform - BigHead
AirBNB's ML platform - BigHead
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 
Webinar: Building a multi-cloud Kubernetes storage on GitLab
Webinar: Building a multi-cloud Kubernetes storage on GitLabWebinar: Building a multi-cloud Kubernetes storage on GitLab
Webinar: Building a multi-cloud Kubernetes storage on GitLab
 
Head in the clouds @ bol.com
Head in the clouds @ bol.comHead in the clouds @ bol.com
Head in the clouds @ bol.com
 
Promise of DevOps
Promise of DevOpsPromise of DevOps
Promise of DevOps
 
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow MeetupWhat's coming in Airflow 2.0? - NYC Apache Airflow Meetup
What's coming in Airflow 2.0? - NYC Apache Airflow Meetup
 
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
Bootstrapping a ML platform at Bluevine [Airflow Summit 2020]
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
OpenFlow @ Google
OpenFlow @ GoogleOpenFlow @ Google
OpenFlow @ Google
 
Monitoring with Clickhouse
Monitoring with ClickhouseMonitoring with Clickhouse
Monitoring with Clickhouse
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish style
 
Berlin AWS meetup: here.com on AWS
Berlin AWS meetup: here.com on AWSBerlin AWS meetup: here.com on AWS
Berlin AWS meetup: here.com on AWS
 
Upcoming features in Airflow 2
Upcoming features in Airflow 2Upcoming features in Airflow 2
Upcoming features in Airflow 2
 
Starting a Drupal 8 Project? Let’s do a Technical Discovery - DrupalConAsia 2...
Starting a Drupal 8 Project? Let’s do a Technical Discovery - DrupalConAsia 2...Starting a Drupal 8 Project? Let’s do a Technical Discovery - DrupalConAsia 2...
Starting a Drupal 8 Project? Let’s do a Technical Discovery - DrupalConAsia 2...
 

Recently uploaded

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 

Recently uploaded (20)

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 

How Klout migrated from CDH3 to CDH4 …and survived to tell about it

  • 1. How Klout migrated from CDH3 to CDH4 …and survived to tell about it Large Scale Production Engineering Meetup September 19, 2013 Ian Kallen Lead Engineer, Klout © 2013 Klout
  • 2. About Klout ● recognizing & rewarding online influence ● major social network activity signals ● Facebook, Twitter, Google+, LinkedIn, 4sq ● billions data points consumed & processed ● pipelines update scores & topics ● hive & oozie driven jobs & workflow © 2013 Klout
  • 3. By The Numbers ● 2 TB data intake, 200 TB processed daily ● jobs clusters x 2 (dev/staging + production) ● hbase x 6 (dev/staging + production x 5) ● hbase: 350M req/day, 17K req/sec peak ● jobs, hbase & zookeeper total =~ 350 hosts © 2013 Klout
  • 4. ● pipelines instable, slow on cdh3 (v0.20.2) ● HBase performance predictability ● old hive version limited pipeline developers ● cdh3 EOL’d 6/2013 ● cdh4 (v2.0.x) supports NN H/A, impala ● more shiney things Motivations © 2013 Klout
  • 5. The Environment ● data center hosted ● I/O subsystems are under our control ● network latencies are under our control ● FAQ: Why not AWS? ● saved millions of dollars last year ● that's a lot of beer money. ● elasticity need is low, but... © 2013 Klout
  • 6. ● this is super easy on AWS ● bring up a replacement cluster ● double-write or migrate data to replacement ● tear down old cluster ● have a celebratory drink ● if you have any beer money left Cloud Envy © 2013 Klout
  • 7. ● nagios, pager duty for monitoring ● monit for process watchdogging ● jmx+, graphite, gdash+ for metrics ● ubuntu boot images for provisioning ● puppet for configuration management ● … no Cloudera Manager Ops Infra © 2013 Klout
  • 8. ● no replacement infra to migrate to ○ so upgrades must be done in place ● Cloudera's prefers Cloudera Manager ○ so we were on our own to devise a plan ● Cloudera helped vet our plan (thanks!) ● confidence building on dev/staging clusters ● lots of rehearsals on VM's, bug reports Making Plans © 2013 Klout
  • 9. ● detailed checklists, kanban board ● small test clusters, the dev clusters ● planned SLA miss for prod cluster upgrade ● lined up phone consult availability w/Cloudera ○ we needed it about 10 hours into prod jobs cluster ● nobody died Execution © 2013 Klout
  • 10. ● jobs run faster (speculative execution?) ● pipelines are faster ● metrics exposed are improved ● HBase clusters lose block locality in transit ○ fixable ● no animals were harmed in this production Aftermath © 2013 Klout
  • 11. ● we had many post-mortems along the way ● lots of engineering time & attention ● sweating the details paid off ● mostly because we’re “power users” of hive ● lessons learned: ○ re-align clusters ○ improve use of vendor tools where possible ■ e.g. Cloudera Manager Retrospect © 2013 Klout
  • 12. ● dev/staging + prod clusters x 2 ● better use of HDFS paths & job scheduling ● consolidating zookeeper ensembles ● implementing NameNode H/A ● evaluating Cloudera Manager ● evaluating Impala (maybe) Onward © 2013 Klout
  • 13. Klout is hiring awesome people passionate about optimizing for innovation & stability, crunching big data & robust systems If you are a great Hadoop DevOps Engineer Join Us! ian@klout.com Thanks! Gratuitous Recruiting Slide © 2013 Klout