SlideShare a Scribd company logo
1 of 106
Download to read offline
Metrics-Driven Engineering

Mike Brittain        @ mikebrittain
Director of engineering, Infrastructure

                                          October 13, 2011
Tools and Process at Etsy
How many new visits?
  How many listings created?
  How many registrations?
How do people use Etsy?
  How many convos sent?
    How many purchases?
     How many new shops?
Search indexing?
     How fast are pages generating?
   Async tasks currently in queue?
What is the application doing?
 Developer API auth and rate limiting?
       Images resized and stored?
          Error and warning rates?
Replication slave lag?
       Memcache hits/misses?
       Available connections?
Are the servers in good shape ?
    Database queries per second?
       Total outgoing bandwidth?
            CPU, Memory, I/O?
Business Metrics
Application Metrics
System Metrics
Visibility EVERYWHERE
Constant Change
$314 Million GMS 2010
  $180 Million GMS 2009
  $87 Million GMS 2008

  $26 Million GMS 2007




credit: pentarux (flickr)
25 Million Unique Visitors
  1 Billion page views per month




credit: pentarux (flickr)
Engineering team grew 500%
                        over 18 months


credit: martin_heigan (flickr)
Less talk, more do.
Always Be Shipping



credit: ibailemon (flickr)
Always Be Shipping
                             (even if it’s your first day)




credit: ibailemon (flickr)
90+ Engineers
                     40+ Deploys / day

credit: misswired (flickr)
credit: digidave (flickr)
Code Reviews
Automated Tests
$cfg = array(
   'checkout' => array('enabled' => 'on'),
   'homepage' => array('enabled' => 'on'),
   'profiles' => array('enabled' => 'on'),
   'new_search' => array('enabled' => 'off'),
);


                          Config Flags
Enable and disable features quickly
$cfg = array(
   'checkout' => array('enabled' => 'on'),
   'homepage' => array('enabled' => 'on'),
   'profiles' => array('enabled' => 'on'),
   'new_search' => array('enabled' => 'off'),
);


                          Config Flags
Enable and disable features quickly
Plus “admin-only,” percentage ramp-up, A/B testing,
whitelists, blacklists, etc...
Failure is not an option
inevitable!
Failure is not an option
inevitable!
Failure is not an option
            a learning opportunity!
inevitable!
Failure is not an option
            a learning opportunity!
     DETECTABLE!
Access
Detect problems quickly
CONFIDENCE
A:    Well, the Ops team manages the network, racks
     the servers, installed the monitoring tools, wears
                the pagers, blah, blah, blah...
Engineers build the application
Logging
      Graphing
OPS              ENG
      Trending
      Alerting
“Engineers are too busy writing
  features to build metrics.”
Metrics are part of every feature
        ...and so are config flags
Dead Simple
Simple, open source tools
Cacti (network, SNMP)
Ganglia (machines)
Graphite (application)
Splunk (log analysis, nightly reports)
Nagios (alerting)
                             Logging
                             Logster
                               StatsD
Ganglia
Ganglia
Cluster-oriented
Huge community contributed recipes
Custom metrics (gmetad)
Graphite
Graphite
                            Single-instance
              Create new metrics on-the-fly
   Customize via URLs and display functions
Logging
It’s 2:48 PM.
Do you know where your
       logs are?
Logger::log_error("User login failed.
Reason: $msg for $username", “login”);
Logger::log_error("User login failed.
Reason: $msg for $username", “login”);
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
LogFormat "%h %l %u %t "%r" %>s %b"
                common
LogFormat %{True-Client-IP}i %l %t "%r
         " %>s %b "%{Referer}i"
              "%{User-Agent}i"
    %{etsy_shop_id}n %{etsy_uaid}n %V
           %{etsy_ab_selections}n
            %{etsy_request_uuid}n
         %{etsy_api_consumer_key}n
          %{etsy_api_method_name}n
        %{php_memory_usage_bytes}n
   %{php_time_microsec}n %D" combined
apache_note()
LogFormat %{True-Client-IP}i %l %t "%r
         " %>s %b "%{Referer}i"
              "%{User-Agent}i"
    %{etsy_shop_id}n %{etsy_uaid}n %V
           %{etsy_ab_selections}n
            %{etsy_request_uuid}n
         %{etsy_api_consumer_key}n
          %{etsy_api_method_name}n
        %{php_memory_usage_bytes}n
   %{php_time_microsec}n %D" combined
LogFormat %{True-Client-IP}i %l %t "%r
         " %>s %b "%{Referer}i"
              "%{User-Agent}i"
    %{etsy_shop_id}n %{etsy_uaid}n %V
           %{etsy_ab_selections}n
            %{etsy_request_uuid}n
         %{etsy_api_consumer_key}n
          %{etsy_api_method_name}n
        %{php_memory_usage_bytes}n
   %{php_time_microsec}n %D" combined
LogFormat %{True-Client-IP}i %l %t "%r
         " %>s %b "%{Referer}i"
              "%{User-Agent}i"
    %{etsy_shop_id}n %{etsy_uaid}n %V
           %{etsy_ab_selections}n
            %{etsy_request_uuid}n
         %{etsy_api_consumer_key}n
          %{etsy_api_method_name}n
        %{php_memory_usage_bytes}n
   %{php_time_microsec}n %D" combined
grep "/listing/" access.log | 
awk '{sum=sum+$(NF-2)} END {print sum/NR}'
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Help me, Rhonda.
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0034   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
web0055   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002   [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling.
web0089   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020   [04:28:54   2011]   [error] [client 10.101.x.x] Sky is falling.
web1101   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087   [04:28:54   2011]   [fatal] [client 10.101.x.x] Sky is falling.
web0002   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0201   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0355   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0052   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
web0066   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
Logster
Fatals       Errors   Warnings
Logster
Run by cron
Keeps a cursor on your log file
Aggregate lines anyway you want
Output to Ganglia or Graphite
Simple parsers
                                  github.com/etsy
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
^.+ [.+] [(?P<log_level>.+)]
if (fields['log_level'] == “fatal”):
   self.fatals += 1

elif (fields['log_level'] == “error”):
   self.errors += 1

elif (fields['log_level'] == “warning”):
   self.warnings += 1

...
MetricObject("fatals",
  (self.fatals / self.duration), "per sec")

MetricObject("errors",
  (self.errors / self.duration), "per sec")

MetricObject("warning",
  (self.warnings / self.duration), "per sec")
Fatals   Errors   Warnings
StatsD
StatsD
                           Network daemon (node.js)
                               Accepts data over UDP
                      Flushes to Graphite every 10 sec
                                     One-line of code
github.com/etsy
StatsD::increment("logins.success");
StatsD::increment("logins.success");




                                  logins
StatsD::timing("gearman.time", $msec);
StatsD::timing("gearman.time", $msec);



                                 90th pct

                                 average

                                 lower
Ad hoc
name value timestamp
echo "events.deploy.site 1 `date +%s`" 
     | nc graphite.etsycorp.com 2003
Vertical Line Technology!
target=drawAsInfinite(events.deploy.site)
We could stare at graphs all day...
http://graphite/render?
   from=-1hours&width=600&height=200
&target=webs.errorLog.warning&rawData=1
http://graphite/render?
       from=-1hours&width=600&height=200
    &target=webs.errorLog.warning&rawData=1

webs.errorLog.warning,1318444930,1318448530,60|
5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0,
1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0,
1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,5.
0,1.0,1.0,None
Holt-Winters Confidence Bands

upper

         lower
Holt-Winters Aberration
Business metrics
 + Confidence bands
_____________
    Alertable metrics
40,000+ metrics at Etsy
  Systems, Applications, Business
Dashboards
Dashboards
Kind of Hard :-/
<a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or
+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
%23ff0000,%23006633,%23cc6600">
     <img src="http://graphite.etsycorp.com/render?
from=-1hours&width=280&height=220&title=File+or+Script+Not
+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
%23ff0000,%23006633,%23cc6600">
</a>
Super Easy!
$g = new Graphite($time);
$g->setTitle('File Not Found');
$g->addMetric('webs.errorLog.notExist', '#00cc00');
echo $g->getDashboardHTML(280, 220);
Metrics!
Metrics!
Metrics + Events
Metrics!
Metrics + Events
Metrics + Alerts
Metrics!
Metrics + Events
Metrics + Alerts
Metrics + Metrics
High-level, real-time visibility
Detect problems quickly
CONFIDENCE
Make them required features
Make them dead simple
Make them accessible
Make them!
Homework
codeascraft.etsy.com
github.com/etsy                      Get in touch
                                     mike @ etsy . com
We’re always looking for people         @ mikebrittain
who are interested in this kind of
stuff...



Thank You
etsy.com/careers
Metrics-Driven Engineering

More Related Content

What's hot

Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]Iakiv Kramarenko
 
Quality Assurance for PHP projects - ZendCon 2012
Quality Assurance for PHP projects - ZendCon 2012Quality Assurance for PHP projects - ZendCon 2012
Quality Assurance for PHP projects - ZendCon 2012Michelangelo van Dam
 
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...Codemotion
 
Testing ASP.NET - Progressive.NET
Testing ASP.NET - Progressive.NETTesting ASP.NET - Progressive.NET
Testing ASP.NET - Progressive.NETBen Hall
 
A Journey with React
A Journey with ReactA Journey with React
A Journey with ReactFITC
 
Good karma: UX Patterns and Unit Testing in Angular with Karma
Good karma: UX Patterns and Unit Testing in Angular with KarmaGood karma: UX Patterns and Unit Testing in Angular with Karma
Good karma: UX Patterns and Unit Testing in Angular with KarmaExoLeaders.com
 
You do not need automation engineer - Sqa Days - 2015 - EN
You do not need automation engineer  - Sqa Days - 2015 - ENYou do not need automation engineer  - Sqa Days - 2015 - EN
You do not need automation engineer - Sqa Days - 2015 - ENIakiv Kramarenko
 
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...apidays
 
Ajax to the Moon
Ajax to the MoonAjax to the Moon
Ajax to the Moondavejohnson
 
Maintainable JavaScript 2012
Maintainable JavaScript 2012Maintainable JavaScript 2012
Maintainable JavaScript 2012Nicholas Zakas
 
Web ui tests examples with selenide, nselene, selene & capybara
Web ui tests examples with  selenide, nselene, selene & capybaraWeb ui tests examples with  selenide, nselene, selene & capybara
Web ui tests examples with selenide, nselene, selene & capybaraIakiv Kramarenko
 
Python: the coolest is yet to come
Python: the coolest is yet to comePython: the coolest is yet to come
Python: the coolest is yet to comePablo Enfedaque
 
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...apidays
 
Testing persistence in PHP with DbUnit
Testing persistence in PHP with DbUnitTesting persistence in PHP with DbUnit
Testing persistence in PHP with DbUnitPeter Wilcsinszky
 
Pragmatics of Declarative Ajax
Pragmatics of Declarative AjaxPragmatics of Declarative Ajax
Pragmatics of Declarative Ajaxdavejohnson
 
JavaOne 2016 -Emerging Web App Architectures using Java and node.js
JavaOne 2016 -Emerging Web App Architectures using Java and node.jsJavaOne 2016 -Emerging Web App Architectures using Java and node.js
JavaOne 2016 -Emerging Web App Architectures using Java and node.jsSteve Wallin
 
Ditching JQuery
Ditching JQueryDitching JQuery
Ditching JQueryhowlowck
 
Phing for power users - dpc_uncon13
Phing for power users - dpc_uncon13Phing for power users - dpc_uncon13
Phing for power users - dpc_uncon13Stephan Hochdörfer
 
Testing untestable Code - PFCongres 2010
Testing untestable Code - PFCongres 2010Testing untestable Code - PFCongres 2010
Testing untestable Code - PFCongres 2010Stephan Hochdörfer
 

What's hot (20)

Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
Selenide alternative in Python - Introducing Selene [SeleniumCamp 2016]
 
Quality Assurance for PHP projects - ZendCon 2012
Quality Assurance for PHP projects - ZendCon 2012Quality Assurance for PHP projects - ZendCon 2012
Quality Assurance for PHP projects - ZendCon 2012
 
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
Universal JavaScript Web Applications with React - Luciano Mammino - Codemoti...
 
Testing ASP.NET - Progressive.NET
Testing ASP.NET - Progressive.NETTesting ASP.NET - Progressive.NET
Testing ASP.NET - Progressive.NET
 
A Journey with React
A Journey with ReactA Journey with React
A Journey with React
 
Good karma: UX Patterns and Unit Testing in Angular with Karma
Good karma: UX Patterns and Unit Testing in Angular with KarmaGood karma: UX Patterns and Unit Testing in Angular with Karma
Good karma: UX Patterns and Unit Testing in Angular with Karma
 
You do not need automation engineer - Sqa Days - 2015 - EN
You do not need automation engineer  - Sqa Days - 2015 - ENYou do not need automation engineer  - Sqa Days - 2015 - EN
You do not need automation engineer - Sqa Days - 2015 - EN
 
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
APIdays Helsinki 2019 - Specification-Driven Development of REST APIs with Al...
 
Ajax to the Moon
Ajax to the MoonAjax to the Moon
Ajax to the Moon
 
KISS Automation.py
KISS Automation.pyKISS Automation.py
KISS Automation.py
 
Maintainable JavaScript 2012
Maintainable JavaScript 2012Maintainable JavaScript 2012
Maintainable JavaScript 2012
 
Web ui tests examples with selenide, nselene, selene & capybara
Web ui tests examples with  selenide, nselene, selene & capybaraWeb ui tests examples with  selenide, nselene, selene & capybara
Web ui tests examples with selenide, nselene, selene & capybara
 
Python: the coolest is yet to come
Python: the coolest is yet to comePython: the coolest is yet to come
Python: the coolest is yet to come
 
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
APIdays Helsinki 2019 - API Versioning with REST, JSON and Swagger with Thoma...
 
Testing persistence in PHP with DbUnit
Testing persistence in PHP with DbUnitTesting persistence in PHP with DbUnit
Testing persistence in PHP with DbUnit
 
Pragmatics of Declarative Ajax
Pragmatics of Declarative AjaxPragmatics of Declarative Ajax
Pragmatics of Declarative Ajax
 
JavaOne 2016 -Emerging Web App Architectures using Java and node.js
JavaOne 2016 -Emerging Web App Architectures using Java and node.jsJavaOne 2016 -Emerging Web App Architectures using Java and node.js
JavaOne 2016 -Emerging Web App Architectures using Java and node.js
 
Ditching JQuery
Ditching JQueryDitching JQuery
Ditching JQuery
 
Phing for power users - dpc_uncon13
Phing for power users - dpc_uncon13Phing for power users - dpc_uncon13
Phing for power users - dpc_uncon13
 
Testing untestable Code - PFCongres 2010
Testing untestable Code - PFCongres 2010Testing untestable Code - PFCongres 2010
Testing untestable Code - PFCongres 2010
 

Viewers also liked

How to Get to Second Base with Your CDN
How to Get to Second Base with Your CDNHow to Get to Second Base with Your CDN
How to Get to Second Base with Your CDNMike Brittain
 
Continuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYCContinuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYCMike Brittain
 
Migrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMigrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMatt Graham
 
Continuous Deployment: The Dirty Details
Continuous Deployment: The Dirty DetailsContinuous Deployment: The Dirty Details
Continuous Deployment: The Dirty DetailsMike Brittain
 
Simple Log Analysis and Trending
Simple Log Analysis and TrendingSimple Log Analysis and Trending
Simple Log Analysis and TrendingMike Brittain
 
On Failure and Resilience
On Failure and ResilienceOn Failure and Resilience
On Failure and ResilienceMike Brittain
 
A Whirlwind Tour of Etsy's Monitoring Stack
A Whirlwind Tour of Etsy's Monitoring StackA Whirlwind Tour of Etsy's Monitoring Stack
A Whirlwind Tour of Etsy's Monitoring StackDaniel Schauenberg
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsMike Brittain
 
From Building a Marketplace to Building Teams
From Building a Marketplace to Building TeamsFrom Building a Marketplace to Building Teams
From Building a Marketplace to Building TeamsMike Brittain
 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightRoss Snyder
 
The Real Life Social Network v2
The Real Life Social Network v2The Real Life Social Network v2
The Real Life Social Network v2Paul Adams
 
Docker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EEDocker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EEDocker, Inc.
 
Principles and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at EtsyPrinciples and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at EtsyMike Brittain
 
26 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 201826 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 2018Brian Solis
 

Viewers also liked (15)

Scaling Deployment at Etsy
Scaling Deployment at EtsyScaling Deployment at Etsy
Scaling Deployment at Etsy
 
How to Get to Second Base with Your CDN
How to Get to Second Base with Your CDNHow to Get to Second Base with Your CDN
How to Get to Second Base with Your CDN
 
Continuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYCContinuous Deployment at Etsy — TimesOpen NYC
Continuous Deployment at Etsy — TimesOpen NYC
 
Migrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without DowntimeMigrating from PostgreSQL to MySQL Without Downtime
Migrating from PostgreSQL to MySQL Without Downtime
 
Continuous Deployment: The Dirty Details
Continuous Deployment: The Dirty DetailsContinuous Deployment: The Dirty Details
Continuous Deployment: The Dirty Details
 
Simple Log Analysis and Trending
Simple Log Analysis and TrendingSimple Log Analysis and Trending
Simple Log Analysis and Trending
 
On Failure and Resilience
On Failure and ResilienceOn Failure and Resilience
On Failure and Resilience
 
A Whirlwind Tour of Etsy's Monitoring Stack
A Whirlwind Tour of Etsy's Monitoring StackA Whirlwind Tour of Etsy's Monitoring Stack
A Whirlwind Tour of Etsy's Monitoring Stack
 
Continuous Delivery: The Dirty Details
Continuous Delivery: The Dirty DetailsContinuous Delivery: The Dirty Details
Continuous Delivery: The Dirty Details
 
From Building a Marketplace to Building Teams
From Building a Marketplace to Building TeamsFrom Building a Marketplace to Building Teams
From Building a Marketplace to Building Teams
 
Scaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went RightScaling Etsy: What Went Wrong, What Went Right
Scaling Etsy: What Went Wrong, What Went Right
 
The Real Life Social Network v2
The Real Life Social Network v2The Real Life Social Network v2
The Real Life Social Network v2
 
Docker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EEDocker Online Meetup: Announcing Docker CE + EE
Docker Online Meetup: Announcing Docker CE + EE
 
Principles and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at EtsyPrinciples and Practices in Continuous Deployment at Etsy
Principles and Practices in Continuous Deployment at Etsy
 
26 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 201826 Disruptive & Technology Trends 2016 - 2018
26 Disruptive & Technology Trends 2016 - 2018
 

Similar to Metrics-Driven Engineering

Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logsStefan Krawczyk
 
Data-Driven Software Design
Data-Driven Software DesignData-Driven Software Design
Data-Driven Software DesignPatrick McKenzie
 
Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011Chris Alfano
 
Re-Design with Elixir/OTP
Re-Design with Elixir/OTPRe-Design with Elixir/OTP
Re-Design with Elixir/OTPMustafa TURAN
 
A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wyciekówKonrad Kokosa
 
Open Source Ajax Solution @OSDC.tw 2009
Open Source Ajax  Solution @OSDC.tw 2009Open Source Ajax  Solution @OSDC.tw 2009
Open Source Ajax Solution @OSDC.tw 2009Robbie Cheng
 
idea: talk about the Active Cache
idea: talk about the Active Cacheidea: talk about the Active Cache
idea: talk about the Active CacheChing Yi Chan
 
More Secrets of JavaScript Libraries
More Secrets of JavaScript LibrariesMore Secrets of JavaScript Libraries
More Secrets of JavaScript Librariesjeresig
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsGraham Dumpleton
 
Google Back To Front: From Gears to App Engine and Beyond
Google Back To Front: From Gears to App Engine and BeyondGoogle Back To Front: From Gears to App Engine and Beyond
Google Back To Front: From Gears to App Engine and Beyonddion
 
Implementation of GUI Framework part3
Implementation of GUI Framework part3Implementation of GUI Framework part3
Implementation of GUI Framework part3masahiroookubo
 
Preparing a WordPress Plugin for Translation
Preparing a WordPress Plugin for TranslationPreparing a WordPress Plugin for Translation
Preparing a WordPress Plugin for TranslationBrian Hogg
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandMaarten Balliauw
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverFastly
 
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"LogeekNightUkraine
 
Introduction To Developing Custom Actions Within SharePoint
Introduction To Developing Custom Actions Within SharePointIntroduction To Developing Custom Actions Within SharePoint
Introduction To Developing Custom Actions Within SharePointGeoff Varosky
 
Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Neo4j
 
Brian hogg word camp preparing a plugin for translation
Brian hogg   word camp preparing a plugin for translationBrian hogg   word camp preparing a plugin for translation
Brian hogg word camp preparing a plugin for translationwcto2017
 
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac..."Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac...Fwdays
 
Do we need a bigger dev data culture
Do we need a bigger dev data cultureDo we need a bigger dev data culture
Do we need a bigger dev data cultureSimon Dittlmann
 

Similar to Metrics-Driven Engineering (20)

Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logs
 
Data-Driven Software Design
Data-Driven Software DesignData-Driven Software Design
Data-Driven Software Design
 
Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011Jarv.us Showcase — SenchaCon 2011
Jarv.us Showcase — SenchaCon 2011
 
Re-Design with Elixir/OTP
Re-Design with Elixir/OTPRe-Design with Elixir/OTP
Re-Design with Elixir/OTP
 
A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wycieków
 
Open Source Ajax Solution @OSDC.tw 2009
Open Source Ajax  Solution @OSDC.tw 2009Open Source Ajax  Solution @OSDC.tw 2009
Open Source Ajax Solution @OSDC.tw 2009
 
idea: talk about the Active Cache
idea: talk about the Active Cacheidea: talk about the Active Cache
idea: talk about the Active Cache
 
More Secrets of JavaScript Libraries
More Secrets of JavaScript LibrariesMore Secrets of JavaScript Libraries
More Secrets of JavaScript Libraries
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web Applications
 
Google Back To Front: From Gears to App Engine and Beyond
Google Back To Front: From Gears to App Engine and BeyondGoogle Back To Front: From Gears to App Engine and Beyond
Google Back To Front: From Gears to App Engine and Beyond
 
Implementation of GUI Framework part3
Implementation of GUI Framework part3Implementation of GUI Framework part3
Implementation of GUI Framework part3
 
Preparing a WordPress Plugin for Translation
Preparing a WordPress Plugin for TranslationPreparing a WordPress Plugin for Translation
Preparing a WordPress Plugin for Translation
 
What is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays FinlandWhat is going on - Application diagnostics on Azure - TechDays Finland
What is going on - Application diagnostics on Azure - TechDays Finland
 
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, EverAltitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
Altitude NY 2018: Leveraging Log Streaming to Build the Best Dashboards, Ever
 
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
Kostiantyn Yelisavenko "Mastering Macro Benchmarking in .NET"
 
Introduction To Developing Custom Actions Within SharePoint
Introduction To Developing Custom Actions Within SharePointIntroduction To Developing Custom Actions Within SharePoint
Introduction To Developing Custom Actions Within SharePoint
 
Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture Introducing Neo4j 3.1: New Security and Clustering Architecture
Introducing Neo4j 3.1: New Security and Clustering Architecture
 
Brian hogg word camp preparing a plugin for translation
Brian hogg   word camp preparing a plugin for translationBrian hogg   word camp preparing a plugin for translation
Brian hogg word camp preparing a plugin for translation
 
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac..."Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
"Full Stack frameworks or a story about how to reconcile Front (good) and Bac...
 
Do we need a bigger dev data culture
Do we need a bigger dev data cultureDo we need a bigger dev data culture
Do we need a bigger dev data culture
 

Recently uploaded

JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Recently uploaded (20)

JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

Metrics-Driven Engineering

  • 1. Metrics-Driven Engineering Mike Brittain @ mikebrittain Director of engineering, Infrastructure October 13, 2011
  • 3. How many new visits? How many listings created? How many registrations? How do people use Etsy? How many convos sent? How many purchases? How many new shops?
  • 4. Search indexing? How fast are pages generating? Async tasks currently in queue? What is the application doing? Developer API auth and rate limiting? Images resized and stored? Error and warning rates?
  • 5. Replication slave lag? Memcache hits/misses? Available connections? Are the servers in good shape ? Database queries per second? Total outgoing bandwidth? CPU, Memory, I/O?
  • 11.
  • 12. $314 Million GMS 2010 $180 Million GMS 2009 $87 Million GMS 2008 $26 Million GMS 2007 credit: pentarux (flickr)
  • 13. 25 Million Unique Visitors 1 Billion page views per month credit: pentarux (flickr)
  • 14. Engineering team grew 500% over 18 months credit: martin_heigan (flickr)
  • 16. Always Be Shipping credit: ibailemon (flickr)
  • 17. Always Be Shipping (even if it’s your first day) credit: ibailemon (flickr)
  • 18.
  • 19. 90+ Engineers 40+ Deploys / day credit: misswired (flickr)
  • 23. $cfg = array( 'checkout' => array('enabled' => 'on'), 'homepage' => array('enabled' => 'on'), 'profiles' => array('enabled' => 'on'), 'new_search' => array('enabled' => 'off'), ); Config Flags Enable and disable features quickly
  • 24. $cfg = array( 'checkout' => array('enabled' => 'on'), 'homepage' => array('enabled' => 'on'), 'profiles' => array('enabled' => 'on'), 'new_search' => array('enabled' => 'off'), ); Config Flags Enable and disable features quickly Plus “admin-only,” percentage ramp-up, A/B testing, whitelists, blacklists, etc...
  • 25. Failure is not an option
  • 27. inevitable! Failure is not an option a learning opportunity!
  • 28. inevitable! Failure is not an option a learning opportunity! DETECTABLE!
  • 30.
  • 31.
  • 32.
  • 35.
  • 36. A: Well, the Ops team manages the network, racks the servers, installed the monitoring tools, wears the pagers, blah, blah, blah...
  • 37. Engineers build the application
  • 38. Logging Graphing OPS ENG Trending Alerting
  • 39. “Engineers are too busy writing features to build metrics.”
  • 40. Metrics are part of every feature ...and so are config flags
  • 43. Cacti (network, SNMP) Ganglia (machines) Graphite (application) Splunk (log analysis, nightly reports) Nagios (alerting) Logging Logster StatsD
  • 45. Ganglia Cluster-oriented Huge community contributed recipes Custom metrics (gmetad)
  • 47. Graphite Single-instance Create new metrics on-the-fly Customize via URLs and display functions
  • 49. It’s 2:48 PM. Do you know where your logs are?
  • 50. Logger::log_error("User login failed. Reason: $msg for $username", “login”);
  • 51. Logger::log_error("User login failed. Reason: $msg for $username", “login”);
  • 52. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 53. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 54. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 55. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 56. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 57. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 58. LogFormat "%h %l %u %t "%r" %>s %b" common
  • 59. LogFormat %{True-Client-IP}i %l %t "%r " %>s %b "%{Referer}i" "%{User-Agent}i" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 61. LogFormat %{True-Client-IP}i %l %t "%r " %>s %b "%{Referer}i" "%{User-Agent}i" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 62. LogFormat %{True-Client-IP}i %l %t "%r " %>s %b "%{Referer}i" "%{User-Agent}i" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 63. LogFormat %{True-Client-IP}i %l %t "%r " %>s %b "%{Referer}i" "%{User-Agent}i" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 64. grep "/listing/" access.log | awk '{sum=sum+$(NF-2)} END {print sum/NR}'
  • 65. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue. web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!! web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling. web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling. web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling. web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo! web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh! web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh! web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!! web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue. web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
  • 66. Logster Fatals Errors Warnings
  • 67. Logster Run by cron Keeps a cursor on your log file Aggregate lines anyway you want Output to Ganglia or Graphite Simple parsers github.com/etsy
  • 68. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 70. if (fields['log_level'] == “fatal”): self.fatals += 1 elif (fields['log_level'] == “error”): self.errors += 1 elif (fields['log_level'] == “warning”): self.warnings += 1 ...
  • 71. MetricObject("fatals", (self.fatals / self.duration), "per sec") MetricObject("errors", (self.errors / self.duration), "per sec") MetricObject("warning", (self.warnings / self.duration), "per sec")
  • 72. Fatals Errors Warnings
  • 74. StatsD Network daemon (node.js) Accepts data over UDP Flushes to Graphite every 10 sec One-line of code github.com/etsy
  • 78. StatsD::timing("gearman.time", $msec); 90th pct average lower
  • 79. Ad hoc name value timestamp
  • 80. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
  • 82.
  • 83. We could stare at graphs all day...
  • 84. http://graphite/render? from=-1hours&width=600&height=200 &target=webs.errorLog.warning&rawData=1
  • 85. http://graphite/render? from=-1hours&width=600&height=200 &target=webs.errorLog.warning&rawData=1 webs.errorLog.warning,1318444930,1318448530,60| 5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0, 1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0, 1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,5. 0,1.0,1.0,None
  • 88. Business metrics + Confidence bands _____________ Alertable metrics
  • 89. 40,000+ metrics at Etsy Systems, Applications, Business
  • 92. Kind of Hard :-/ <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or +Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite %28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production %29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite %28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff, %23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render? from=-1hours&width=280&height=220&title=File+or+Script+Not +Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite %28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production %29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite %28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff, %23ff0000,%23006633,%23cc6600"> </a>
  • 93. Super Easy! $g = new Graphite($time); $g->setTitle('File Not Found'); $g->addMetric('webs.errorLog.notExist', '#00cc00'); echo $g->getDashboardHTML(280, 220);
  • 97. Metrics! Metrics + Events Metrics + Alerts Metrics + Metrics
  • 101. Make them required features
  • 102. Make them dead simple
  • 105. Homework codeascraft.etsy.com github.com/etsy Get in touch mike @ etsy . com We’re always looking for people @ mikebrittain who are interested in this kind of stuff... Thank You etsy.com/careers