2013-10-18 
MONITOR 
SOME OF THE THINGS
High Performance MySQL, 3rd Edition (Covers Version 5.5) 
Optimization, Backups, Replication, and more 
Baron Schwartz, Peter Zaitsev & Vadim Tkachenko 
ME 
• Cofounder of @VividCortex 
• Author of High Performance MySQL 
• @xaprb on Twitter 
• baron@vividcortex.com 
• http://www.linkedin.com/in/xaprb
RANT, RECAPPED 
• The sky is falling 
• Tools drive processes, and we need better tools designed for methods 
• Pay attention to CAPS (Capacity, Availability, Performance, Scalability) 
• Monitoring tools need to be a lot smarter 
• Measure and monitor “work getting done”
HARD CAPACITY 
• Disk volume 
• CPU Cycles 
• max_connections 
• File descriptors, sockets, TCP port 
numbers, etc 
• %used, absolute quantity available
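A minimal sketch of the two hard-capacity numbers named above, percent used and absolute quantity available. The function name and the sample values (450 connections in use against a max_connections of 500) are illustrative assumptions, not figures from the talk.

```python
def capacity_headroom(used, limit):
    """Return (percent_used, absolute_remaining) for a hard-limited resource."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    percent_used = 100.0 * used / limit
    remaining = limit - used
    return percent_used, remaining


# Hypothetical sample: 450 connections in use, max_connections = 500.
pct, left = capacity_headroom(450, 500)
print(f"{pct:.1f}% used, {left} connections available")
```

Reporting both numbers matters because 90% used means something very different on a limit of 500 than on a limit of 50,000.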
SOFT CAPACITY 
• Neil Gunther’s Universal Scalability 
Law 
• %used, absolute quantity available 
• Throughput, concurrency, errors
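For reference, a sketch of the Universal Scalability Law in its commonly cited form, X(N) = λN / (1 + σ(N−1) + κN(N−1)), where σ models contention and κ models coherency delay. The parameter names and the sample coefficients are assumptions for illustration; real values come from fitting measured throughput and concurrency.

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """Predicted throughput at concurrency n under the USL.

    lam   -- throughput at n = 1
    sigma -- contention (serialization) coefficient
    kappa -- coherency (crosstalk) coefficient
    """
    return lam * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma, kappa):
    """Concurrency at which predicted throughput peaks: sqrt((1 - sigma) / kappa)."""
    return math.sqrt((1.0 - sigma) / kappa)


# Hypothetical coefficients, not measured values.
lam, sigma, kappa = 1000.0, 0.05, 0.0005
n_peak = usl_peak_concurrency(sigma, kappa)
print(f"throughput peaks near N = {n_peak:.1f}")
print(f"predicted throughput at N = 32: {usl_throughput(32, lam, sigma, kappa):.0f}")
```

The peak is the soft capacity limit: beyond it, adding concurrency reduces throughput rather than increasing it.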
AVAILABILITY 
• Availability is absence of downtime 
• %used, absolute quantity available 
• Throughput, concurrency, errors 
• MTBF, MTTR, MTTD, %availability
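A sketch of how MTBF, MTTR, and %availability relate, computed from a list of outage durations inside one observation window. The function name and sample numbers are assumptions; MTTD is left out because it needs detection timestamps that this simple input does not carry.

```python
def availability_stats(window_seconds, outage_durations):
    """Summarize availability over one observation window.

    outage_durations -- seconds of downtime for each outage in the window
    """
    downtime = sum(outage_durations)
    uptime = window_seconds - downtime
    failures = len(outage_durations)
    return {
        # Mean time between failures: uptime spread across the failures seen.
        "mtbf": uptime / failures if failures else float("inf"),
        # Mean time to recover: average outage length.
        "mttr": downtime / failures if failures else 0.0,
        "availability_pct": 100.0 * uptime / window_seconds,
    }


# Hypothetical 30-day window with two outages of 10 and 50 minutes.
stats = availability_stats(30 * 86400, [600, 3000])
print(f"{stats['availability_pct']:.3f}% available, "
      f"MTBF {stats['mtbf'] / 3600:.1f} h, MTTR {stats['mttr'] / 60:.0f} min")
```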
TASK PERFORMANCE 
• Task performance is consistently fast 
response time. 
• Measure an SLA in percentile 
response time per task, over 
observation intervals 
• %used, absolute quantity available 
• Throughput, concurrency, errors 
• MTBF, MTTR, MTTD, %availability 
• Response time, 95% response time
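One way to express "percentile response time per task, over observation intervals" in code. This sketch assumes samples arrive as (timestamp, response_time) pairs and uses the nearest-rank percentile definition; both the input shape and the 60-second interval are assumptions, and other percentile definitions give slightly different results.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p95_per_interval(samples, interval_seconds=60):
    """Group (timestamp, response_time) samples into fixed intervals.

    Returns {interval_start_timestamp: 95th-percentile response time}.
    """
    buckets = defaultdict(list)
    for timestamp, response_time in samples:
        buckets[int(timestamp // interval_seconds)].append(response_time)
    return {
        bucket * interval_seconds: percentile(times, 95)
        for bucket, times in sorted(buckets.items())
    }


# Hypothetical samples: (seconds since start, response time in seconds).
samples = [(0, 0.1), (10, 0.2), (59, 0.9), (60, 0.3), (119, 0.4)]
print(p95_per_interval(samples))
```

An SLA check then compares each interval's value against the agreed threshold, instead of averaging over the whole day.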
RESOURCE PERFORMANCE 
• Resource performance is ability to run 
tasks consistently fast. 
• %used, absolute quantity available 
• Throughput, concurrency, errors 
• MTBF, MTTR, MTTD, %availability 
• Response time, 95% response time 
• Throughput, concurrency, busy time, 
total response time, backlog/queue
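The resource metrics above are related by simple arithmetic over an observation interval: throughput is completions per second, utilization is busy time divided by the interval, and average concurrency is total response time divided by the interval (Little's Law). A sketch, with the function name and the sample counters as assumptions:

```python
def resource_rates(interval_s, completions, busy_time_s, total_response_time_s):
    """Derive rate metrics from counters accumulated over one interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        "throughput": completions / interval_s,            # tasks per second
        "utilization": busy_time_s / interval_s,           # fraction of time busy
        "concurrency": total_response_time_s / interval_s, # average tasks in flight
        "avg_response_time": (
            total_response_time_s / completions if completions else 0.0
        ),
    }


# Hypothetical 10-second interval: 200 queries, 5 s busy, 40 s summed response time.
print(resource_rates(10, 200, 5, 40))
```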
SCALABILITY 
• Universal Scalability Law again 
• %used, absolute quantity available 
• Throughput, concurrency, errors 
• MTBF, MTTR, MTTD, %availability 
• Response time, 95% response time 
• Throughput, concurrency, busy time, 
total response time, backlog/queue
STALL DETECTION 
• Overloaded or underperforming? 
• %used, absolute quantity available 
• Throughput, concurrency, errors 
• MTBF, MTTR, MTTD, %availability 
• Response time, 95% response time 
• Throughput, concurrency, busy time, 
total response time, backlog/queue 
• Utilization, saturation, errors, sources 
of load/demand
GIT ‘ER DONE 
MONITOR WORK AND 
RESOURCES
WHAT NOT TO DO 
• Don’t use top-N lists from Google 
• Don’t just do what’s included in some 
Nagios plugin
№1 
TOP 10 LIST 
1. MySQL availability 
2. Presence of insecure users and databases 
3. Aborted connects 
4. Error log 
5. Deadlocks 
6. Change in server configuration 
7. Slow query log 
8. Slave lag 
9. Percentage of maximum allowed connections 
10. Percentage of full table scans
№2 
TOP 10 LIST 
1. Threads_connected 
2. Created_tmp_disk_tables 
3. Handler_read_first 
4. Innodb_buffer_pool_wait_free 
5. Key_reads 
6. Max_used_connections 
7. Open_tables 
8. Select_full_join 
9. Slow_queries 
10. Uptime
№1 
PLUGIN 
1. threadcache-hitrate (Hit rate of the thread-cache) 
2. slave-io-running (Slave io running: Yes) 
3. slave-sql-running (Slave sql running: Yes) 
4. qcache-hitrate (Query cache hitrate) 
5. qcache-lowmem-prunes (Query cache entries pruned because of low memory) 
6. keycache-hitrate (MyISAM key cache hitrate) 
7. bufferpool-hitrate (InnoDB buffer pool hitrate) 
8. bufferpool-wait-free (InnoDB buffer pool waits for clean page available) 
9. log-waits (InnoDB log waits because of a too small log buffer) 
10. tablecache-hitrate (Table cache hitrate) 
11. table-lock-contention (Table lock contention) 
12. index-usage (Usage of indices) 
13. tmp-disk-tables (Percent of temp tables created on disk) 
14. long-running-procs (long running processes)
№2 
PLUGIN 
1. connection-time 
2. uptime 
3. threads-connected 
4. threadcache-hitrate 
5. q[uery]cache-hitrate 
6. q[uery]cache-lowmem-prunes 
7. [myisam-]keycache-hitrate 
8. [innodb-]bufferpool-hitrate 
9. [innodb-]bufferpool-wait-free 
10. [innodb-]log-waits 
11. tablecache-hitrate 
12. table-lock-contention 
13. index-usage 
14. tmp-disk-tables 
15. slow-queries 
16. long-running-procs 
17. slave-lag 
18. slave-io-running 
19. slave-sql-running 
20. sql 
21. open-files 
22. encode 
23. cluster-ndb-running
№3 
PLUGIN
HTTP://WWW.FLICKR.COM/PHOTOS/NASAMARSHALL/5926864640/ 
SURFACE AREA
DUPLICATE SIGNALS 
• Queries 
• Com_admin_commands 
• Com_assign_to_keycache 
• Com_alter_db 
• Com_alter_db_upgrade 
• Com_alter_event 
• Com_alter_function 
• Com_alter_procedure 
• Com_alter_server 
• Com_alter_table 
• Com_alter_tablespace 
• Com_alter_user 
• Com_analyze 
• Com_begin 
• Com_binlog 
• Com_ad_nauseum
DESIRABLE METRICS 
• %used, absolute quantity available 
• Throughput, concurrency, errors 
• MTBF, MTTR, MTTD, %availability 
• Response time, 95% response time 
• Throughput, concurrency, busy time, total response time, backlog/queue 
• Utilization, saturation, errors, sources of load/demand
[Figure: Venn diagram of two sets, "Desirable" and "Easy"; a second build of the slide adds the label "IRRELEVANT"]
EXAMPLE PLEASE?
RESOURCE LIMITS 
• Threads_connected near max_connections? 
• %table cache used? 
• Open file handles? 
• Long-running queries/transactions?
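A sketch of the first check in Nagios style. The exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL); the function name and the 80%/95% thresholds are assumptions, and fetching Threads_connected and max_connections from the server is left out.

```python
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_connections(threads_connected, max_connections,
                      warn_pct=80.0, crit_pct=95.0):
    """Return (exit_code, message) for connection usage against the limit."""
    pct = 100.0 * threads_connected / max_connections
    if pct >= crit_pct:
        status = CRITICAL
    elif pct >= warn_pct:
        status = WARNING
    else:
        status = OK
    message = (f"{LABELS[status]} - Threads_connected "
               f"{threads_connected}/{max_connections} ({pct:.0f}%)")
    return status, message


# Hypothetical reading: 485 of 500 connections in use.
code, text = check_connections(485, 500)
print(code, text)
```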
ERRORS 
• Deadlocks? 
• Aborted connects?
AVAILABILITY 
• Ability to connect and run a query? 
• Uptime is small (server restarted recently)? 
• Replication is running?
PERFORMANCE 
• You can get throughput (Queries) and concurrency (Threads_running) from MySQL 
• But in a Nagios check, no context to know whether they’re good or bad 
• You generally can’t get response time, busy time, utilization, backlog, etc 
• You can aggregate thread states, thread times, users, databases, query abstracts...
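The last bullet can be sketched as a simple group-by over process list rows. This assumes the rows have already been fetched (for example from SHOW FULL PROCESSLIST) as dictionaries keyed by the standard column names such as User and State; the helper name and sample rows are illustrative.

```python
from collections import Counter


def aggregate_threads(rows, key="State"):
    """Count process list rows by one column, e.g. State, User, or db."""
    return Counter((row.get(key) or "(none)") for row in rows)


# Hypothetical rows shaped like SHOW FULL PROCESSLIST output.
rows = [
    {"User": "app", "db": "shop", "State": "Waiting for table metadata lock"},
    {"User": "app", "db": "shop", "State": "Waiting for table metadata lock"},
    {"User": "repl", "db": None, "State": ""},
]
for state, count in aggregate_threads(rows).most_common():
    print(f"{count:5d}  {state}")
```

The same function with key="User" or key="db" gives the per-user and per-database views.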
NAGIOS IS BEST AT 
LIVING IN THE 
MOMENT
THOU SHALT NOT 
• Cache hit ratios 
• Thread cache hit ratio 
• Buffer pool cache hit ratio 
• Table cache hit ratio 
• Key cache hit ratio 
• Query cache hit ratio 
• Rates of “bad” queries 
• % temp tables on disk 
• % full table scans 
• % slow queries 
• Unfixable things 
• Replication delay
WHY NOT? 
• Those are properties of the workload and application 
• They are not conditions to alert/warn about 
• They are not fixable / actionable in the service
ALERTS ARE 
BETTER TOGETHER
QUESTION: 
WHAT IS BETTER?
№1 ALERT!!!!! 
Disk CRIT 100% /dev/sda2
№2 ALERT!!!!! 
Replication CRIT Slave I/O Thread No
№3 ALERT!!!!! 
Replication CRIT Slave SQL Thread No
№4 ALERT!!!!! 
Replication CRIT Seconds_Behind_Master NULL
№5 ALERT!!!!! 
MySQL CRIT oldest transaction: 86400 seconds
- OR -
№1 ALERT!!!!! 
CRIT 
* Disk /dev/sda2 full 
* Replication stopped 
* Oldest transaction 86400 seconds 
* 4999 threads in status “Waiting for table metadata lock”
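A sketch of how individual check results might be folded into the single combined alert shown above. The severity names, the (severity, message) input shape, and the function name are assumptions for illustration, not part of any monitoring tool's API.

```python
SEVERITY_ORDER = {"OK": 0, "WARN": 1, "CRIT": 2}


def combine_alerts(results):
    """Merge (severity, message) check results into one notification body.

    The first line is the worst severity seen; each failing check follows
    as a bullet. Checks that are OK are left out.
    """
    problems = [(sev, msg) for sev, msg in results if SEVERITY_ORDER[sev] > 0]
    if not problems:
        return "OK"
    worst = max(problems, key=lambda item: SEVERITY_ORDER[item[0]])[0]
    return "\n".join([worst] + [f"* {msg}" for _, msg in problems])


# Check results mirroring the combined alert above.
print(combine_alerts([
    ("CRIT", "Disk /dev/sda2 full"),
    ("CRIT", "Replication stopped"),
    ("CRIT", "Oldest transaction 86400 seconds"),
    ("OK", "Connection test"),
]))
```

One message with all failing conditions lets the reader see the likely causal chain (full disk, stalled replication, stuck transaction) at a glance.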
HOLLER AT ME 
QUESTIONS? 
@XAPRB / BARON@VIVIDCORTEX.COM
RESOURCES 
• Chapter 3 of High Performance MySQL, 3rd Edition 
• Percona White Papers 
• Causes of Downtime in Production MySQL Servers 
• Preventing MySQL Emergencies 
• Goal-Driven Performance Optimization 
• Forecasting MySQL Scalability with the Universal Scalability Law 
• Method R: Optimizing Oracle Performance, Cary Millsap 
• The Goal, Eli Goldratt 
• The USE Method (Brendan Gregg) & his new book 
• Guerrilla Capacity Planning, Neil J. Gunther 
• Fundamental Performance & Scalability Instrumentation

Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 

Monitor some of the things

  • 1. 2013-10-18 MONITOR SOME OF THE THINGS
  • 2. ME • Cofounder of @VividCortex • Author of High Performance MySQL (3rd Edition, Covers Version 5.5: Optimization, Backups, Replication, and more; Baron Schwartz, Peter Zaitsev & Vadim Tkachenko) • @xaprb on Twitter • baron@vividcortex.com • http://www.linkedin.com/in/xaprb
  • 3. RANT, RECAPPED • The sky is falling • Tools drive processes, and we need better tools designed for methods • Pay attention to CAPS (Capacity, Availability, Performance, Scalability) • Monitoring tools need to be a lot smarter • Measure and monitor “work getting done”
  • 4. HARD CAPACITY • Disk volume • CPU Cycles • max_connections • File descriptors, sockets, TCP port numbers, etc • %used, absolute quantity available
  • 5. SOFT CAPACITY • Neil Gunther’s Universal Scalability Law • %used, absolute quantity available • Throughput, concurrency, errors
  • 6. AVAILABILITY • Availability is absence of downtime • %used, absolute quantity available • Throughput, concurrency, errors • MTBF, MTTR, MTTD, %availability
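The %availability figure on this slide is commonly derived from MTBF and MTTR. A minimal sketch, assuming the standard steady-state formula availability = MTBF / (MTBF + MTTR); the function name and the choice of hours as the unit are illustrative and not from the talk:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up, given mean time between
    failures (MTBF) and mean time to repair (MTTR) in the same unit."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    total = mtbf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTBF + MTTR must be greater than zero")
    return mtbf_hours / total


# Hypothetical numbers: one failure every 999 hours, one hour to repair.
print(f"{availability(999, 1):.3%}")
```

MTTD (time to detect) is not a separate term in this formula; in practice it is usually counted as part of the outage that MTTR measures, which is one reason faster detection improves availability.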
  • 7. TASK PERFORMANCE • Task performance is consistently fast response time. • Measure an SLA in percentile response time per task, over observation intervals • %used, absolute quantity available • Throughput, concurrency, errors • MTBF, MTTR, MTTD, %availability • Response time, 95% response time
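Percentile response time per task, over observation intervals, can be computed from raw timing samples. The sketch below is one possible reading of that bullet, assuming samples arrive as (timestamp, task name, response time) tuples and using the nearest-rank percentile definition; the function names and the 60-second interval are assumptions, not something the slides specify:

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p95_by_interval(samples, interval_seconds=60):
    """Group (timestamp, task, response_time) samples by task and by
    observation interval, then return the 95th percentile of each group."""
    buckets = defaultdict(list)
    for timestamp, task, response_time in samples:
        window_start = int(timestamp // interval_seconds) * interval_seconds
        buckets[(task, window_start)].append(response_time)
    return {key: percentile(times, 95) for key, times in buckets.items()}


# Hypothetical samples for one task across two one-minute intervals.
samples = [(0, "checkout", 0.1), (30, "checkout", 0.3), (61, "checkout", 0.2)]
print(p95_by_interval(samples))
```

An SLA check would then compare each interval's value against a per-task target rather than averaging across intervals, since an average hides the slow intervals.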
  • 8. RESOURCE PERFORMANCE • Resource performance is ability to run tasks consistently fast. • %used, absolute quantity available • Throughput, concurrency, errors • MTBF, MTTR, MTTD, %availability • Response time, 95% response time • Throughput, concurrency, busy time, total response time, backlog/queue
  • 9. SCALABILITY • Universal Scalability Law again • %used, absolute quantity available • Throughput, concurrency, errors • MTBF, MTTR, MTTD, %availability • Response time, 95% response time • Throughput, concurrency, busy time, total response time, backlog/queue
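For reference, Gunther's Universal Scalability Law models throughput at concurrency N as X(N) = λN / (1 + σ(N − 1) + κN(N − 1)), where λ is throughput at N = 1, σ is the contention coefficient, and κ is the coherency coefficient. A small sketch of the formula follows; the coefficient values are made up for illustration and would normally be fitted to measured throughput and concurrency:

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """Predicted throughput at concurrency n under the USL."""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def peak_concurrency(sigma, kappa):
    """Concurrency at which USL throughput peaks: sqrt((1 - sigma) / kappa)."""
    if kappa <= 0:
        raise ValueError("kappa must be positive for a finite peak")
    return math.sqrt((1 - sigma) / kappa)


# Hypothetical coefficients, not measurements from any real server.
lam, sigma, kappa = 1000.0, 0.05, 0.001
for n in (1, 8, 31, 100):
    print(n, round(usl_throughput(n, lam, sigma, kappa)))
print("peak near N =", round(peak_concurrency(sigma, kappa), 1))
```

Past the peak, adding concurrency lowers throughput, which is why the model is useful for both soft capacity and scalability.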
  • 10. STALL DETECTION • Overloaded or underperforming? • %used, absolute quantity available • Throughput, concurrency, errors • MTBF, MTTR, MTTD, %availability • Response time, 95% response time • Throughput, concurrency, busy time, total response time, backlog/queue • Utilization, saturation, errors, sources of load/demand
  • 11. GIT ‘ER DONE: MONITOR WORK AND RESOURCES
  • 12. WHAT NOT TO DO • Don’t use top-N lists from Google • Don’t just do what’s included in some Nagios plugin
  • 13. №1 TOP 10 LIST 1. MySQL availability 2. Presence of insecure users and databases 3. Aborted connects 4. Error log 5. Deadlocks 6. Change in server configuration 7. Slow query log 8. Slave lag 9. Percentage of maximum allowed connections 10. Percentage of full table scans
  • 14. №2 TOP 10 LIST 1. Threads_connected 2. Created_tmp_disk_tables 3. Handler_read_first 4. Innodb_buffer_pool_wait_free 5. Key_reads 6. Max_used_connections 7. Open_tables 8. Select_full_join 9. Slow_queries 10. Uptime
  • 15. №1 PLUGIN 1. threadcache-hitrate (Hit rate of the thread-cache) 2. slave-io-running (Slave io running: Yes) 3. slave-sql-running (Slave sql running: Yes) 4. qcache-hitrate (Query cache hitrate) 5. qcache-lowmem-prunes (Query cache entries pruned because of low memory) 6. keycache-hitrate (MyISAM key cache hitrate) 7. bufferpool-hitrate (InnoDB buffer pool hitrate) 8. bufferpool-wait-free (InnoDB buffer pool waits for clean page available) 9. log-waits (InnoDB log waits because of a too small log buffer) 10. tablecache-hitrate (Table cache hitrate) 11. table-lock-contention (Table lock contention) 12. index-usage (Usage of indices) 13. tmp-disk-tables (Percent of temp tables created on disk) 14. long-running-procs (long running processes)
  • 16. №2 PLUGIN 1. connection-time 2. uptime 3. threads-connected 4. threadcache-hitrate 5. q[uery]cache-hitrate 6. q[uery]cache-lowmem-prunes 7. [myisam-]keycache-hitrate 8. [innodb-]bufferpool-hitrate 9. [innodb-]bufferpool-wait-free 10. [innodb-]log-waits 11. tablecache-hitrate 12. table-lock-contention 13. index-usage 14. tmp-disk-tables 15. slow-queries 16. long-running-procs 17. slave-lag 18. slave-io-running 19. slave-sql-running 20. sql 21. open-files 22. encode 23. cluster-ndb-running
  • 19. DUPLICATE SIGNALS • Queries • Com_admin_commands • Com_assign_to_keycache • Com_alter_db • Com_alter_db_upgrade • Com_alter_event • Com_alter_function • Com_alter_procedure • Com_alter_server • Com_alter_table • Com_alter_tablespace • Com_alter_user • Com_analyze • Com_begin • Com_binlog • Com_ad_nauseum
  • 20. DESIRABLE METRICS • %used, absolute quantity available • Throughput, concurrency, errors • MTBF, MTTR, MTTD, %availability • Response time, 95% response time • Throughput, concurrency, busy time, total response time, backlog/queue • Utilization, saturation, errors, sources of load/demand
  • 24. RESOURCE LIMITS • Threads_connected near max_connections? • %table cache used? • Open file handles? • Long-running queries/transactions?
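The resource-limit questions above reduce to "%used and absolute quantity available" for each hard limit. A sketch under stated assumptions: the two dictionaries stand in for the output of SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES (fetching them is left out), the pairing of each counter with its limit is one reasonable choice, and the 80% warning threshold is arbitrary:

```python
def pct_used(used, limit):
    """Percentage of a hard limit currently in use."""
    limit = float(limit)
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * float(used) / limit


def resource_limits(status, variables, warn_pct=80.0):
    """Report %used and remaining headroom for a few MySQL hard limits.
    Values may be strings, as the server returns them."""
    checks = {
        "connections": (status["Threads_connected"], variables["max_connections"]),
        "table_cache": (status["Open_tables"], variables["table_open_cache"]),
        "open_files": (status["Open_files"], variables["open_files_limit"]),
    }
    report = {}
    for name, (used, limit) in checks.items():
        pct = pct_used(used, limit)
        report[name] = {
            "pct_used": pct,
            "headroom": float(limit) - float(used),
            "warn": pct >= warn_pct,
        }
    return report


# Hypothetical values, not taken from a real server.
status = {"Threads_connected": "140", "Open_tables": "200", "Open_files": "50"}
variables = {"max_connections": "151", "table_open_cache": "2000",
             "open_files_limit": "5000"}
for name, row in resource_limits(status, variables).items():
    print(name, row)
```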
  • 25. ERRORS • Deadlocks? • Aborted connects?
  • 26. AVAILABILITY • Ability to connect and run a query? • Uptime is small? • Replication is running?
  • 27. PERFORMANCE • You can get throughput (Queries) and concurrency (Threads_running) from MySQL • But a Nagios check has no context to know whether they’re good or bad • You generally can’t get response time, busy time, utilization, backlog, etc • You can aggregate thread states, thread times, users, databases, query abstracts...
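Queries is a cumulative counter, so throughput has to be derived from two samples taken some time apart. A minimal sketch, assuming each sample is a (timestamp in seconds, Queries value) pair collected by some poller that is not shown; the function name is hypothetical:

```python
def throughput(prev, curr):
    """Queries per second between two (timestamp, Queries) samples.
    Returns None if the counter went backwards, which usually means
    the server restarted between samples."""
    (t0, q0), (t1, q1) = prev, curr
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if q1 < q0:
        return None
    return (q1 - q0) / elapsed


# Hypothetical samples ten seconds apart.
print(throughput((0, 1000), (10, 6000)))
```

This also illustrates the slide's point about context: a single check sees one rate, and judging whether that rate is good or bad requires history that a point-in-time check does not keep.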
  • 28. NAGIOS IS BEST AT LIVING IN THE MOMENT
  • 29. THOU SHALT NOT • Cache hit ratios: thread cache, buffer pool, table cache, key cache, query cache • Rates of “bad” queries: % temp tables on disk, % full table scans, % slow queries • Unfixable things: replication delay
  • 30. WHY NOT? • Those are properties of the workload and application • They are not conditions to alert/warn about • They are not fixable / actionable in the service
  • 31. ALERTS ARE BETTER TOGETHER
  • 32. QUESTION: WHAT IS BETTER?
  • 33. №1 ALERT!!!!! Disk CRIT 100% /dev/sda2
  • 34. №2 ALERT!!!!! Replication CRIT Slave I/O Thread No
  • 35. №3 ALERT!!!!! Replication CRIT Slave SQL Thread No
  • 36. №4 ALERT!!!!! Replication CRIT Seconds_Behind_Master NULL
  • 37. №5 ALERT!!!!! MySQL CRIT oldest transaction: 86400 seconds
  • 39. №1 ALERT!!!!! CRIT * Disk /dev/sda2 full * Replication stopped * Oldest transaction 86400 seconds * 4999 threads in status “Waiting for table metadata lock”
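The combined alert above can be produced by collecting individual check results and emitting a single message. The sketch below is one way to do that, not how any particular monitoring tool works; the CheckResult type, the severity ordering, and the message layout are assumptions made for illustration:

```python
from dataclasses import dataclass

# Assumed severity ordering; higher is worse.
SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}


@dataclass
class CheckResult:
    name: str
    status: str  # "OK", "WARN", or "CRIT"
    detail: str


def combine(results):
    """Fold several check results into one alert message that carries
    the worst status and lists every failing condition together."""
    failing = [r for r in results if r.status != "OK"]
    if not failing:
        return "OK"
    worst = max(failing, key=lambda r: SEVERITY[r.status]).status
    lines = [f"ALERT {worst}"] + [f"* {r.detail}" for r in failing]
    return "\n".join(lines)


results = [
    CheckResult("disk", "CRIT", "Disk /dev/sda2 full"),
    CheckResult("replication", "CRIT", "Replication stopped"),
    CheckResult("connect", "OK", "Connected and ran a query"),
]
print(combine(results))
```

Seeing the failing conditions side by side gives the reader a chance to spot a likely common cause, such as a full disk, instead of chasing several separate pages.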
  • 40. HOLLER AT ME • QUESTIONS? • @XAPRB / BARON@VIVIDCORTEX.COM
  • 41. RESOURCES • Chapter 3 of High Performance MySQL, 3rd Edition • Percona White Papers: Causes of Downtime in Production MySQL Servers; Preventing MySQL Emergencies; Goal-Driven Performance Optimization; Forecasting MySQL Scalability with the Universal Scalability Law • Method R: Optimizing Oracle Performance, Cary Millsap • The Goal, Eli Goldratt • The USE Method (Brendan Gregg) & his new book • Guerrilla Capacity Planning, Neil J. Gunther • Fundamental Performance & Scalability Instrumentation