1. Percona Toolkit
(It's Basically Magic)
SDPHP | Business.com | 05-28-14
Who Am I?
https://twitter.com/robertswisher
https://plus.google.com/+RobertSwisher
https://www.linkedin.com/in/robertswisher
robert@business.com
3. Percona?
Who the hell are they?!
What is Percona Toolkit?
Formerly known as Maatkit & Aspersa
Baron Schwartz literally wrote the book on MySQL
Open-source collection of scripts to help with common
tasks that every DBA and developer has to do:
- Development
- Profiling
- Configuration
- Monitoring
- Replication
- Same code, same developers, new branding
- Source now on Launchpad (like Percona Server)
(https://launchpad.net/percona-toolkit)
(You should use Percona Server too!)
5. Who Uses It?
Basically anyone running MySQL with lots of data
(Or anyone smart and lazy like all of us)
Installation
As of this writing, the current version is 2.2.7
Yum
Apt
or source
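A minimal sketch of the three routes. It assumes the Percona yum repository is already configured (on Debian/Ubuntu, that the distribution's own percona-toolkit package is acceptable), and the function names are hypothetical. The source route assumes a downloaded tarball such as percona-toolkit-2.2.7.tar.gz that unpacks into a directory of the same name, built with the usual Perl steps.

```shell
#!/usr/bin/env bash
# Sketch: install Percona Toolkit with whichever package manager is present.
# Assumes the Percona yum repo is configured, or that the distro's
# percona-toolkit package is acceptable on Debian/Ubuntu.
install_percona_toolkit() {
  if command -v yum >/dev/null 2>&1; then
    sudo yum install -y percona-toolkit
  elif command -v apt-get >/dev/null 2>&1; then
    sudo apt-get install -y percona-toolkit
  else
    echo "no yum or apt-get found; build from source instead" >&2
    return 1
  fi
}

# Sketch: build from a downloaded tarball (path passed as $1).
# Assumes the archive unpacks into a directory named after the tarball.
install_from_source() {
  local tarball=$1 dir
  dir=$(basename "$tarball" .tar.gz)
  tar xzf "$tarball" &&
    (cd "$dir" && perl Makefile.PL && make && make test && sudo make install)
}
```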
7. Tools
What Do You Use It For?
- Schema changes
- Data archival
- Query optimization
- Data consistency
- Performance debugging
- General maintenance
9. Schema Changes
- ALTER TABLE always copies the table before 5.6
(except fast index creation in 5.5, or in 5.1 with the InnoDB plugin)
- Table is locked during the change
- BIG tables = BIG TROUBLE (millions of rows take hours or more)
- Used to require trickery like ALTER on slave, promote to master,
ALTER on old master, promote to master again
(Gets really ugly with master-master or tiered replication)
pt-online-schema-change
Triggers are trouble, but can be handled (dropped by default)
Foreign keys are trouble, but can be handled (dropped and rebuilt)
Takes longer than ALTER TABLE (up to 4x)
ALWAYS backup first
11. pt-online-schema-change
--dry-run and --execute are mutually exclusive
Under nohup you can't type a password, so read it from a file: --password "$(cat /tmp/pass)"
Tune --max-lag and --max-load for busy systems
Example:
nohup pt-online-schema-change --dry-run \
  --alter 'CHANGE `foo` `foo` varchar(24) COLLATE latin1_bin NULL AFTER `bar`' \
  --password "$(cat /tmp/pass)" --print --nocheck-replication-filters \
  --max-load "Threads_connected:60,Threads_running:20" \
  D=your_db,t=really_big_table &
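Because --dry-run and --execute can't be combined, one common pattern is two runs with identical options: the dry run first, and the real change only if the dry run succeeds. A minimal sketch, assuming a hypothetical function name and placeholder DSN, ALTER clause and password file; --max-lag 5 is only an illustrative value (the tool's default is 1 second).

```shell
#!/usr/bin/env bash
# Sketch: dry run first, then the real online schema change with the same
# options. Function name, DSN, ALTER clause and password file are placeholders.
osc_dry_run_then_execute() {
  local dsn=$1 alter=$2 passfile=$3
  local opts=(
    --alter "$alter"
    --password "$(cat "$passfile")"
    --max-load "Threads_connected:60,Threads_running:20"
    --max-lag 5
    --print
  )
  # --dry-run and --execute are mutually exclusive, so this is two separate
  # runs; stop if the dry run reports a problem.
  pt-online-schema-change --dry-run "${opts[@]}" "$dsn" || return 1
  pt-online-schema-change --execute "${opts[@]}" "$dsn"
}
```

Put the definition and a call in a script and start it with nohup bash so the change survives a dropped SSH session.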
13. Data Archival
- LOTS of writing to BIG tables = BAD
- Pruning BIG tables to only frequently accessed data = GOOD
- BIG tables more prone to corruption
- Deleting from BIG tables = SLOOOOOOW
- Long running transactions = REALLY SLOOOOOOOOOW
- DELETE locks the whole table on MyISAM
15. pt-archiver
Create destination table first
--dry-run exists, but --execute doesn't
If you use an auto-increment column, edit the schema
--limit is good for sequential data, but be careful if bouncing around
Use --progress to track
May want to archive from slave, then purge from master
ALWAYS backup first
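The "archive from slave, then purge from master" tip might look like the sketch below. Everything named here is a placeholder: the hosts slave1, master1 and archive1, the schemas, and the WHERE clause. The destination table is assumed to already exist, for example created from the source's SHOW CREATE TABLE output. The sketch relies on pt-archiver's --no-delete (copy without deleting) and --purge (delete without copying). Use a fixed cutoff in --where so both runs match the same rows, and add --dry-run to either call to print the queries without running them.

```shell
#!/usr/bin/env bash
# Sketch: copy old rows out on a slave, then purge them on the master.
# Hosts, schemas, tables and the WHERE clause are hypothetical placeholders.
archive_on_slave() {
  local where=$1
  # --no-delete: copy matching rows to --dest but leave them on the source.
  pt-archiver \
    --source h=slave1,D=your_db,t=really_big_table \
    --dest h=archive1,D=your_db_archive,t=really_big_table \
    --where "$where" \
    --limit 1000 --commit-each --progress 10000 --statistics \
    --no-delete
}

purge_on_master() {
  local where=$1
  # --purge: delete matching rows without copying them anywhere;
  # the deletes then replicate down to the slaves.
  pt-archiver \
    --source h=master1,D=your_db,t=really_big_table \
    --where "$where" \
    --limit 1000 --commit-each --progress 10000 \
    --purge
}
```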
21. # Query 1: 0.00 QPS, 0.01x concurrency, ID 0x76F9EC92751F314A at byte 80096643
# This item is included in the report because it matches --limit.
# Scores: V/M = 188.68
# Time range: 2012-02-01 09:20:24 to 2013-10-04 10:47:56
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          1     490
# Exec time     14 384617s      9s   4869s    785s   1292s    385s    833s
# Lock time      2     11s   169us      6s    22ms     6ms   290ms   316us
# Rows sent      0 711.60k       0   4.82k   1.45k   4.27k   1.44k  685.39
# Rows examine  10  30.01G       0 123.80M  62.71M 117.57M  44.73M  75.78M
# Rows affecte   0       0       0       0       0       0       0       0
# Rows read      0 711.60k       0   4.82k   1.45k   4.27k   1.44k  685.39
# Bytes sent     0  21.90M       0 167.52k  45.77k 143.37k  52.07k   8.46k
# Tmp tables     8   1.91k       2       4    3.99    3.89    0.16    3.89
# Tmp disk tbl   0       0       0       0       0       0       0       0
# Tmp tbl size   3   3.36G       0   7.98M   7.02M   7.65M   1.26M   7.65M
# Query size     0 471.35k     982     986  985.03  964.41       0  964.41
# String:
# Databases    bdc_ccm
# Hosts
# InnoDB trxID 13E9F1B2 (1/0%), 1402493D (1/0%)... 488 more
# Last errno   0
# Users        semuser (488/99%), jackie.lam (1/0%)... 1 more
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s  #
#  10s+  ################################################################
# Tables
#    SHOW TABLE STATUS FROM `bdc_ccm` LIKE 'click_log_inbound'\G
#    SHOW CREATE TABLE `bdc_ccm`.`click_log_inbound`\G
#    SHOW TABLE STATUS FROM `bdc_ccm` LIKE 'click_log_outbound_tp'\G
#    SHOW CREATE TABLE `bdc_ccm`.`click_log_outbound_tp`\G
#    SHOW TABLE STATUS FROM `bdc_ccm` LIKE 'click_log_outbound'\G
#    SHOW CREATE TABLE `bdc_ccm`.`click_log_outbound`\G
# EXPLAIN /*!50100 PARTITIONS*/
select date(a.timestamp), a.referrer, count(b.inbound_id), sum(b.cpc), 'AS' as tag
from click_log_inbound a, click_log_outbound_tp b
where a.id = b.inbound_id
and a.timestamp between '2012-02-08 00:00:00' and '2012-02-09 23:59:59'
and b.partner like 'adsense'
and b.flag = 0
group by date(a.timestamp), a.referrer
union all
select date(a.timestamp), a.referrer, count(b.inbound_id), sum(b.cpc), 'FL' as tag
from click_log_inbound a, click_log_outbound b
where a.id = b.inbound_id
and a.timestamp between '2012-02-08 00:00:00' and '2012-02-09 23:59:59'
and flag = 0
group by date(a.timestamp), a.referrer
union all
select date(a.timestamp), a.referrer, count(b.inbound_id), sum(b.cpc), 'TP' as tag
from click_log_inbound a, click_log_outbound_tp b
where a.id = b.inbound_id
and a.timestamp between '2012-02-08 00:00:00' and '2012-02-09 23:59:59'
and b.partner in ('capterra', 'bdc_network')
and b.flag = 0
group by date(a.timestamp), a.referrer\G
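A report in this format is pt-query-digest output. A minimal sketch of producing one from the slow query log, assuming the log is enabled and lives at the hypothetical default path below (check slow_query_log_file on your server). --limit 10 keeps only the top ten query classes, which pt-query-digest ranks by total query time by default.

```shell
#!/usr/bin/env bash
# Sketch: summarize a slow query log; the default path is an assumption.
digest_slow_log() {
  local log=${1:-/var/log/mysql/mysql-slow.log}
  # --limit caps how many query classes end up in the report.
  pt-query-digest --limit 10 "$log"
}
```

Usage: digest_slow_log > slow-report.txt, then work down the report from Query 1.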
Data Consistency
- Replication isn't perfect
- Replication filters
- master-master replication
- 1062 “DUPLICATE KEY ERROR”
- Server crashes
- Non-deterministic writes under statement-based replication (e.g. UPDATE ... LIMIT without ORDER BY)
23. pt-table-checksum
Requires STATEMENT-based replication on intermediate slaves in tiered replication
Replication filters are dangerous because a failed query can
break replication
May want to use nohup since it can be slow
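A minimal sketch of a checksum run from the master followed by a report of the tables that differ. Host and schema names are placeholders. It passes --replicate=test.checksum so the results land in the table used by the pt-table-sync commands (the 2.2 default is percona.checksums), and --replicate-check-only re-reads the stored results on the slaves without re-running the checksum queries.

```shell
#!/usr/bin/env bash
# Sketch: checksum one schema from the master, then list tables that differ.
# Host and schema names are hypothetical; test.checksum must match the table
# pt-table-sync is later pointed at with --replicate.
checksum_schema() {
  local master=$1 schema=$2
  pt-table-checksum --replicate=test.checksum --databases "$schema" \
    --max-load "Threads_running:25" "h=$master"
}

show_differences() {
  local master=$1
  pt-table-checksum --replicate=test.checksum --replicate-check-only "h=$master"
}
```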
25. pt-table-sync
--dry-run and --execute are mutually exclusive
ALWAYS backup first
In a tiered or master-master replication setup,
take extra care to think through what will be done
Run on master to sync all slaves
pt-table-sync --execute --replicate test.checksum master1
Run for each slave individually to sync it to its master
pt-table-sync --execute --sync-to-master slave1
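One way to make the --dry-run/--execute split harder to get wrong is to default to --print, which shows the SQL that would fix the differences, and pass --execute only once that output looks right. A sketch with hypothetical function and host names; test.checksum must be the same table pt-table-checksum wrote to.

```shell
#!/usr/bin/env bash
# Sketch: wrappers that print the changes unless --execute is passed.
# Host names are placeholders; test.checksum must match pt-table-checksum's --replicate.
check_mode() {
  case $1 in
    --print|--dry-run|--execute) return 0 ;;
    *) echo "mode must be --print, --dry-run or --execute" >&2; return 2 ;;
  esac
}

# Sync every slave that pt-table-checksum flagged, driven from the master.
sync_from_checksums() {
  local master=$1 mode=${2:-"--print"}
  check_mode "$mode" || return
  pt-table-sync "$mode" --replicate test.checksum "h=$master"
}

# Sync one slave to its own master.
sync_slave_to_master() {
  local slave=$1 mode=${2:-"--print"}
  check_mode "$mode" || return
  pt-table-sync "$mode" --sync-to-master "h=$slave"
}
```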
29. Performance Debugging
- Problems can be random
- Problems only last for a few seconds,
you can't connect and observe fast enough
- Problems like to happen at odd hours
(ETL, rollups, reporting, etc.)
- You can't ALWAYS log on
pt-stalk
- Creates a lot of files
- Output inspected with pt-sift
31. pt-stalk
Run as root
--daemonize: fork and run in the background
--sleep: how long to sleep between collections
--cycles: how many checks the trigger must be true before collecting
--variable: Threads_running and Execution_time are good ones
--disk-bytes-free: don't collect if free disk space drops below this threshold
(best practice is to set --log and --dest to a different disk
than the one your data lives on, the same as other MySQL logs)
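Putting those options together, a daemonized pt-stalk that triggers on Threads_running might be started as below. A sketch only: --threshold 25, --cycles 5 and --sleep 300 are the tool's defaults spelled out, 500M is an arbitrary free-space floor, the function name and paths are assumptions, and MySQL credentials are assumed to come from root's ~/.my.cnf. Once something has been captured, point pt-sift at the same --dest directory.

```shell
#!/usr/bin/env bash
# Sketch: start pt-stalk in the background (run as root).
# Paths are assumptions: put --dest and --log on a disk other than the MySQL datadir.
start_stalker() {
  local dest=${1:-/var/lib/pt-stalk} log=${2:-/var/log/pt-stalk.log}
  pt-stalk --daemonize \
    --variable Threads_running --threshold 25 --cycles 5 \
    --sleep 300 \
    --dest "$dest" --log "$log" \
    --disk-bytes-free 500M
}
```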
33. pt-sift
Pass it the path to the dir used with pt-stalk's --dest
(default /var/lib/pt-stalk)
Interactive program
Lots of data points collected from the time of the incident
35. General Admin / Maintenance
pt-slave-restart - try to restart a slave skipping errors
if replication fails
pt-summary - gives a general summary of the MySQL instance
pt-upgrade - tests logged queries against a new MySQL version
pt-config-diff - show formatted diff of my.cnf files
pt-heartbeat - writes a timestamp to a table on the master so slaves can measure replication lag
pt-kill - kill MySQL threads according to filters
pt-index-usage - report on index structure and usage
pt-variable-advisor - looks at runtime vars and makes suggestions
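A sketch of two of the tools above. Host names are placeholders, the function names are hypothetical, and the heartbeat table is created in a "percona" schema that is assumed to already exist. pt-kill's default is to kill only the oldest matching query, so --victims all is passed; without --run-time it keeps checking until stopped.

```shell
#!/usr/bin/env bash
# Sketch: heartbeat-based lag checks and a long-query killer.
# Hosts are placeholders; the "percona" schema is assumed to exist.
start_heartbeat() {        # on the master: keep writing a timestamp row
  pt-heartbeat --update --create-table --database percona --daemonize "h=$1"
}

check_lag() {              # against a slave: print the lag once and exit
  pt-heartbeat --check --database percona "h=$1"
}

kill_long_queries() {      # print and kill queries running longer than N seconds
  pt-kill --busy-time "${2:-60}" --match-command Query --victims all --print --kill "h=$1"
}
```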
Percona Cloud Tools
https://cloud.percona.com to sign up for the beta