keynote by Brendan Gregg. "Traditional performance monitoring makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. Modern BSD has advanced tracers and PMC tools, providing virtually endless metrics to aid performance analysis. It's time we really used them, but the problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
There's a new way to approach performance analysis that can guide you through the metrics. Instead of starting with traditional metrics and figuring out their use, you start with the questions you want answered then look for metrics to answer them. Methodologies can provide these questions, as well as a starting point for analysis and guidance for locating the root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, chain graphs, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. Many methodologies will be discussed, from the production proven to the cutting edge, along with recommendations for their implementation on BSD systems. In general, you will learn to think differently about analyzing your systems, and make better use of the modern tools that BSD provides."
6. History
• System Performance Analysis up to the '90s:
– Closed source UNIXes and applications
– Vendor-created metrics and performance tools
– Users interpret given metrics
• Problems
– Vendors may not provide the best metrics
– Often had to infer, rather than measure
– Given metrics, what do we do with them?
$ ps -auxw
USER  PID %CPU %MEM  VSZ  RSS TT STAT STARTED     TIME COMMAND
root   11 99.9  0.0    0   16  - RL   22:10   22:27.05 [idle]
root    0  0.0  0.0    0  176  - DLs  22:10    0:00.47 [kernel]
root    1  0.0  0.2 5408 1040  - ILs  22:10    0:00.01 /sbin/init --
[…]
7. Today
1. Open source
– Operating systems: Linux, BSD, etc.
– Applications: source online (GitHub)
2. Custom metrics
– Can patch the open source, or,
– Use dynamic tracing (open source helps)
3. Methodologies
– Start with the questions, then make metrics to answer them
– Methodologies can pose the questions
The biggest problem with dynamic tracing has been knowing what to do with it; methodologies guide your usage.
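As a hedged illustration of where to start, two classic DTrace one-liners (DTrace is in the FreeBSD base system; available probes vary by platform). Count system calls by syscall name, printing the aggregation on Ctrl-C:
# dtrace -n 'syscall:::entry { @[probefunc] = count(); }'
Then by process name, to see who is making them:
# dtrace -n 'syscall:::entry { @[execname] = count(); }'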
13. Traffic Light Anti-Method
1. Turn all metrics into traffic lights
2. Open dashboard
3. Everything green? No worries, mate.
• Type I errors: red instead of green
– team wastes time
• Type II errors: green instead of red
– performance issues undiagnosed
– team wastes more time looking elsewhere
Traffic lights are suitable for objective metrics (e.g., errors), not subjective metrics (e.g., IOPS, latency).
23. The USE Method
• For every resource, check:
1. Utilization: busy time
2. Saturation: queue length or time
3. Errors: easy to interpret (objective)
Starts with the questions, then finds the tools
E.g., for hardware, check every resource, including buses (a sketch follows).
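A minimal sketch of USE checks for a few resources using stock FreeBSD tools; the metric-to-column mapping below is approximate and version-dependent, and some resources (buses in particular) need PMC tools instead:
$ vmstat 1       # CPU utilization: us+sy; saturation: run-queue length (procs r)
$ iostat -x 1    # disk utilization: %b; saturation: queue length (qlen)
$ netstat -i     # network errors: Ierrs/Oerrs per interface
$ swapinfo       # memory saturation proxy: swap in use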
29. RED Method
• For every service, check that these are within SLO/SLA:
1. Request rate
2. Error rate
3. Duration (distribution)
Another exercise in posing questions from functional diagrams.
By Tom Wilkie: http://www.slideshare.net/weaveworks/monitoring-microservices
[Figure: example microservices diagram. User requests flow through a Load Balancer and Web Proxy to a Web Server, which is backed by a Database, a Payments Server, an Asset Server, and a Metrics Database.]
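Where no service dashboard exists, the three RED metrics can be approximated from a web access log. A rough sketch, assuming a hypothetical access.log with the HTTP status in field 9 and a request-duration value appended as the last field (both assumptions about the log format):
$ wc -l < access.log    # request count; divide by the interval for rate
$ awk '{ n++; if ($9 ~ /^5/) e++ } END { print e/n }' access.log    # error rate (5xx fraction)
$ awk '{ print $NF }' access.log | sort -n | \
    awk '{ a[NR] = $1 } END { print a[int(NR*0.5)], a[int(NR*0.99)] }'    # duration p50, p99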
39. CPI Flame Graph: BSD
A CPU flame graph (cycles) colored using instructions/stall profile data
e.g., using FreeBSD pmcstat:
red == instructions
blue == stalls
http://www.brendangregg.com/blog/2014-10-31/cpi-flame-graphs.html
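A sketch of the workflow from the linked post; the sampling event alias (unhalted-cycles here) is CPU-specific and can be listed with pmcstat -L, and stackcollapse-pmc.pl / flamegraph.pl come from Brendan Gregg's FlameGraph repository:
# pmcstat -S unhalted-cycles -O out.pmc sleep 10    # sample on-CPU stacks for 10s
# pmcstat -R out.pmc -z16 -G out.stacks             # dump call-graph samples as text
# stackcollapse-pmc.pl out.stacks | flamegraph.pl > cycles.svg
For the CPI coloring, the post profiles instructions and stalls separately and colors each frame by the ratio of the two.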