The document provides an overview of performance analysis tools, covering tracing and profiling. It discusses black-box tracing approaches such as print statements, logging frameworks, and debuggers. It introduces DTrace as a dynamic instrumentation tool for tracing production systems with zero probe effect when disabled. A case study demonstrates using DTrace to analyse NFS latency issues. The document also discusses tracing tools for Linux: ftrace, perf, SystemTap, and eBPF.
Performance analysis of production systems using DTrace
1. Performance analysis and troubleshooting using DTrace
“I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.” - Rudyard Kipling
Graeme Jenkinson - Computer Laboratory
2. Today’s lecture
Overview of tracing and profiling
DTrace: Dynamic Instrumentation of Production
Systems
Performance analysis with DTrace - case study
Tracing and profiling on Linux
Wrap up and questions
3. Overview of tracing and profiling
• A profile provides an inventory of performance events and
timings for the execution as a whole:
• Sampling at a fixed time interval
• Aggregated results - e.g. requests per second
• Absolute chronology of events is ignored
• A trace records the chronology, often with timestamps:
• Absolute chronology of events is preserved
• Traces grow at runtime - buffers must be periodically flushed
Focus on tracing
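The distinction can be sketched in a few lines of Python (a toy illustration, not how real tracers are implemented): a trace keeps timestamped records in order and grows with runtime, while a profile only keeps aggregate counts:

```python
import time
from collections import Counter

events = []          # trace: chronology preserved, grows with runtime
counts = Counter()   # profile: aggregated, chronology discarded

def record(name):
    events.append((time.monotonic(), name))  # timestamped trace record
    counts[name] += 1                        # aggregate profile counter

for op in ["read", "write", "read", "read"]:
    record(op)

print([name for _, name in events])  # order of operations survives
print(dict(counts))                  # only totals survive
```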
4. Who did what to whom?
5. Tracing software as a black box
• print statements:
• printf("Here");
• #ifndef NDEBUG
printf("State: Connected");
#endif
(can declaratively turn logging on/off at compile time)
• Logging frameworks:
• info!("State: connected");
• logger.error("Failed to write file");
(can declaratively turn logging on/off at runtime)
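The runtime control that frameworks offer can be sketched with Python's standard logging module (the logger name and messages here are illustrative):

```python
import logging

logging.basicConfig(level=logging.ERROR)   # global threshold set at startup
logger = logging.getLogger("netstack")     # hypothetical per-module logger

logger.info("State: connected")            # suppressed below ERROR
logger.error("Failed to write file")       # emitted

logger.setLevel(logging.INFO)              # flip the switch at runtime
logger.info("State: connected")            # now emitted - no recompile
```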
6. Tracing software as a black box - summary
Print statements
• Pros:
• As simple as it gets
• Declarative control at compile time
• Cons:
• Adding tracing requires a recompile (in some cases)
• Ad-hoc adoption
• Inflexible (requires compile-time decisions)
Logging frameworks
• Pros:
• Declarative control at runtime or compile time
• Finer-grained control of logging (per class/module)
• Structured output
• Flexibility
• Cons:
• Slightly more effort to adopt
• Added dependency to the codebase
7. Logging can only get you so far
• Errors are frequently the result of complex interactions across the software stack
• Logging is limited to a single layer of the software stack
• Logging frameworks don't capture the events you want to see
8. Debuggers
• When logging really doesn’t cut it we
often break out the debugger
• Set breakpoints, single step code,
watch variables/registers
• Debuggers often struggle when facing
certain types of problems:
• Heisenbug: "a software bug that seems to disappear or alter its behaviour when one attempts to study it"
9. Probe effects
• Probe effect - act of measuring disturbs system
• Electronics - probes introduce additional capacitance, resistance or
inductance
• Software tracing - probes take time to execute
• Bugs from subtle timing effects are
common, probe effects can rarely
be ignored
10. Observability/transparency
• Rather than treating the system as a black box we need observability
into the system when executing
• What's needed is more flexible, less intrusive tools; thankfully there are plenty of them
• Do these tools have a probe effect?
12. Observability tools: top
NAME
top - display Linux processes
top (table of processes) - reads kernel state from /proc filesystem (on
Linux, FreeBSD); procfs is a pseudo-filesystem that presents information
about processes and other system information in a hierarchical filesystem-
like structure.
Note: procfs doesn’t come free (probe effect). What operations need to be
performed?
13. Observability tools: strace (dtruss Mac OS X, truss FreeBSD)
Linux
strace - trace system calls and signals
Mac OS X (requires disabling System Integrity Protection - El Capitan+)
dtruss - process system call details. Uses DTrace.
FreeBSD
truss - trace system calls
14. Utilization, Saturation and Errors (USE)
• Utilization: the average time that the resource was busy servicing work
• Saturation: the degree to which the resource has extra work which it can't service, often
queued
• Errors: the error count
• Resources:
• CPUs: sockets, cores, hardware threads (virtual CPUs)
• Memory: capacity
• Network interfaces
• Storage devices: I/O, capacity
• Controllers: storage, network cards
• Interconnects: CPUs, memory, I/O
15. Layered architectures and observability
• System performance
problems are typically
introduced at the highest
layers of abstraction
• Effects are often first
encountered and attributed
at the lowest layers of
abstraction
• Observability is required
across all layers of the stack
16. In production analysis
• As software engineers
we graduate from
screwing up our own
system to screwing up
other peoples
• In production tracing
must be safe
• Probe effect must be
zero when disabled -
“low” when tracing
17. DTrace: one tool to rule them all
• Developed by Bryan
Cantrill, Mike Shapiro,
and Adam Leventhal at
Sun Microsystems
• Released in 2005 by Sun Microsystems
• “Performance analysis
infrastructure has
generally not kept pace
with the shift to in-
production performance
analysis” - Cantrill et al
Illumos, FreeBSD, NetBSD, Mac OS X
Linux (ish), Windows (yuck)
18. DTrace overview (Cantrill ’04)
• Dynamic instrumentation
• Zero probe effect when disabled
• Unified instrumentation
• Instruments both user and kernel level software
• Arbitrary-context kernel instrumentation
• Can instrument any aspect of the kernel including delicate sub-
systems such as the scheduler and synchronisation facilities
As a result probes can’t block
What is the impact when enabled?
19. DTrace overview (cont)
• Data integrity
• Records may be dropped under load - but never silently
• High-level control language
• Scriptable “arbitrary” actions on probe firing
• Predicates that determine whether a probe fires
• Scripts executed by in-kernel virtual machine
• Production safe
• Aborts when system is heavily loaded
• Scalable in kernel aggregations (profiling)
How is safety preserved?
20. DTrace overview - anatomy of a dynamic tracer
[Diagram: anatomy of a dynamic tracer. In userspace, the consumer compiles a D script such as:
fbt::vm_fault:entry {
	fault_count += 1;
}
The kernel fires the enabled probes and records into per-CPU buffers; results are read back by the consumer.]
21. DTrace providers
• fbt: Function Boundary
Tracing
• syscall: System Calls
• pid: User space processes
• proc: Process Operations
• sched: Scheduler
• Network Protocols
• IP, UDP, TCP
• lock: Kernel locking points
• io: I/O calls
• vfs: Filesystem Routines
• profile: Timing source
• mac_framework:
Mandatory Access Control
framework
22. DTrace providers (cont)
dtrace -l	List all registered probes
dtrace -l -P fbt	List probes registered by the Function Boundary Tracing provider
dtrace -l -f vm_fault	List probes for the function vm_fault
Probes are named by the 4-tuple provider:module:function:name - the DTrace provider, the kernel module, the function to trace, and the probe name
23. DTrace providers (cont)
• Statically added probes - added by developers
• Known useful points of instrumentation: starting new threads, TCP
state machine transitions
• Dynamically added - providers automatically work out where to insert
probes
• Static probes are stable instrumentation points, dynamic probes depend
on the specifics of the implementation and therefore are subject to
change
24. DTrace tracing example
syscall:::entry
/execname == "ls"/
{
	printf("syscall = %s\n", probefunc);
}
dtrace -c ls -n '…'
Predicate - only fire for the ls command
Action - print the probe's function name (probefunc is a built-in variable)
25. DTrace profiling example
BEGIN {
	syscall_count = 0;
}
syscall:::entry
/execname == "ls"/
{
	syscall_count++;
}
END {
	printf("syscall_count = %d\n", syscall_count);
}
Initialise the count (global)
Instrument all syscall entries - only for the ls command
Print the count
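A common alternative to a global counter is a DTrace aggregation, which is summarised per-CPU in the kernel - the scalable in-kernel aggregations mentioned earlier. A minimal sketch for the same `ls` workload:

```d
/* Count syscalls by function name for the ls command; the @ prefix
   declares an aggregation, printed automatically when dtrace exits. */
syscall:::entry
/execname == "ls"/
{
	@counts[probefunc] = count();
}
```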
27. DTrace user land example 1
• pid (process ID) provider
• Allows tracing of an arbitrary process, such as a web browser or database
• Process does not need to be restarted (important in production)
dtrace -n 'pid$target:::entry { trace(timestamp); }' -c ls
dtrace -n 'pid$target:::entry { trace(timestamp); }' -p …
process id to trace
29. DTrace internals
• DTrace core framework
• Interfaces to userspace through ioctl() to /dev/dtrace/*
• Majority of the code (dtrace.c) is generic framework code
• Delegates instrumentation to providers
• Implements a virtual machine for the D Intermediate Format (DIF)
• Ensures safety of D scripts
• Analogous to Java bytecode
30. DTrace internals
• Provider
• Holds the knowledge about specific instrumentation techniques
• Enables probes when consumer says so (specific to each provider)
• Transfers control to the DTrace framework by calling dtrace_probe(probe_id)
• The DTrace framework then performs the actions associated with the probe
• dtrace_probe()
• Runs in specialized context known as probe context
• Interrupts are disabled for the CPU executing the probe
• Does no memory allocation and takes no locks
31. DTrace internals
• D compiler:
• Compiles D scripts to D Intermediate Format (DIF)
• Assembles DIF into DIFOs (for predicate and actions)
• Merges DIFOs into DOF (D Object Format)
• DIF has a RISC-like instruction set that supports D language constructs
dtrace -S -n 'syscall:freebsd:open:entry { printf("%s %s\n", execname, copyinstr(arg0)); }'
There’s no spec but we are
working on it
32. DTrace internals
• Enabling Control Blocks (ECBs)
• Kernel data structure that
represents an enabled probe
• Each ECB contains DIFO for
predicates and actions
associated with probe enabling
• Can be chained together if
multiple consumers are
interested in a given probe
Probe
ECB ECB
Predicate
DIFO
Action
DIFO
Action
DIFO
33. DTrace internals - Function Boundary Tracing (FBT)
• sys_getpid, no DTrace
<sys_getpid> push %rbp
<sys_getpid+1> mov %rsp,%rbp
<sys_getpid+4> mov 0x8(%rdi),%rax
<sys_getpid+8> movslq 0xbc(%rax),%rax
• sys_getpid, while running dtrace -n 'fbt:kernel:sys_getpid:entry'
• int3 replaces first sys_getpid instruction
• push instruction is emulated in the kernel
• original push instruction is restored after dtrace exits
<sys_getpid> int3
<sys_getpid+1> mov %rsp,%rbp
<sys_getpid+4> mov 0x8(%rdi),%rax
<sys_getpid+8> movslq 0xbc(%rax),%rax
Used by debuggers to set
breakpoints - architecture
dependent
DTrace uses CTF
(Compact C Type
Format) to identify
function entry points
34. DTrace in the real world
• How many software engineers are there in the world working in
systems?
• Boehm estimates 700k work in "infrastructure", which is a fairly broad
definition
• How many people have used DTrace? ~1000s
• How many people have written DTrace scripts? <1000
• How many people know how DTrace works in any detail? ~100
35. DTrace case study - Customers are not the enemy
• Scenario: Customer system reporting terrible performance
• NFS server
• SAN backend
• ~ 300 Mb/s
• 4 socket, 6 cores per socket
• Symptoms: Latency is “bad”, our customers are angry. Your product sucks
https://www.infoq.com/presentations/dtrace
Customers don’t report details like latency is twice average specified in
service level agreement!
36. DTrace case study - Performance goals
• Why isn't work being done?
• Thread not being scheduled
• Queue of work isn't being processed
• Get idle into the system
• Identify work being done inefficiently
• Identify where cycles are wasted doing the wrong thing
Useful DTrace probes: off-CPU
39. DTrace case study - synchronous write
• Average latency looks pretty good
40. DTrace case study - synchronous write
Average latency not so useful - outliers dominate
(Latency histogram shown on linear and log scales)
41. DTrace case study - summary
• Instrumentation with DTrace identified issues with…
• Throttling, inefficiencies in code and a heavily contended lock
• Lessons learnt
• Need to analyse performance on real-world systems
• Right tools, right questions
• Remedies without analysis don’t work and antagonise customers
• Systems shouldn't absorb all the pain
42. DTrace in the real world
• DTrace Toolkit
• http://www.brendangregg.com/dtracetoolkit.html
• DTrace one liners
• http://www.brendangregg.com/DTrace/dtrace_oneliners.txt
• https://wiki.freebsd.org/DTrace/One-Liners
43. One tool to rule them all…
• In 2017, "in production" means a distributed system
• Gap opened up once again between the types of in-production analysis
we’d like to perform and available tools
• Distributed DTrace - see CADETS project
• Tracing internode flows?
• Volumes of data
• Security?
• What about heterogenous environments?
• Need to instrument Linux, Illumos, FreeBSD environments seamlessly
45. ftrace and perf
• ftrace
• Tracing utility built directly into the Linux kernel
# cd /sys/kernel/debug/tracing
# cat available_tracers
• To enable tracer:
# echo function_graph > current_tracer
46. perf
• perf - instruments "events", which are a unified interface for different
kernel instrumentation frameworks.
• Hardware Events: These instrument low-level processor activity
based on CPU performance counters. For example, CPU cycles,
instructions retired, memory stall cycles, level 2 cache misses, etc.
Some will be listed as Hardware Cache Events.
• Software Events: These are low level events based on kernel
counters. For example, CPU migrations, minor faults, major faults,
etc.
47. perf (cont)
• perf - instruments "events", which are a unified interface for different kernel
instrumentation frameworks.
• Tracepoint Events: These are kernel-level events based on the ftrace framework.
These tracepoints are placed in interesting and logical locations of the kernel, so
that higher-level behavior can be easily traced. For example, system calls, TCP
events, file system I/O, disk I/O, etc. These are grouped into libraries of
tracepoints; eg, "sock:" for socket events, "sched:" for CPU scheduler events.
• Dynamic Tracing: Software can be dynamically instrumented, creating events
in any location. For kernel software, this uses the kprobes framework. For user-
level software, uprobes.
• Timed Profiling: Snapshots can be collected at an arbitrary frequency using
perf record -F <Hz>. This is commonly used for CPU usage profiling, and works by
creating custom timed interrupt events.
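As a sketch of the last two modes on a Linux system (these commands need perf installed and usually root; the chosen durations and the sched:sched_switch tracepoint are illustrative):

```shell
# Timed profiling: sample all CPUs (-a) at 99 Hz with call graphs (-g)
# for 10 seconds, then summarise the samples.
perf record -F 99 -a -g -- sleep 10
perf report --stdio

# Tracepoint events: count scheduler context switches system-wide.
perf stat -e sched:sched_switch -a -- sleep 5
```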
48. SystemTap
• Scripting language and tool for dynamically instrumenting running
production Linux kernel-based operating systems
• stap -v -e 'probe vfs.read { printf("read performed\n"); exit() }'
(vfs.read - virtual file system read)
• “safe to use in production?” “yes”
https://sourceware.org/systemtap/wiki/SystemtapDtraceComparison
49. BPF (nee eBPF)
• Berkeley Packet Filter (BPF)
• Interface to data-link layers to capture raw packets (tcpdump)
• BPF virtual machine executes user scripts to filter packets
• BPF instructions are JIT-compiled to native code to improve performance
• Bytecode Probe Framework (maybe?)
• DTrace-like facilities attaching BPF scripts to tracepoints: kprobes, uprobes, tracepoints
(soon)
• bcc front-end (and others)
• What can we do with BPF?
• “crazy stuff”
Designed for tracing 10GbE networking -
highly performant
Walk stacks…
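For a flavour of the bcc front-end, here is a minimal sketch (it assumes the bcc Python bindings, root privileges, and a kernel where the clone syscall symbol is still `sys_clone` - newer kernels use prefixed names such as `__x64_sys_clone`):

```python
from bcc import BPF

# BPF program in restricted C: fires on entry to sys_clone.
prog = """
int kprobe__sys_clone(void *ctx) {
    bpf_trace_printk("clone() called\\n");
    return 0;
}
"""

b = BPF(text=prog)   # compile, verify and load; attaches the kprobe by name
b.trace_print()      # stream records from the kernel trace pipe
```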
51. Any questions?
Tracing tools quickly outperform log-based analysis in production
Vast array of tracing tools in Linux and BSD-based operating systems
DTrace provides unified tracing with guaranteed safety in production
All tracing mechanisms introduce probe effects
Probe effects can change the very behaviour we are investigating
Measure (on the real system), analyse and remediate - don’t guess
DTrace on Linux isn’t production ready but other tools such as
SystemTap and eBPF are roughly equivalent