2. Distribution Statement A: Approved for public release; distribution is unlimited.
NAME
ddtrace – distributed tracing framework
SYNOPSIS
ddtrace [NODE]... [QUERY]
DESCRIPTION
ddtrace distributes event query expressions over
many hosts to track inter-node information flows
and temporal sequences. It implements post-hoc
trace aggregation or, as needed, tags TCP/IP
packets, filesystem RPCs, and application-layer
protocols with temporal and information-flow
labels.
AUTHOR
Written by Graeme Jenkinson
SEE ALSO
Pivot tracing, Dapper, X-Trace, Magpie
3.
Roadmap for distributed dtrace
Capture distributed tracing use cases
Design space exploration
Prototype and refine designs
Trial on real world problems
Focus to date
4.
Use Cases
Security Event and Incident Management
Observed provenance
Monitoring client/server protocols
Scheduling for warehouse-scale computing
Performance monitoring/debugging for computational finance
Transparent computing
$$$
OPUS
5.
Monitor client/server protocols
# dtrace -n 'fbt::tcp_state_change:entry {...}'
6.
Key requirements
Production safe
Performance proportionality
Track causal relationships between nodes
Simple to package and deploy
Zero probe effect when inactive
Which causal relationships?
How to track causal relationships?
7.
Design principles
Log: append-only, totally ordered sequence of records (first record, next record, ...)
Record what happened when
Update global log/other data structures
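The log principle above can be sketched as follows. This is a minimal in-memory illustration, not the ddtrace implementation: the class name `AppendOnlyLog` and its methods are hypothetical, and it assumes that a record's offset (its position in the sequence) is what defines the total order.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory log: records are appended, never modified."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record and return its offset in the total order."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[Tuple[int, Any]]:
        """Return (offset, record) pairs from `offset` onwards, in log order."""
        return list(enumerate(self._records[offset:], start=offset))


log = AppendOnlyLog()
first = log.append({"event": "tcp_state_change", "node": "A"})
second = log.append({"event": "tcp_state_change", "node": "B"})
```

Because a reader only needs to remember the last offset it consumed, other data structures can be rebuilt or updated by replaying the log from that offset.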
8.
Prototype
ddtrace
Machine-readable dtrace output
9.
Separate stream processing from packaging and deployment
Minimise number of moving parts
10.
Prototype
ddtrace
Analyst tools
D script compiled here for arch independence
11.
Tracking causal relationships
Within per-cpu buffers: A is recorded before B; A happens-before B
Between per-cpu buffers: t_TSC(A) < t_TSC(B); A happens-before B
Between nodes: A, C, B; A happens-before B (distributed commit log)
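The two single-node cases above can be sketched as a merge of per-CPU buffers by timestamp. This is an illustration only: `Record`, `merge_per_cpu`, and `happens_before` are hypothetical names, and the sketch assumes that TSC values are comparable across CPUs and that each buffer is already in timestamp order.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    tsc: int    # timestamp counter value when the event was recorded
    cpu: int    # CPU whose buffer holds the record
    event: str  # event label, e.g. "A" or "B"


def merge_per_cpu(buffers: Iterable[List[Record]]) -> List[Record]:
    """Merge per-CPU buffers (each sorted by TSC) into one ordered sequence."""
    return list(heapq.merge(*buffers, key=lambda r: r.tsc))


def happens_before(a: Record, b: Record) -> bool:
    """Assumption: a smaller TSC value means the event was recorded earlier."""
    return a.tsc < b.tsc


cpu0 = [Record(10, 0, "A"), Record(30, 0, "C")]
cpu1 = [Record(20, 1, "B")]
merged = merge_per_cpu([cpu0, cpu1])
```

Between nodes there is no shared counter, so this comparison does not carry over; that is the gap the distributed commit log is meant to fill.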
12.
TCP sequence numbers: snd_nxt/rcv_nxt
IPsec AH sequence number
Packet layout: IP header | AH header | TCP header | Data
int ipsec_checkreplay(u_int32_t seq, …);
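One possible reading of this slide is that sequence numbers already carried in packet headers can link a send event on one node to the matching receive event on another. The sketch below illustrates that idea under stated assumptions: `PacketEvent` and `causal_edges` are hypothetical names, the flow identifier is a placeholder, and retransmissions and sequence-number wraparound are ignored.

```python
from typing import Dict, List, NamedTuple, Tuple


class PacketEvent(NamedTuple):
    node: str  # node that recorded the event
    flow: str  # hypothetical flow identifier, e.g. a connection 4-tuple
    seq: int   # sequence number taken from the packet header
    kind: str  # "send" or "recv"


def causal_edges(events: List[PacketEvent]) -> List[Tuple[PacketEvent, PacketEvent]]:
    """Pair each receive with the send that has the same (flow, seq)."""
    sends: Dict[Tuple[str, int], PacketEvent] = {}
    for e in events:
        if e.kind == "send":
            sends[(e.flow, e.seq)] = e
    edges = []
    for e in events:
        if e.kind == "recv":
            s = sends.get((e.flow, e.seq))
            # Only a send on a different node yields an inter-node edge.
            if s is not None and s.node != e.node:
                edges.append((s, e))
    return edges


events = [
    PacketEvent("A", "flow-1", 1, "send"),
    PacketEvent("B", "flow-1", 1, "recv"),
    PacketEvent("B", "flow-1", 7, "recv"),  # no matching send recorded
]
edges = causal_edges(events)
```

Each returned pair is a "send happens-before receive" edge between two nodes, obtained without adding any new label to the packet.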
14.
Open questions
Is a distributed commit log the right abstraction? What are the semantics and performance required (how do they compare to what Kafka gives)?
Is a framework the right approach to solve a range of problems?
What infrastructure should we expect that people will stand up? Is software running on a JVM OK (sometimes, always)?
How do we get people interested in and using our approach on real-world problems?
How do we deal with reliability? How best to get event records out of the kernel?