This presentation describes my own benchmarking of Apache Storm and Apache Flink, building on the work started by Yahoo!, and shows how much throughput Apache Flink can sustain.
2. Who am I?
• Director of Applications Engineering at data
Artisans
• Previously worked on streaming computation at
Twitter, Gnip and Boulder Imaging
• Involved in various kinds of stream processing for
about a decade
• High-speed video, social media streaming, general
frameworks for stream processing
3. Overview
• Yahoo! performed a benchmark comparing
Apache Flink, Storm and Spark
• The benchmark never actually pushed Flink to its
throughput limits but stopped at Storm's limits
• I knew Flink was capable of much more so I
repeated the benchmarks myself
• I did a follow-up blog post explaining my findings
and will summarize them here
4. Yahoo! Benchmark
• Count ad impressions grouped by campaign
• Compute aggregates over a 10 second window
• Emit current value of window aggregates to
Redis every second for query
• Map ads to campaigns using Redis as well
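The counting step of the benchmark can be sketched in a few lines of plain Python. This is only an illustration of the logic, not Flink or Storm code: it assumes tumbling (non-overlapping) 10 second windows, replaces the Redis lookup with a hypothetical in-memory dictionary `AD_TO_CAMPAIGN`, and uses made-up ad and campaign ids.

```python
from collections import defaultdict

WINDOW_MS = 10_000  # 10 second window, as on the slide

# Hypothetical ad -> campaign lookup; the benchmark keeps this mapping in Redis.
AD_TO_CAMPAIGN = {
    "ad-1": "campaign-A",
    "ad-2": "campaign-A",
    "ad-3": "campaign-B",
}


def count_impressions(events):
    """Count impressions per (campaign, window_start).

    `events` is an iterable of (ad_id, event_time_ms) pairs.
    Windows are assumed to be tumbling and aligned to multiples of WINDOW_MS.
    """
    counts = defaultdict(int)
    for ad_id, event_time_ms in events:
        campaign = AD_TO_CAMPAIGN.get(ad_id)
        if campaign is None:
            continue  # ad with no known campaign: ignored in this sketch
        window_start = event_time_ms - (event_time_ms % WINDOW_MS)
        counts[(campaign, window_start)] += 1
    return dict(counts)


events = [
    ("ad-1", 1_000),
    ("ad-2", 9_999),
    ("ad-3", 4_000),
    ("ad-1", 10_000),  # falls into the next window
    ("ad-9", 2_000),   # unknown ad
]
result = count_impressions(events)
```

In the real benchmark the per-window counts would be written to Redis every second while the window is still open; here they are simply returned once all events have been consumed.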
18. Processing Guarantees
Apples and Oranges
Apache Storm | Apache Flink
At least once semantics | Exactly once semantics
Double counting after failures | No double counting
Lost state after failures | No state loss
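Why replay leads to double counting can be shown with a toy simulation. The sketch below is a simplified model and not how either system is implemented: the function `run_with_failure` and its parameters are hypothetical, there is a single counter, and exactly one failure is injected. With `restore_state=False` the input is rewound to the last checkpoint but the counter keeps its value, so replayed events are counted twice. With `restore_state=True` the counter is rolled back together with the input position.

```python
def run_with_failure(events, fail_at, checkpoint_every, restore_state):
    """Count events, fail once at index `fail_at`, then replay from the last checkpoint.

    restore_state=False: only the input position is rewound (replayed events
    are counted again).
    restore_state=True: input position and counter are rewound together.
    """
    count = 0
    checkpoint = (0, 0)  # (input position, counter value at that position)
    failed = False
    pos = 0
    while pos < len(events):
        if pos % checkpoint_every == 0:
            checkpoint = (pos, count)
        if pos == fail_at and not failed:
            failed = True
            ckpt_pos, ckpt_count = checkpoint
            pos = ckpt_pos  # replay input from the checkpointed position
            if restore_state:
                count = ckpt_count  # roll the counter back as well
            continue
        count += 1
        pos += 1
    return count


events = list(range(10))
replay_only = run_with_failure(events, fail_at=7, checkpoint_every=5, restore_state=False)
with_restore = run_with_failure(events, fail_at=7, checkpoint_every=5, restore_state=True)
```

For ten input events, the replay-only run ends with a count of 12 because events 5 and 6 are processed twice, while the run that restores state ends with the correct count of 10.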
30. Results
• Apache Flink achieved 15 million messages / sec
on Yahoo! benchmark
• Much stronger processing guarantees: Exactly
once
• Throughput 80x higher than what was reported in the
original Yahoo! benchmark on similar hardware
32. Storm Compatibility
• Lots of companies already have applications written
using the Storm API
• Flink provides a Storm compatibility layer
• Run your Storm jobs on Flink with a one-line code
change
• Flink also allows you to reuse your existing Storm
spout and bolt code from a Flink job
• Give it a try!