Tachyon talk at Strata and Hadoop World 2015 in New York City, given by Haoyuan Li, Founder & CEO of Tachyon Nexus. If you are interested, please contact us at info@tachyonnexus.com or visit our website (www.tachyonnexus.com).
1. Tachyon: An Open Source Memory-Centric Distributed Storage System
Haoyuan Li, Tachyon Nexus
haoyuan@tachyonnexus.com
September 30, 2015 @ Strata and Hadoop World NYC 2015
4. History
• Started at UC Berkeley AMPLab
– From summer 2012
– Same lab produced Apache Spark and Apache Mesos
• Open sourced
– April 2013
– Apache License 2.0
– Latest Release: Version 0.7.1 (August 2015)
• Deployed at > 100 companies
13. Performance Trend: Memory is Fast
• RAM throughput increasing exponentially
• Disk throughput increasing slowly
Memory-locality key to interactive response times
18. A Use Case Example with Spark
• Fast, in-memory data processing framework
– Keep one in-memory copy inside JVM
– Track lineage of operations used to derive data
– Upon failure, use lineage to recompute data
[Figure: Lineage Tracking across a chain of map, filter, map, join, and reduce operations]
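The lineage idea on this slide can be illustrated with a small toy model. This is a minimal sketch in plain Python, not Spark's implementation: the `Dataset` class and its `map`, `filter`, `compute`, and `lose_cache` methods are hypothetical names invented for illustration. Each dataset remembers its parent and the transformation that derived it, so a lost in-memory copy can be rebuilt by replaying those steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Dataset:
    """Toy dataset that records its lineage: a parent plus one transformation."""
    name: str
    parent: Optional["Dataset"] = None
    transform: Optional[Callable[[List[int]], List[int]]] = None
    source: Optional[List[int]] = None   # set only on the root dataset
    cache: Optional[List[int]] = None    # the single in-memory copy; may be lost

    def map(self, fn: Callable[[int], int], name: str) -> "Dataset":
        return Dataset(name, parent=self,
                       transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred: Callable[[int], bool], name: str) -> "Dataset":
        return Dataset(name, parent=self,
                       transform=lambda rows: [r for r in rows if pred(r)])

    def compute(self) -> List[int]:
        """Return cached rows, or rebuild them by replaying the lineage."""
        if self.cache is None:
            if self.parent is None:
                self.cache = list(self.source or [])
            else:
                self.cache = self.transform(self.parent.compute())
        return self.cache

    def lose_cache(self) -> None:
        """Simulate a failure that wipes this dataset's in-memory copy."""
        self.cache = None


if __name__ == "__main__":
    root = Dataset("input", source=[1, 2, 3, 4, 5, 6])
    kept = root.map(lambda x: x * 10, "scaled").filter(lambda x: x % 20 == 0, "kept")
    print(kept.compute())   # first computation
    kept.lose_cache()       # simulated failure
    print(kept.compute())   # rebuilt from lineage, same rows as before
```

Recomputation keeps the data recoverable without a second copy, at the cost of redoing work after a failure.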
19. Issue 1
Data sharing is the bottleneck in the analytics pipeline: slow writes to disk
[Figure: Spark Job1 and Spark Job2 each keep blocks 1 and 3 in their own Spark memory (block manager); HDFS / Amazon S3 holds blocks 1 to 4. Annotation: storage engine & execution engine in the same process (slow writes).]
20. Issue 1
Data sharing is the bottleneck in the analytics pipeline: slow writes to disk
[Figure: a Spark job keeps blocks 1 and 3 in Spark memory (block manager), next to a Hadoop MR job on YARN; HDFS / Amazon S3 holds blocks 1 to 4. Annotation: storage engine & execution engine in the same process (slow writes).]
21. Issue 1 resolved with Tachyon
Memory-speed data sharing among jobs in different frameworks
[Figure: a Spark job (Spark memory) and a Hadoop MR job (YARN) sit on top of Tachyon, which holds blocks 1, 3, and 4 in memory; HDFS / Amazon S3 holds blocks 1 to 4 on disk. Annotation: execution engine & storage engine in the same process (fast writes).]
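One way to picture this sharing from the Spark side is to have one job write to a `tachyon://` path and another read it back. The sketch below assumes a few things the slides do not state: the Tachyon client library is on Spark's classpath, the master listens on port 19998, and `sc` is an existing PySpark `SparkContext`. The helper names, host, and path are hypothetical.

```python
def tachyon_uri(host: str, path: str, port: int = 19998) -> str:
    """Build a tachyon:// URI. Port 19998 is an assumed master port."""
    return f"tachyon://{host}:{port}/{path.lstrip('/')}"


def producer_job(sc, records, uri: str) -> None:
    """First job: write its output through Tachyon instead of straight to disk.

    `sc` is an existing pyspark SparkContext supplied by the caller.
    """
    sc.parallelize(records).saveAsTextFile(uri)


def consumer_job(sc, uri: str):
    """Second job (possibly a separate application): read the shared data back."""
    return sc.textFile(uri).collect()


if __name__ == "__main__":
    # Hypothetical host and path; this line only builds a string and
    # does not contact any cluster.
    print(tachyon_uri("localhost", "/shared/job1-output"))
```

Because the two jobs only agree on a path, the consumer could equally be a framework other than Spark, provided it can read from that path.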
22. Issue 2
Cache loss when process crashes
[Figure: a Spark task keeps blocks 1 and 3 in Spark memory (block manager); HDFS / Amazon S3 holds blocks 1 to 4. Annotation: execution engine & storage engine in the same process.]
23. Issue 2
Cache loss when process crashes
[Figure: the Spark task crashes while Spark memory (block manager) holds blocks 1 and 3; HDFS / Amazon S3 holds blocks 1 to 4. Annotation: execution engine & storage engine in the same process.]
24. Issue 2
Cache loss when process crashes
[Figure: after the crash, only blocks 1 to 4 in HDFS / Amazon S3 remain. Annotation: execution engine & storage engine in the same process.]
25. Issue 2 resolved with Tachyon
Keep in-memory data safe, even when a job crashes.
[Figure: a Spark task with Spark memory (block manager) runs on top of Tachyon, which holds blocks 1, 3, and 4 in memory; HDFS / Amazon S3 holds blocks 1 to 4. Annotation: execution engine & storage engine in the same process.]
26. Issue 2 resolved with Tachyon
Keep in-memory data safe, even when a job crashes.
[Figure: the job crashes, but Tachyon still holds blocks 1, 3, and 4 in memory; HDFS / Amazon S3 holds blocks 1 to 4 on disk. Annotation: execution engine & storage engine in the same process.]
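The difference between the two designs can be shown with a toy simulation. This is a sketch under stated assumptions, not Tachyon's actual architecture or API: `UnderStore` stands in for HDFS / Amazon S3, `OffProcessCache` for a memory store that outlives any single job, and `Task` for a job with a private in-process cache. All names are hypothetical. The function counts how many slow reads reach the under store when a task is restarted after a crash.

```python
class UnderStore:
    """Stands in for HDFS / Amazon S3: durable but slow, so reads are counted."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0

    def read(self, block_id):
        self.reads += 1
        return self.blocks[block_id]


class OffProcessCache:
    """Stands in for a memory store that lives outside any job's process."""

    def __init__(self, under):
        self.under = under
        self.memory = {}

    def read(self, block_id):
        if block_id not in self.memory:
            self.memory[block_id] = self.under.read(block_id)
        return self.memory[block_id]


class Task:
    """A task with a private in-process cache that is lost when the task dies."""

    def __init__(self, store):
        self.store = store
        self.private_cache = {}

    def read(self, block_id):
        if block_id not in self.private_cache:
            self.private_cache[block_id] = self.store.read(block_id)
        return self.private_cache[block_id]


def slow_reads_after_crash(use_shared_store: bool) -> int:
    """Run a task, 'crash' it, rerun it, and count reads from the under store."""
    under = UnderStore({1: "a", 2: "b", 3: "c", 4: "d"})
    store = OffProcessCache(under) if use_shared_store else under
    for _ in range(2):            # the second iteration is the restart after a crash
        task = Task(store)        # a fresh task starts with an empty private cache
        for block_id in (1, 3):
            task.read(block_id)
    return under.reads
```

In this model `slow_reads_after_crash(False)` returns 4, because the restarted task must fetch both blocks from the under store again, while `slow_reads_after_crash(True)` returns 2, because the blocks are still held by the store that survived the crash.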
47. More Features
• 7) Remote Write Support
• 8) Easy deployment with Mesos and YARN
• 9) Initial Security Support
• 10) One Command Cluster Deployment
• 11) Metrics Reporting for Clients, Workers, and Master
54. Strata NYC 2015
• Visit us at booth #P18.
• Check out other Tachyon-related talks.
– First-ever scalable, distributed deep learning architecture using Spark and Tachyon
• Christopher Nguyen (Adatao, Inc.), Vu Pham (Adatao, Inc.)
• 2:05pm–2:45pm Thursday, 10/01/2015
– Faster time to insight using Spark, Tachyon, and Zeppelin
• Nirmal Ranganathan (Rackspace Hosting)
• 2:05pm–2:45pm Thursday, 10/01/2015