1. UNIFY DATA AT MEMORY SPEED
Haoyuan (HY) Li, CEO @ Alluxio Inc.
VAULT Conference 2017
March 2017
2. HISTORY
• Started at UC Berkeley AMPLab in summer 2012
• Originally named Tachyon
• Rebranded to Alluxio in early 2016
• Open Sourced in 2013
• Apache License 2.0
• Latest Stable Release: Alluxio 1.4.0
• Alluxio 1.5.0 planned for Q2 2017
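6. © 2017 Alluxio Confidential
BIG DATA ECOSYSTEM WITH ALLUXIO
[Diagram: Alluxio between applications and storage systems]
Application-facing interfaces:
• FUSE Compatible File System
• Hadoop Compatible File System
• Native Key-Value Interface
• Native File System
Storage-facing interfaces:
• GlusterFS Interface
• Amazon S3 Interface
• Swift Interface
• HDFS Interface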
7. BIG DATA ECOSYSTEM WITH ALLUXIO
[Diagram: Alluxio between applications and storage systems]
Enabling applications to access data from any storage system at memory speed
Application-facing interfaces:
• FUSE Compatible File System
• Hadoop Compatible File System
• Native Key-Value Interface
• Native File System
Storage-facing interfaces:
• GlusterFS Interface
• Amazon S3 Interface
• Swift Interface
• HDFS Interface
11. FASTEST-GROWING BIG DATA PROJECT
• Formerly named Tachyon, born in the AMPLab
• 500+ contributors from 100+ organizations
• Running world’s largest production clusters
12. WHY ALLUXIO
• Co-located compute and data with memory-speed access to data
• Virtualized across different storage systems under a unified namespace
• Scale-out architecture
• File system API, software only
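The "unified namespace" point above can be illustrated with a toy mount table that maps paths in one logical tree onto different storage systems. This is only a sketch of the idea, not Alluxio's API: the `MountTable` class, its methods, and the example URIs are all hypothetical.

```python
class MountTable:
    """Toy mount table (hypothetical): one logical namespace over many stores."""

    def __init__(self):
        self._mounts = {}  # logical mount point -> under-storage URI prefix

    def mount(self, logical_path, storage_uri):
        """Attach a storage URI at a logical path."""
        self._mounts[logical_path.rstrip("/")] = storage_uri.rstrip("/")

    def resolve(self, path):
        """Translate a logical path to a storage URI using longest-prefix match."""
        best = None
        for point in self._mounts:
            if path == point or path.startswith(point + "/"):
                if best is None or len(point) > len(best):
                    best = point
        if best is None:
            raise KeyError(f"no mount covers {path}")
        return self._mounts[best] + path[len(best):]


table = MountTable()
table.mount("/logs", "s3://example-bucket/logs")      # hypothetical bucket
table.mount("/warehouse", "hdfs://namenode:9000/wh")  # hypothetical cluster
print(table.resolve("/logs/2017/03/01.gz"))
print(table.resolve("/warehouse/sales/part-0"))
```

An application only sees `/logs/...` and `/warehouse/...`; which storage system actually holds the bytes is decided by the mount table.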
13. ALLUXIO BENEFITS
• Unification: New workflows across any data in any storage system
• Performance: Orders of magnitude improvement in run time
• Flexibility: Choice in compute and storage – grow each independently, buy only what is needed
15. ALLUXIO USE CASES
• On-demand analytics & accelerating I/O to and from remote storage
• Managing data across disparate storage systems
• Sharing data across workloads at memory speed
16. MANAGE DATA ACROSS STORAGE SYSTEMS
“We’ve been running in production for over 9 months, Alluxio’s enabled different applications & frameworks to easily interact with data from different storage systems.”
RESULTS
• Spark Streaming, Spark batch, and Flink jobs share data efficiently
• Improved the performance of their system with 15x – 300x speedups
• Tiered storage feature manages storage resources including memory, SSD, and disk
Qunar uses real-time machine learning for their website ads
• 200+ node deployment
• 6 billion logs (4.5 TB) daily
• Mix of memory + HDD
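To put the Qunar deployment figures in perspective, the daily volume can be turned into an average log size and ingest rate. This is a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes), a load spread evenly over the day, and exactly 200 nodes, none of which the slide states.

```python
# Figures from the slide; decimal units (1 TB = 10**12 bytes) are an assumption.
LOGS_PER_DAY = 6_000_000_000
BYTES_PER_DAY = 4_500_000_000_000
NODES = 200  # the slide says "200+", so per-node numbers are upper bounds
SECONDS_PER_DAY = 24 * 60 * 60

avg_log_bytes = BYTES_PER_DAY / LOGS_PER_DAY
cluster_mb_per_s = BYTES_PER_DAY / SECONDS_PER_DAY / 1e6  # assumes even load
per_node_mb_per_s = cluster_mb_per_s / NODES

print(f"average log size: {avg_log_bytes:.0f} bytes")
print(f"cluster ingest:   {cluster_mb_per_s:.1f} MB/s")
print(f"per-node ingest:  {per_node_mb_per_s:.2f} MB/s")
```

Under these assumptions the logs average about 750 bytes each and arrive at roughly 52 MB/s across the cluster.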
17. ON-DEMAND ANALYTICS & ACCELERATE I/O TO/FROM REMOTE STORAGE
“The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds.”
RESULTS
• Data queries are now 30x faster with Alluxio
• Alluxio cluster runs stably, providing over 50TB of RAM space
• By using Alluxio, batch queries usually lasting over 15 minutes were transformed into interactive queries taking less than 30 seconds
PMs run interactive queries to gain insights into their products & business
• 200+ node deployment
• 2+ petabytes of storage
• Mix of memory + HDD
[Diagram labels: ALLUXIO, Baidu File System]
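The speedup figures on this slide can be cross-checked with simple division. The sketch below is one possible reading of the numbers: pairing the low and high ends of the quoted ranges is an assumption, and the batch figure is a lower bound because the slide gives only "over 15 minutes" and "less than 30 seconds".

```python
def speedup(before_s, after_s):
    """Return how many times faster `after_s` is than `before_s`."""
    return before_s / after_s


# Quoted query times: 100-150 s with Spark SQL alone, 10-15 s with Alluxio.
paired_low = speedup(100, 10)    # low end paired with low end (assumption)
paired_high = speedup(150, 15)   # high end paired with high end (assumption)
worst_case = speedup(100, 15)    # fastest "before" against slowest "after"
best_case = speedup(150, 10)     # slowest "before" against fastest "after"

# Batch-to-interactive bullet: over 15 minutes down to under 30 seconds.
batch_lower_bound = speedup(15 * 60, 30)

print(f"quoted queries: {paired_low:.0f}x to {paired_high:.0f}x when paired, "
      f"{worst_case:.1f}x to {best_case:.0f}x at the extremes")
print(f"batch to interactive: at least {batch_lower_bound:.0f}x")
```

Read this way, the quoted query times correspond to about a 10x speedup, while the 30x figure lines up with the batch-to-interactive comparison.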
18. SHARE DATA ACROSS JOBS @ MEMORY SPEED
“Thanks to Alluxio, we now have the raw data immediately available at every iteration & can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity.”
RESULTS
• Barclays workflow iteration time decreased from hours to seconds
• Alluxio enabled workflows that were impossible before
• By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds
Barclays uses query & machine learning to train models for risk management
• 6-node deployment
• 1TB of storage
• Memory only
[Diagram labels: ALLUXIO, Relational Database: Teradata]
19. Thank you!
Contact: {haoyuan}@alluxio.com or info@alluxio.com