6. FASTEST-GROWING BIG DATA PROJECT
• Fastest-growing open-source project in the big data ecosystem
• 400+ contributors from 100+ organizations
• Running the world’s largest production clusters
• Welcome to join the community!
7. CURRENT STATUS
FOUNDER
Haoyuan Li, CEO
Alluxio (formerly Tachyon) co-creator; joined the AMPLab Ph.D. program in 2011
INVESTOR
TEAM
• From AMD, Dell, Google, Palantir, Uber, Yahoo; experts in distributed systems
• MSs and PhDs in CS from CMU, Stanford, UC Berkeley
• Top 10 committers of the Alluxio open source project
COMPANY
Founded 2015
We are hiring!
11. BIG DATA ECOSYSTEM WITH ALLUXIO
…
…
FUSE Compatible File System, Hadoop Compatible File System, Native Key-Value Interface, Native File System
GlusterFS Interface, Amazon S3 Interface, Swift Interface, HDFS Interface
Enabling applications to access data from any storage system at memory speed
13. WHY ALLUXIO
Co-located compute and data with memory-speed access to data
Virtualized across different storage systems under a unified namespace
Scale-out architecture
File system API, software only
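The "unified namespace" point can be pictured as a mount table that maps paths in one logical file system onto different underlying stores. The sketch below is a toy illustration only, not Alluxio's implementation or API: the class `UnifiedNamespace`, its methods, and the example URIs are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UnifiedNamespace:
    """Toy mount table (hypothetical): logical path prefix -> storage URI."""
    mounts: Dict[str, str] = field(default_factory=dict)

    def mount(self, prefix: str, storage_uri: str) -> None:
        # Normalize trailing slashes so prefixes compare cleanly.
        self.mounts[prefix.rstrip("/")] = storage_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        # Pick the longest mounted prefix that contains the path.
        best = None
        for prefix in self.mounts:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        return self.mounts[best] + path[len(best):]


# Example URIs are made up for illustration.
ns = UnifiedNamespace()
ns.mount("/warehouse", "hdfs://namenode:8020/warehouse")
ns.mount("/logs", "s3://example-bucket/logs")
print(ns.resolve("/logs/2016/01/part-0"))
print(ns.resolve("/warehouse/orders.parquet"))
```

An application in this sketch only ever sees paths such as `/logs/...`; which storage system actually holds the bytes is decided by the mount table.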
14. ALLUXIO BENEFITS
Unification: New workflows across any data in any storage system
Performance: Orders of magnitude improvement in run time
Flexibility: Choice in compute and storage; grow each independently, buy only what is needed
16. ALLUXIO USE CASES
Accelerating I/O to and from remote storage
Managing data across disparate storage systems
Sharing data across workloads at memory speed
17. ACCELERATE I/O TO/FROM REMOTE STORAGE
Baidu’s PMs and analysts run interactive queries to gain insights into their products and business.
• 200+ node deployment
• 2+ petabytes of storage
• Mix of memory + HDD
ALLUXIO
Baidu File System
"The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds."
- Baidu
RESULTS
• Data queries are now 30x faster with Alluxio
• Alluxio cluster runs stably, providing over 50TB of RAM space
• By using Alluxio, batch queries usually lasting over 15 minutes were transformed into interactive queries taking less than 30 seconds
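As a quick sanity check, the speedups implied by the timings quoted above can be worked out directly. This is plain arithmetic on the numbers in the slide; the helper name `speedup` is just for illustration.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Ratio of the old running time to the new one."""
    return before_seconds / after_seconds


# Quoted interactive query: 100-150 s with Spark SQL alone, 10-15 s with Alluxio.
quote_low = speedup(100, 10)
quote_high = speedup(150, 15)

# Batch query: 15 minutes before, 30 seconds after.
batch = speedup(15 * 60, 30)

print(quote_low, quote_high, batch)
```

Under this reading, the quoted query corresponds to roughly a 10x improvement, while the 30x figure lines up with the batch case (15 minutes down to 30 seconds).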
19. SHARE DATA ACROSS JOBS @ MEMORY SPEED
Barclays uses query and machine learning to train models for risk management.
• 6-node deployment
• 1TB of storage
• Memory only
ALLUXIO
Relational Database
"Thanks to Alluxio, we now have the raw data immediately available at every iteration and we can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity."
- Barclays
RESULTS
• Barclays workflow iteration time decreased from hours to seconds
• Alluxio enabled workflows that were impossible before
• By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds
21. MANAGE DATA ACROSS STORAGE SYSTEMS
Qunar uses real-time machine learning for their website ads.
• 200+ node deployment
• 6 billion logs (4.5 TB) daily
• Mix of memory + HDD
ALLUXIO
"We’ve been running Alluxio in production for over 9 months. Alluxio’s unified namespace enables different applications and frameworks to easily interact with data from different storage systems."
- Qunar
RESULTS
• Efficient data sharing among Spark Streaming, Spark batch, and Flink jobs
• System performance improved by 15x to 300x
• Tiered storage feature manages storage resources including memory, SSD, and disk
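To make the tiered storage idea concrete, here is a minimal sketch of blocks cascading from memory to SSD to disk as faster tiers fill up. It is a toy model under stated assumptions: the class `TieredStore`, the capacities counted in blocks, and the oldest-first eviction rule are all hypothetical, and the slides do not describe Alluxio's actual policy.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class TieredStore:
    """Toy tiered store (hypothetical): fastest tier first, oldest block demoted."""

    def __init__(self, tiers: List[Tuple[str, int]]):
        if any(capacity < 1 for _, capacity in tiers):
            raise ValueError("each tier needs room for at least one block")
        self.names = [name for name, _ in tiers]
        self.capacity = [capacity for _, capacity in tiers]
        self.blocks = [OrderedDict() for _ in tiers]

    def put(self, block_id: str, data: bytes) -> None:
        # Drop any stale copy, then write into the fastest tier.
        for tier in self.blocks:
            tier.pop(block_id, None)
        self._place(0, block_id, data)

    def _place(self, level: int, block_id: str, data: bytes) -> None:
        if level >= len(self.blocks):
            return  # fell off the slowest tier; this toy model discards it
        tier = self.blocks[level]
        if len(tier) >= self.capacity[level]:
            # Demote the oldest block to the next (slower) tier.
            old_id, old_data = tier.popitem(last=False)
            self._place(level + 1, old_id, old_data)
        tier[block_id] = data

    def locate(self, block_id: str) -> Optional[str]:
        for name, tier in zip(self.names, self.blocks):
            if block_id in tier:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for i in range(1, 6):
    store.put(f"b{i}", b"payload")
print(store.locate("b5"), store.locate("b3"), store.locate("b1"))
```

In this sketch the newest blocks stay in memory and older ones drift toward disk, which is one simple way a mix of memory, SSD, and HDD can be managed as a single pool.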
23. ALLUXIO, INC PRODUCT OFFERINGS
Alluxio Open Source
• Capability/Value: Validation
• Technology: Open Source
Alluxio Community Edition (ACE)
• Capability/Value: Accelerate Adoption
• Technology: Alluxio Manager, Open Source
Alluxio Enterprise Edition (AEE)
• Capability/Value: Enterprise Deployment
• Technology: Kerberos Authentication, Data Replication, Support, Alluxio Manager, Open Source