SlideShare a Scribd company logo
1 of 48
Download to read offline
Memory-Speed Virtual Distributed Storage: Latest Features and Use Cases
August 2016 @ Strata+Hadoop World Beijing
Bin Fan, Yupeng Fu
1
About Us
•  Bin Fan 
•  Software Engineer @ Alluxio, Inc.
•  Alluxio PMC
•  Worked at Google, MSR
•  CMU, CUHK, USTC
•  Wechat @ apc999_fb
•  Yupeng Fu
•  Software Engineer @ Alluxio, Inc.
•  Alluxio PMC
•  Worked at Palantir, Google
•  UCSD, Tsinghua 
•  Wechat @richbird9
2
•  Team consists of Alluxio creators and top committers
•  Invested by

•  Committed to Alluxio Open Source

•  http://www.alluxio.com
Alluxio Inc.
3
Outline
•  What is Alluxio
•  Example use cases
1.  Off-heap memory to alleviate resource pressure
2.  Fast data sharing between jobs
3.  Accelerate access to remote storage
4.  Unified namespace
•  Summary

4
Non-persistent
data-storage
software.
ATTRIBUTES
Memory-Speed
 Virtual
 Distributed
 Storage
Scale out
architecture.
Virtualized across different
storage types under a
unified namespace.
Memory-speed
access to data.
5
OPEN SOURCE ALLUXIO: History
• Started in the AMPLab at UC Berkeley
–  Summer of 2012
• Open Sourced as Tachyon (renamed to Alluxio)
–  Apache 2.0 License
–  Latest Release: version 1.2.0 (July, 2016)
6
OPEN SOURCE ALLUXIO
v0.2
 v0.3
v0.4
v0.5
v0.6
v0.7
v0.8
v1.0
v1.1
7
OPEN SOURCE ALLUXIO
The fastest growing
open-source project
in the big data
ecosystem.

Currently over 300
contributors from
over 100
organizations.
8
Outline
•  What is Alluxio
•  Example use cases
1.  Off-heap memory to alleviate resource pressure
2.  Fast data sharing between jobs
3.  Accelerate access to remote storage
4.  Unified namespace
•  Summary

9
Application 1
Alluxio
CASE 1:
Off-heap Memory Storage to
Alleviate Resource Pressure
10
HDFS / Amazon S3
Consolidating Memory
Spark Job1
Spark
Memory
block 1
block 3
Spark Job2
Spark
Memory
block 3
block 1
block 1
block 3
block 2
block 4
storage engine & 
execution engine
same process
Data duplicated at memory-level
11
Spark Job1
Spark mem
Spark Job2
Spark mem
HDFS / Amazon S3
block 1
block 3
block 2
block 4
HDFS
disk
block 1
block 3
block 2
block 4
Alluxio
In-Memory
block 1
block 3
 block 4
Data not duplicated at memory-level
Consolidating Memory
storage engine & 
execution engine
same process
12
Data Resilience during Crashes
Spark Task
Spark Memory
block manager
block 1
block 3
HDFS / Amazon S3
block 1
block 3
block 2
block 4
Process crash requires network and/or disk I/O to re-read the
data
storage engine & 
execution engine
same process
13
Data Resilience during Crashes
Crash
Spark Memory
block manager
block 1
block 3
HDFS / Amazon S3
block 1
block 3
block 2
block 4
Process crash requires network and/or disk I/O to re-read the
data
storage engine & 
execution engine
same process
14
HDFS / Amazon S3
Data Resilience during Crashes
block 1
block 3
block 2
block 4
Crash
storage engine & 
execution engine
same process
Process crash requires network and/or disk I/O to re-read the
data
15
Spark Task
Spark Memory
block manager
HDFS
disk
block 1
block 3
block 2
block 4
Alluxio
In-Memory
block 1
block 3
 block 4
Process crash only needs memory I/O to re-read the
data
Data Resilience during Crashes
storage engine & 
execution engine
same process
16
Crash
Process crash only needs memory I/O to re-read the data
HDFS
disk
block 1
block 3
block 2
block 4
Alluxio
In-Memory
block 1
block 3
 block 4
Data Resilience during Crashes
storage engine & 
execution engine
same process
17
CASE STUDY
Thanks to Alluxio, we now have the raw
data immediately available at every
iteration and we can skip the costs of
loading in terms of time waiting,
network traffic, and RDBMS activity.

- Henry Powell, Barclays
RESULTS
•  Barclays workflow iteration time decreased
from hours to seconds

•  Alluxio enabled workflows that were
impossible before

•  By keeping data only in memory, the I/O cost of
loading and storing in Alluxio is now on the
order of seconds
Relational Database
Share Data Across Jobs at Memory
Speed
•  Reducing time from hours to
seconds
•  Solving issues of Teradata as
under storage
18
Barclays Previous Architecture
Spark
Teradata
Slow ETL,
30+ minutes
Each context restart
requires ETL process
again
h"ps://dzone.com/ar1cles/Accelerate-­‐In-­‐Memory-­‐Processing-­‐with-­‐Spark-­‐from-­‐Hours-­‐to-­‐Seconds-­‐With-­‐Tachyon	
  
19
Barclays Architecture with Alluxio
Spark
Teradata
Alluxio
ETL only once
Data still available in Alluxio
memory after Job crash
20
CASE 2:
Fast Data sharing between apps
Application 1
Alluxio
Application2
HDFS
21
Spark Job
Spark 
Memory
block 1
block 3
Hadoop MR Job
YARN
HDFS / Amazon S3
block 1
block 3
block 2
block 4
storage engine & 
execution engine
same process
Data Sharing Between Frameworks
Inter-process sharing slowed down by network and/or disk I/O
22
Data Sharing Between Frameworks
Spark Job
Spark Memory
Hadoop MR Job
YARN
HDFS / Amazon S3
block 1
block 3
block 2
block 4
HDFS
disk
block 1
block 3
block 2
block 4
Alluxio
In-Memory
block 1
block 3
 block 4
storage engine & 
execution engine
same process
Inter-process sharing can happen at memory
speed
23
CASE STUDY
We’ve been running Alluxio in
production for over 9 months, resulting
in 15x speedup on average, and 300x
speedup at peak service times.

- Xueyan Li, Qunar
RESULTS

•  Multiple computing frameworks can share data
at memory speed with Alluxio

•  Improved the performance of their system with
15x – 300x speedups

•  Alluxio provides various user-friendly APIs,
which simplifies the transition to Alluxio.

Sharing Data Across Different
Compute Frameworks
•  6 billion logs (4.5 TB) daily
•  Mix of Memory + HDD
24
Qunar Previous Architecture
Flink
Streaming
Spark
Spark
Streaming
Flink
HDFS Cluster 
Some jobs cannot finish
Slow access to the storage
25
Qunar Architecture with Alluxio
Flink
Streaming
HDFS Cluster 
Spark
Spark
Streaming
Flink
Alluxio
Share in-memory data
across frameworks
26
CASE 3:
Accelerate access to remote storage
Application 
Alluxio
Remote
Storage
27
•  Different compute and storage hardware requirements
•  Scale compute and storage resources independently
•  Cost effective object stores (Amazon S3, Google GCS, Microsoft Azure
Blob Store) are inherently decoupled
Decoupling Compute from Storage
Accessing data
requires remote I/O!
28
Remote I/O problems
Application
Remote
Storage
every data operation requires
data transfer, sometimes over
the WAN
high latency, network
throughput
29
Visualizing the Stack with Alluxio
FAST 
104 - 105 MB/s
Application
Remote Storage
MODERATE 
103 - 104 MB/s
SLOW
102 - 103 MB/s
Alluxio MEM
Often
Only When Necessary

Alluxio SSD/HDD
Limited

30
What if memory capacity is still
not enough?
31
Alluxio Manages All Local Storage
MEM
SSD
HDD
Faster
Higher Capacity
32
Configurable Storage Tiers
MEM only
MEM + HDD
SSD only
33
Pluggable Tier Management Policies
Evict stale data to
slower tier
Promote hot data
to faster tier
34
CASE STUDY
Baidu File System
The performance was amazing. With
Spark SQL alone, it took 100-150 seconds
to finish a query; using Alluxio, where
data may hit local or remote Alluxio
nodes, it took 10-15 seconds.

- Shaoshan Liu, Baidu
RESULTS
•  Data queries are now 30x faster with Alluxio

•  1000-worker Alluxio cluster run stably,
providing over 50TB of RAM space

•  By using Alluxio, batch queries usually lasting
over 15 minutes were transformed into an
interactive query taking less than 30 seconds
Accelerate Access to Remote
Storage
•  1000+ nodes deployment
•  2+ petabytes of storage
•  Mix of memory + HDD
•  1000-worker cluster
35
CASE 4:
Unified Name Space
36	
  
Application 
Alluxio
AWS S3
 HDFS
Common Scenarios
•  At large organizations, data spans many storage
systems

•  Legacy storage systems
•  Application logic needs to integrate with different
storage systems

37
Unified Name Space
An abstraction for applications to access different storage
systems through the same interface

•  Benefits
–  Application written once with Allluxio API
–  Transparently add new storage systems
–  Seamlessly integrate with existing systems

38
Transparent Naming
Operations over persisted Alluxio objects mapped
transparently to underlying storage

alluxio://Data/Sales <-> hdfs://Data/Sales
Alluxio
 Storage System (HDFS, S3, …)
alluxio://host:port/
Data
 Users
Reports
 Sales
 Alice
 Bob
hdfs://host:port/
Data
 Users
Reports
 Sales
 Alice
 Bob
39
mount
Multiple Storage Systems
•  Unified namespace for multiple data sources
alluxio://Data/Sales -> s3://bucket/Sales
alluxio://Users/Alice -> hdfs://Users/Alice
•  API for on-the-fly mounting / unmounting

Alluxio
 Storage System A

alluxio://host:port/
Data
 Users
Alice
 Bob
hdfs://host:port/
Users
Alice
 Bob
Storage System B

s3://host/bucket
Reports
 Sales
Reports
 Sales
40
mount
mount
A Proliferation of Under Storage Systems
MORE…
41
CASE STUDY
RESULTS
•  Alluxio’s unified namespace enables different
applications and frameworks to easily interact
with their data from different storage systems

•  Global data centers across North America and
Asia

•  Tiered storage feature manages various
storage resources including memory, SSD and
disk
Transparently Manage Data
Across Different Storage Systems
•  Storage medium: MEM
•  Clusters in different regions
An Internet Company
Machine Learning Pipelines
42
42
Outline
•  What is Alluxio
•  Example use cases
1.  Off-heap memory to alleviate resource pressure
2.  Fast data sharing between jobs
3.  Accelerate access to remote storage
4.  Unified namespace
•  Summary

43
Takeaway: When to use Alluxio
•  Two or more jobs access the same dataset
•  Job(s) may not always succeed
•  Spark with memory/GC pressure
•  Jobs/Apps are pipelined
•  Resulting data does not need to be immediately
persisted
•  Interact with remote data
•  Data stored across different storage systems


44
Try out Alluxio 1.2.0
http://www.alluxio.org/releases
45
First China Meetups!
•  Nanjing (Saturday, 7/30)
•  Shanghai (Sunday, 7/31)


•  Beijing (Sunday, 8/7)
46
Resources
•  Alluxio Project: http://www.alluxio.org
•  Development: https://github.com/Alluxio/alluxio
•  Meet Friends: http://www.meetup.com/Alluxio
•  Alluxio, Inc.: http://www.alluxio.com
•  Training: http://www.alluxio.com/alluxio-training/
•  Contact us: info@alluxio.com
•  Alluxio Wechat: Alluxio
47
Alluxio Use Cases at Strata+Hadoop World Beijing 2016

More Related Content

What's hot

Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017Alluxio, Inc.
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio, Inc.
 
Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio, Inc.
 
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio, Inc.
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Alluxio, Inc.
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...Alluxio, Inc.
 
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016Alluxio, Inc.
 
The Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand ClustersThe Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand ClustersAlluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Alluxio, Inc.
 
Tachyon: An Open Source Memory-Centric Distributed Storage System
Tachyon: An Open Source Memory-Centric Distributed Storage SystemTachyon: An Open Source Memory-Centric Distributed Storage System
Tachyon: An Open Source Memory-Centric Distributed Storage SystemTachyon Nexus, Inc.
 
Alluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for DaskAlluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for DaskAlluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Alluxio, Inc.
 
Accessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAccessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAlluxio, Inc.
 
Accelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with AlluxioAccelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with AlluxioAlluxio, Inc.
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioAlluxio, Inc.
 
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio, Inc.
 

What's hot (20)

Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
Unify Data at Memory Speed by Haoyuan Li - VAULT Conference 2017
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
 
Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016
 
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri Simsa
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
 
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
Rise of Intermediate APIs - Beam and Alluxio at Alluxio Meetup 2016
 
The Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand ClustersThe Missing Piece of On-Demand Clusters
The Missing Piece of On-Demand Clusters
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017 Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
Enable Fast Big Data Analytics on Ceph with Alluxio at Ceph Days 2017
 
Tachyon: An Open Source Memory-Centric Distributed Storage System
Tachyon: An Open Source Memory-Centric Distributed Storage SystemTachyon: An Open Source Memory-Centric Distributed Storage System
Tachyon: An Open Source Memory-Centric Distributed Storage System
 
Alluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for DaskAlluxio-FUSE as a data access layer for Dask
Alluxio-FUSE as a data access layer for Dask
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
Introduction to Alluxio (formerly Tachyon) and how it brings up to 300x perfo...
 
Accessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAccessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified Namespace
 
Accelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with AlluxioAccelerating Spark Workloads in a Mesos Environment with Alluxio
Accelerating Spark Workloads in a Mesos Environment with Alluxio
 
Tachyon workshop 2015-07-19
Tachyon workshop 2015-07-19Tachyon workshop 2015-07-19
Tachyon workshop 2015-07-19
 
Best Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+AlluxioBest Practice in Accelerating Data Applications with Spark+Alluxio
Best Practice in Accelerating Data Applications with Spark+Alluxio
 
Alluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACKAlluxio Mesos Meetup - SMACK to SMAACK
Alluxio Mesos Meetup - SMACK to SMAACK
 

Viewers also liked

Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Alluxio, Inc.
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Tachyon Nexus, Inc.
 
Presentation by TachyonNexus & Baidu at Strata Singapore 2015
Presentation by TachyonNexus & Baidu at Strata Singapore 2015Presentation by TachyonNexus & Baidu at Strata Singapore 2015
Presentation by TachyonNexus & Baidu at Strata Singapore 2015Tachyon Nexus, Inc.
 
Tachyon Presentation at AMPCamp 6 (November, 2015)
Tachyon Presentation at AMPCamp 6 (November, 2015)Tachyon Presentation at AMPCamp 6 (November, 2015)
Tachyon Presentation at AMPCamp 6 (November, 2015)Tachyon Nexus, Inc.
 
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleGPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleSpark Summit
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaAlluxio, Inc.
 
U2 product For Wiseeco
U2 product For WiseecoU2 product For Wiseeco
U2 product For Wiseeco호진 하
 
AWS의 하둡 관련 서비스 - EMR/S3
AWS의 하둡 관련 서비스 - EMR/S3AWS의 하둡 관련 서비스 - EMR/S3
AWS의 하둡 관련 서비스 - EMR/S3Keeyong Han
 

Viewers also liked (9)

Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
 
HDFS Tiered Storage
HDFS Tiered StorageHDFS Tiered Storage
HDFS Tiered Storage
 
Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015Presentation by TachyonNexus & Intel at Strata Singapore 2015
Presentation by TachyonNexus & Intel at Strata Singapore 2015
 
Presentation by TachyonNexus & Baidu at Strata Singapore 2015
Presentation by TachyonNexus & Baidu at Strata Singapore 2015Presentation by TachyonNexus & Baidu at Strata Singapore 2015
Presentation by TachyonNexus & Baidu at Strata Singapore 2015
 
Tachyon Presentation at AMPCamp 6 (November, 2015)
Tachyon Presentation at AMPCamp 6 (November, 2015)Tachyon Presentation at AMPCamp 6 (November, 2015)
Tachyon Presentation at AMPCamp 6 (November, 2015)
 
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production ScaleGPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
GPU Support In Spark And GPU/CPU Mixed Resource Scheduling At Production Scale
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri Simsa
 
U2 product For Wiseeco
U2 product For WiseecoU2 product For Wiseeco
U2 product For Wiseeco
 
AWS의 하둡 관련 서비스 - EMR/S3
AWS의 하둡 관련 서비스 - EMR/S3AWS의 하둡 관련 서비스 - EMR/S3
AWS의 하둡 관련 서비스 - EMR/S3
 

Similar to Alluxio Use Cases at Strata+Hadoop World Beijing 2016

Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Data Con LA
 
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...Spark Summit
 
Running Solr in the Cloud at Memory Speed with Alluxio
Running Solr in the Cloud at Memory Speed with AlluxioRunning Solr in the Cloud at Memory Speed with Alluxio
Running Solr in the Cloud at Memory Speed with Alluxiothelabdude
 
Improving Memory Utilization of Spark Jobs Using Alluxio
Improving Memory Utilization of Spark Jobs Using AlluxioImproving Memory Utilization of Spark Jobs Using Alluxio
Improving Memory Utilization of Spark Jobs Using AlluxioAlluxio, Inc.
 
Unified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudUnified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudAlluxio, Inc.
 
Accelerate Cloud Training with Alluxio
Accelerate Cloud Training with AlluxioAccelerate Cloud Training with Alluxio
Accelerate Cloud Training with AlluxioAlluxio, Inc.
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Community
 
A Reliable Memory-Centric Distributed Storage System
A Reliable Memory-Centric Distributed Storage SystemA Reliable Memory-Centric Distributed Storage System
A Reliable Memory-Centric Distributed Storage SystemAlluxio, Inc.
 
Running Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, LucidworksRunning Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, LucidworksLucidworks
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio, Inc.
 
Best Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene PangBest Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene PangSpark Summit
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsAlluxio, Inc.
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio, Inc.
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsDataWorks Summit
 
Speeding Up Spark Performance using Alluxio at China Unicom
Speeding Up Spark Performance using Alluxio at China UnicomSpeeding Up Spark Performance using Alluxio at China Unicom
Speeding Up Spark Performance using Alluxio at China UnicomAlluxio, Inc.
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkAlluxio, Inc.
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Alluxio, Inc.
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio, Inc.
 

Similar to Alluxio Use Cases at Strata+Hadoop World Beijing 2016 (18)

Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Alluxio (formerly Tachyon)...
 
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan...
 
Running Solr in the Cloud at Memory Speed with Alluxio
Running Solr in the Cloud at Memory Speed with AlluxioRunning Solr in the Cloud at Memory Speed with Alluxio
Running Solr in the Cloud at Memory Speed with Alluxio
 
Improving Memory Utilization of Spark Jobs Using Alluxio
Improving Memory Utilization of Spark Jobs Using AlluxioImproving Memory Utilization of Spark Jobs Using Alluxio
Improving Memory Utilization of Spark Jobs Using Alluxio
 
Unified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any CloudUnified Big Data Analytics: Any Stack, Any Cloud
Unified Big Data Analytics: Any Stack, Any Cloud
 
Accelerate Cloud Training with Alluxio
Accelerate Cloud Training with AlluxioAccelerate Cloud Training with Alluxio
Accelerate Cloud Training with Alluxio
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
 
A Reliable Memory-Centric Distributed Storage System
A Reliable Memory-Centric Distributed Storage SystemA Reliable Memory-Centric Distributed Storage System
A Reliable Memory-Centric Distributed Storage System
 
Running Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, LucidworksRunning Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
Running Solr at Memory Speed with Alluxio - Timothy Potter, Lucidworks
 
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the CloudAlluxio+Presto: An Architecture for Fast SQL in the Cloud
Alluxio+Presto: An Architecture for Fast SQL in the Cloud
 
Best Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene PangBest Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene Pang
 
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast AnalyticsGetting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
Getting Started with Apache Spark and Alluxio for Blazingly Fast Analytics
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
Speeding Up Spark Performance using Alluxio at China Unicom
Speeding Up Spark Performance using Alluxio at China UnicomSpeeding Up Spark Performance using Alluxio at China Unicom
Speeding Up Spark Performance using Alluxio at China Unicom
 
Best Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with SparkBest Practices for Using Alluxio with Spark
Best Practices for Using Alluxio with Spark
 
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
Introduction to Alluxio 2.0 Preview | Simplifying data access for cloud workl...
 
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloads
 

More from Alluxio, Inc.

Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...Alluxio, Inc.
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...Alluxio, Inc.
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio, Inc.
 
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio, Inc.
 
Alluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio, Inc.
 
Alluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model TrainingAlluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model TrainingAlluxio, Inc.
 
Alluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio, Inc.
 

More from Alluxio, Inc. (20)

Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/...
 
Alluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to Production
 
Alluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model TrainingAlluxio Webinar - Maximize GPU Utilization for Model Training
Alluxio Webinar - Maximize GPU Utilization for Model Training
 
Alluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AIAlluxio Product school Webinar - Distributed Caching for Generative AI
Alluxio Product school Webinar - Distributed Caching for Generative AI
 

Recently uploaded

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Français Patch Tuesday - Avril
Français Patch Tuesday - AvrilFrançais Patch Tuesday - Avril
Français Patch Tuesday - AvrilIvanti
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 

Recently uploaded (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Français Patch Tuesday - Avril
Français Patch Tuesday - AvrilFrançais Patch Tuesday - Avril
Français Patch Tuesday - Avril
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 

Alluxio Use Cases at Strata+Hadoop World Beijing 2016

  • 1. Memory-Speed Virtual Distributed Storage: Latest Features and Use Cases August 2016 @ Strata+Hadoop World Beijing Bin Fan, Yupeng Fu 1
  • 2. About Us •  Bin Fan •  Software Engineer @ Alluxio, Inc. •  Alluxio PMC •  Worked at Google, MSR •  CMU, CUHK, USTC •  Wechat @ apc999_fb •  Yupeng Fu •  Software Engineer @ Alluxio, Inc. •  Alluxio PMC •  Worked at Palantir, Google •  UCSD, Tsinghua •  Wechat @richbird9 2
  • 3. •  Team consists of Alluxio creators and top committers •  Invested by •  Committed to Alluxio Open Source •  http://www.alluxio.com Alluxio Inc. 3
  • 4. Outline •  What is Alluxio •  Example use cases 1.  Off-heap memory to alleviate resource pressure 2.  Fast data sharing between jobs 3.  Accelerate access to remote storage 4.  Unified namespace •  Summary 4
  • 5. Non-persistent data-storage software. ATTRIBUTES Memory-Speed Virtual Distributed Storage Scale out architecture. Virtualized across different storage types under a unified namespace. Memory-speed access to data. 5
  • 6. OPEN SOURCE ALLUXIO: History • Started in the AMPLab at UC Berkeley –  Summer of 2012 • Open Sourced as Tachyon (renamed to Alluxio) –  Apache 2.0 License –  Latest Release: version 1.2.0 (July, 2016) 6
  • 7. OPEN SOURCE ALLUXIO v0.2 v0.3 v0.4 v0.5 v0.6 v0.7 v0.8 v1.0 v1.1 7
  • 8. OPEN SOURCE ALLUXIO The fastest growing open-source project in the big data ecosystem. Currently over 300 contributors from over 100 organizations. 8
  • 9. Outline •  What is Alluxio •  Example use cases 1.  Off-heap memory to alleviate resource pressure 2.  Fast data sharing between jobs 3.  Accelerate access to remote storage 4.  Unified namespace •  Summary 9
  • 10. Application 1 Alluxio CASE 1: Off-heap Memory Storage to Alleviate Resource Pressure 10
  • 11. HDFS / Amazon S3 Consolidating Memory Spark Job1 Spark Memory block 1 block 3 Spark Job2 Spark Memory block 3 block 1 block 1 block 3 block 2 block 4 storage engine & execution engine same process Data duplicated at memory-level 11
  • 12. Spark Job1 Spark mem Spark Job2 Spark mem HDFS / Amazon S3 block 1 block 3 block 2 block 4 HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Data not duplicated at memory-level Consolidating Memory storage engine & execution engine same process 12
  • 13. Data Resilience during Crashes Spark Task Spark Memory block manager block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 Process crash requires network and/or disk I/O to re-read the data storage engine & execution engine same process 13
  • 14. Data Resilience during Crashes Crash Spark Memory block manager block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 Process crash requires network and/or disk I/O to re-read the data storage engine & execution engine same process 14
  • 15. HDFS / Amazon S3 Data Resilience during Crashes block 1 block 3 block 2 block 4 Crash storage engine & execution engine same process Process crash requires network and/or disk I/O to re-read the data 15
  • 16. Spark Task Spark Memory block manager HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Process crash only needs memory I/O to re-read the data Data Resilience during Crashes storage engine & execution engine same process 16
  • 17. Crash Process crash only needs memory I/O to re-read the data HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Data Resilience during Crashes storage engine & execution engine same process 17
  • 18. CASE STUDY Thanks to Alluxio, we now have the raw data immediately available at every iteration and we can skip the costs of loading in terms of time waiting, network traffic, and RDBMS activity. - Henry Powell, Barclays RESULTS •  Barclays workflow iteration time decreased from hours to seconds •  Alluxio enabled workflows that were impossible before •  By keeping data only in memory, the I/O cost of loading and storing in Alluxio is now on the order of seconds Relational Database Share Data Across Jobs at Memory Speed •  Reducing time from hours to seconds •  Solving issues of Teradata as under storage 18
  • 19. Barclays Previous Architecture Spark Teradata Slow ETL, 30+ minutes Each context restart requires ETL process again h"ps://dzone.com/ar1cles/Accelerate-­‐In-­‐Memory-­‐Processing-­‐with-­‐Spark-­‐from-­‐Hours-­‐to-­‐Seconds-­‐With-­‐Tachyon   19
  • 20. Barclays Architecture with Alluxio Spark Teradata Alluxio ETL only once Data still available in Alluxio memory after Job crash 20
  • 21. CASE 2: Fast Data sharing between apps Application 1 Alluxio Application2 HDFS 21
  • 22. Spark Job Spark Memory block 1 block 3 Hadoop MR Job YARN HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process Data Sharing Between Frameworks Inter-process sharing slowed down by network and/or disk I/O 22
  • 23. Data Sharing Between Frameworks Spark Job Spark Memory Hadoop MR Job YARN HDFS / Amazon S3 block 1 block 3 block 2 block 4 HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 storage engine & execution engine same process Inter-process sharing can happen at memory speed 23
  • 24. CASE STUDY We’ve been running Alluxio in production for over 9 months, resulting in 15x speedup on average, and 300x speedup at peak service times. - Xueyan Li, Qunar RESULTS •  Multiple computing frameworks can share data at memory speed with Alluxio •  Improved the performance of their system with 15x – 300x speedups •  Alluxio provides various user-friendly APIs, which simplifies the transition to Alluxio. Sharing Data Across Different Compute Frameworks •  6 billion logs (4.5 TB) daily •  Mix of Memory + HDD 24
  • 25. Qunar Previous Architecture Flink Streaming Spark Spark Streaming Flink HDFS Cluster Some jobs cannot finish Slow access to the storage 25
  • 26. Qunar Architecture with Alluxio Flink Streaming HDFS Cluster Spark Spark Streaming Flink Alluxio Share in-memory data across frameworks 26
  • 27. CASE 3: Accelerate access to remote storage Application Alluxio Remote Storage 27
  • 28. •  Different compute and storage hardware requirements •  Scale compute and storage resources independently •  Cost effective object stores (Amazon S3, Google GCS, Microsoft Azure Blob Store) are inherently decoupled Decoupling Compute from Storage Accessing data requires remote I/O! 28
  • 29. Remote I/O problems Application Remote Storage every data operation requires data transfer, sometimes over the WAN high latency, network throughput 29
  • 30. Visualizing the Stack with Alluxio FAST 104 - 105 MB/s Application Remote Storage MODERATE 103 - 104 MB/s SLOW 102 - 103 MB/s Alluxio MEM Often Only When Necessary Alluxio SSD/HDD Limited 30
  • 31. What if memory capacity is still not enough? 31
  • 32. Alluxio Manages All Local Storage MEM SSD HDD Faster Higher Capacity 32
  • 33. Configurable Storage Tiers MEM only MEM + HDD SSD only 33
  • 34. Pluggable Tier Management Policies Evict stale data to slower tier Promote hot data to faster tier 34
  • 35. CASE STUDY Baidu File System The performance was amazing. With Spark SQL alone, it took 100-150 seconds to finish a query; using Alluxio, where data may hit local or remote Alluxio nodes, it took 10-15 seconds. - Shaoshan Liu, Baidu RESULTS •  Data queries are now 30x faster with Alluxio •  1000-worker Alluxio cluster run stably, providing over 50TB of RAM space •  By using Alluxio, batch queries usually lasting over 15 minutes were transformed into an interactive query taking less than 30 seconds Accelerate Access to Remote Storage •  1000+ nodes deployment •  2+ petabytes of storage •  Mix of memory + HDD •  1000-worker cluster 35
  • 36. CASE 4: Unified Name Space 36   Application Alluxio AWS S3 HDFS
  • 37. Common Scenarios •  At large organizations, data spans many storage systems •  Legacy storage systems •  Application logic needs to integrate with different storage systems 37
  • 38. Unified Name Space An abstraction for applications to access different storage systems through the same interface •  Benefits –  Application written once with Allluxio API –  Transparently add new storage systems –  Seamlessly integrate with existing systems 38
  • 39. Transparent Naming Operations over persisted Alluxio objects mapped transparently to underlying storage alluxio://Data/Sales <-> hdfs://Data/Sales Alluxio Storage System (HDFS, S3, …) alluxio://host:port/ Data Users Reports Sales Alice Bob hdfs://host:port/ Data Users Reports Sales Alice Bob 39 mount
  • 40. Multiple Storage Systems •  Unified namespace for multiple data sources alluxio://Data/Sales -> s3://bucket/Sales alluxio://Users/Alice -> hdfs://Users/Alice •  API for on-the-fly mounting / unmounting Alluxio Storage System A alluxio://host:port/ Data Users Alice Bob hdfs://host:port/ Users Alice Bob Storage System B s3://host/bucket Reports Sales Reports Sales 40 mount mount
  • 41. A Proliferation of Under Storage Systems MORE… 41
  • 42. CASE STUDY RESULTS •  Alluxio’s unified namespace enables different applications and frameworks to easily interact with their data from different storage systems •  Global data centers across North America and Asia •  Tiered storage feature manages various storage resources including memory, SSD and disk Transparently Manage Data Across Different Storage Systems •  Storage medium: MEM •  Clusters in different regions An Internet Company Machine Learning Pipelines 42 42
  • 43. Outline •  What is Alluxio •  Example use cases 1.  Off-heap memory to alleviate resource pressure 2.  Fast data sharing between jobs 3.  Accelerate access to remote storage 4.  Unified namespace •  Summary 43
  • 44. Takeaway: When to use Alluxio •  Two or more jobs access the same dataset •  Job(s) may not always succeed •  Spark with memory/GC pressure •  Jobs/Apps are pipelined •  Resulting data does not need to be immediately persisted •  Interact with remote data •  Data stored across different storage systems 44
  • 45. Try out Alluxio 1.2.0 http://www.alluxio.org/releases 45
  • 46. First China Meetups! •  Nanjing (Saturday, 7/30) •  Shanghai (Sunday, 7/31) •  Beijing (Sunday, 8/7) 46
  • 47. Resources •  Alluxio Project: http://www.alluxio.org •  Development: https://github.com/Alluxio/alluxio •  Meet Friends: http://www.meetup.com/Alluxio •  Alluxio, Inc.: http://www.alluxio.com •  Training: http://www.alluxio.com/alluxio-training/ •  Contact us: info@alluxio.com •  Alluxio Wechat: Alluxio 47