Alluxio (formerly Tachyon):
Unified Namespace and Tiered Storage
Calvin Jia, Jiri Simsa
One of the Things to Watch at
Strata
TechCrunch article:
“… An interesting item that made the top
terms list is “alluxio,” which is the recently
renamed Tachyon project. Alluxio is a virtual
distributed storage system, and it has a
memory-centric architecture that enables
data sharing across clusters at memory
speed. …”
2
Who Are We?
• Calvin Jia
• SWE @ Alluxio, Inc.
• #1 Alluxio contributor
• Twitter: @JiaCalvin
• Jiri Simsa
• SWE @ Alluxio, Inc.
• CMU Ph.D. & Google
• Twitter: @jsimsa
3
Alluxio Inc.
• Founded by Alluxio creators and top
committers
• Formerly Tachyon Nexus, Inc.
• $7.5 million Series A by Andreessen Horowitz
• Committed to the Alluxio Open Source
Project
• Company Website: http://www.alluxio.com
4
Outline
• Alluxio Introduction
• Tiered Storage
• Unified Namespace
5
ALLUXIO:
Open Source Memory Speed
Virtual Distributed Storage
6
Memory Speed
• Memory-centric architecture designed for memory I/O
Virtual
• Abstracts persistent storage from applications
Distributed
• Designed to scale with nothing but commodity hardware
Open Source
• One of the fastest growing project communities
7
Contributor Growth
• Over 200 Contributors
– 3x growth over the last year
8
Organizations
• Over 50 Organizations
9
Alluxio Ecosystem
10
Memory is Getting Faster
11
Memory is Getting Cheaper
12
Simple Examples
• Data sharing between frameworks
• Data resilience during application crashes
• Consolidate memory usage and alleviate
GC issues
13
Data Sharing Between Frameworks
[Diagram: a Spark job holds blocks 1 and 3 in Spark memory (storage engine & execution engine in the same process); a Hadoop MR job runs on YARN; HDFS / Amazon S3 holds blocks 1 to 4.]
Inter-process sharing slowed down by network and/or disk I/O
14
Data Sharing Between Frameworks
[Diagram: a Spark job (Spark memory; storage engine & execution engine in the same process) and a Hadoop MR job on YARN; Alluxio holds blocks 1, 3 and 4 in memory; HDFS / Amazon S3 holds blocks 1 to 4 on disk.]
Inter-process sharing can happen at memory speed
15
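The effect of a shared memory layer on the two slides above can be illustrated with a toy model. This is a minimal sketch, not Alluxio code: `UnderStore`, `SharedCache` and `run_job` are hypothetical names, and the "jobs" are plain function calls standing in for a Spark job and a Hadoop MR job. It only counts how often the slow backing store is touched.

```python
class UnderStore:
    """Stands in for HDFS / Amazon S3; counts how often it is read."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read(self, block_id):
        self.reads += 1
        return self.blocks[block_id]


class SharedCache:
    """Stands in for an out-of-process memory layer shared by all jobs."""

    def __init__(self, under_store):
        self.under_store = under_store
        self.memory = {}

    def read(self, block_id):
        # Only the first reader of a block pays for the under-store read.
        if block_id not in self.memory:
            self.memory[block_id] = self.under_store.read(block_id)
        return self.memory[block_id]


def run_job(source, block_ids):
    """A 'job' here is just a sequence of block reads."""
    return [source.read(b) for b in block_ids]


blocks = {f"block{i}": f"data{i}" for i in range(1, 5)}

# Without a shared layer, each job reads the under store itself.
direct = UnderStore(blocks)
run_job(direct, ["block1", "block3"])  # e.g. the Spark job
run_job(direct, ["block1", "block3"])  # e.g. the Hadoop MR job
direct_reads = direct.reads

# With a shared layer, the second job is served from memory.
backing = UnderStore(blocks)
shared = SharedCache(backing)
run_job(shared, ["block1", "block3"])
run_job(shared, ["block1", "block3"])
shared_reads = backing.reads

print(direct_reads, shared_reads)
```

In this model the second job never reaches the under store once the first job has pulled the blocks into the shared layer.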
Data Resilience during Crashes
[Diagram, three frames: a Spark task holds blocks 1 and 3 in Spark memory through its block manager (storage engine & execution engine in the same process), while HDFS / Amazon S3 holds blocks 1 to 4; the process crashes; afterwards only the blocks in HDFS / Amazon S3 remain.]
Process crash requires network and/or disk I/O to re-read the data
18
Data Resilience during Crashes
[Diagram, two frames: a Spark task with Spark memory and block manager (storage engine & execution engine in the same process); Alluxio holds blocks 1, 3 and 4 in memory; HDFS holds blocks 1 to 4 on disk; after the crash the blocks in Alluxio are still available.]
Process crash only needs memory I/O to re-read the data
20
Consolidating Memory
[Diagram: Spark Job1 and Spark Job2 each hold their own copies of blocks 1 and 3 in Spark memory (storage engine & execution engine in the same process); HDFS / Amazon S3 holds blocks 1 to 4.]
Data duplicated at memory-level
21
Consolidating Memory
[Diagram: Spark Job1 and Spark Job2, each with its own Spark memory (storage engine & execution engine in the same process); Alluxio holds a single copy of blocks 1, 3 and 4 in memory; HDFS / Amazon S3 holds blocks 1 to 4 on disk.]
Data not duplicated at memory-level
22
Case Study: Barclays
Making the Impossible Possible with Tachyon: Accelerate Spark
Jobs from Hours to Seconds
• Application: SparkSQL + Spark RDDs
• Alluxio Storage Layer: MEM
• Backend Storage: None
• Result: Speeding up Spark jobs from hours to seconds
23
Common Questions
– Memory speed sharing among distributed applications: HDFS interface compatible
– GC overhead introduced by in-memory caching: off-heap memory management
– Data set could be larger than available memory: tiered storage
24
Outline
• Alluxio Introduction
• Tiered Storage
• Unified Namespace
25
Motivation
• Memory resources are still constrained
• Alluxio data management logic is not
limited to memory
• Storage resources available on compute
clusters
26
Tiered Storage
MEM
SSD
HDD
27
Tiered Storage
• Extends Alluxio with support for SSD and/or
HDD storage
• Different tiers have different characteristics
– Keep hot data in fast but limited storage
– Keep warm data in slower but abundant storage
• Workers manage their own storage
• Data allocation and eviction are driven by
application access
28
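The idea of keeping data in the fastest tier that can hold it can be sketched as follows. This is a minimal illustration under stated assumptions, not Alluxio's allocator: the `Tier` class, the `allocate` function and the capacities are all hypothetical, and sizes are arbitrary units.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tier:
    name: str
    capacity: int  # total space in this tier (arbitrary units)
    blocks: Dict[str, int] = field(default_factory=dict)  # block id -> size

    @property
    def used(self) -> int:
        return sum(self.blocks.values())

    def has_room(self, size: int) -> bool:
        return self.used + size <= self.capacity


def allocate(tiers: List[Tier], block_id: str, size: int) -> Optional[str]:
    """Place the block in the fastest tier with room; return that tier's name."""
    for tier in tiers:  # tiers are ordered fastest first
        if tier.has_room(size):
            tier.blocks[block_id] = size
            return tier.name
    return None  # nothing fits; a real worker would have to evict first


tiers = [Tier("MEM", 100), Tier("SSD", 200), Tier("HDD", 1000)]
placements = [allocate(tiers, f"block{i}", 60) for i in range(1, 6)]
print(placements)
```

With these made-up capacities, the first block lands in memory and later blocks spill to SSD and then HDD as the faster tiers fill up.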
Tiered Storage Architecture
Machine Type 1: Compute Client, Alluxio Master; Memory, SSD, HDD
Machine Type 2: Compute Client, Alluxio Worker; Memory, SSD, HDD
29
Tiered Storage Architecture
Machine Type 2
Compute Client
• Alluxio Client
Alluxio Worker
• Tiered Block Store
• Evictor
• Allocator
Memory, SSD, HDD
30
Automatic Data Migration
• Data can be evicted to lower layers if it is “cooling down”
• Data can be promoted to upper layers if it is “warming up”
[Diagram: evict stale data to lower tier; promote hot data to upper tier]
31
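Eviction of cooling data and promotion of warming data can be sketched with a small model. This is only an illustration: `TieredStore` is a hypothetical class, each tier holds a fixed number of blocks, least-recently-used order is assumed as the measure of "cooling down", and any read counts as "warming up". The slides do not say which rules Alluxio applies by default.

```python
from collections import OrderedDict


class TieredStore:
    """Toy block store: tiers are ordered fastest first, each holds `slots` blocks."""

    def __init__(self, tier_names, slots):
        self.slots = slots
        # One OrderedDict per tier; its order doubles as recency order.
        self.tiers = [(name, OrderedDict()) for name in tier_names]

    def _insert(self, level, block_id, data):
        _, blocks = self.tiers[level]
        blocks[block_id] = data
        blocks.move_to_end(block_id)
        if len(blocks) > self.slots:
            # Evict the least recently used block to the next lower tier.
            victim, victim_data = blocks.popitem(last=False)
            if level + 1 < len(self.tiers):
                self._insert(level + 1, victim, victim_data)
            # Past the last tier the block is dropped from the store.

    def write(self, block_id, data):
        self._insert(0, block_id, data)

    def read(self, block_id):
        for _, blocks in self.tiers:
            if block_id in blocks:
                data = blocks.pop(block_id)
                self._insert(0, block_id, data)  # promote on access
                return data
        return None

    def location(self, block_id):
        for name, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None


store = TieredStore(["MEM", "SSD", "HDD"], slots=2)
for i in range(1, 5):
    store.write(f"block{i}", f"data{i}")

before = store.location("block1")  # block1 cooled down and was evicted
store.read("block1")               # reading it warms it up again
after = store.location("block1")
print(before, after)
```

Promoting `block1` pushes the coldest block in the top tier down one level, so migration in both directions falls out of the same insert step.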
Pluggable Policies
• Policies can be customized to suit
workloads
• Defaults provided for general scenarios
• Advanced users can optimize with
additional knowledge
– For example: Optimize for iterations
32
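Why a workload-specific policy can help iterative jobs can be shown with a small simulation. This is a hedged sketch: the `Evictor` interface and both policies are hypothetical and are not taken from the Alluxio code base. It compares least-recently-used eviction with most-recently-used eviction on a job that scans the same data set repeatedly when the data set is slightly larger than the cache.

```python
from abc import ABC, abstractmethod


class Evictor(ABC):
    """Hypothetical policy interface: choose which cached block to drop."""

    @abstractmethod
    def pick_victim(self, blocks_by_recency):
        """`blocks_by_recency` is ordered least recently used first."""


class LRUEvictor(Evictor):
    def pick_victim(self, blocks_by_recency):
        return blocks_by_recency[0]


class MRUEvictor(Evictor):
    def pick_victim(self, blocks_by_recency):
        return blocks_by_recency[-1]


def count_hits(evictor, capacity, accesses):
    """Replay the access sequence and count how many reads hit the cache."""
    cache = []  # least recently used first
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.remove(block)
        elif len(cache) >= capacity:
            cache.remove(evictor.pick_victim(cache))
        cache.append(block)  # the accessed block becomes the most recent
    return hits


# Three passes over four blocks with room for only three of them.
accesses = [1, 2, 3, 4] * 3
lru_hits = count_hits(LRUEvictor(), 3, accesses)
mru_hits = count_hits(MRUEvictor(), 3, accesses)
print(lru_hits, mru_hits)
```

On this cyclic scan the LRU policy always evicts the block that is needed next and scores no hits, while the MRU policy keeps part of the data set resident across passes.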
Case Study: Baidu
Baidu Queries Data 30 Times Faster with Alluxio
• Application: Spark
• Alluxio Storage: MEM + HDD
• Backend Storage: Baidu’s File System
• 200+ nodes deployment, 2PB+ managed space
• Result: Speeding up data querying by 30x
33
Outline
• About Alluxio
• Tiered Storage
• Unified Namespace
34
Big Data Ecosystem
35
Motivation
• At large organizations, data spans many storage
systems (object storage, network / distributed file
systems, DBs)
• Application logic needs to integrate with different types
of storage systems
• Data needs to be moved around to work around
application limitations
• In-house storage layers are built to address limitations
of legacy storage systems
38
Transparent Naming
• Applications can transparently and efficiently interact
with remote storage through Alluxio.
• Applications do not need to use different APIs for
interacting with different storage systems.
[Diagram: the Alluxio namespace at alluxio://host:port/ mirrors the storage system at s3n://bucket/directory; both contain data (reports, sales) and users (alice, bob).]
39
Single Namespace
• Applications can read and write different storage
systems.
• Decouples data location from application
[Diagram: the Alluxio namespace at alluxio://host:port/ contains data (reports, sales) and users (alice, bob). It is backed by two storage systems, labelled A and B: hdfs://host:port/ holds users (alice, bob) and s3n://bucket/directory holds reports and sales.]
40
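The single namespace on the slide above can be modelled as a mount table that translates Alluxio paths into under-storage URIs. This is a minimal sketch: `MountTable` and its methods are hypothetical names, longest-prefix matching is an assumption about how nested mounts would be resolved, and the mount points `/data` and `/users` are read off the diagram.

```python
from typing import Dict


class MountTable:
    """Hypothetical mount table: Alluxio path prefix -> under-storage URI."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, alluxio_path: str, ufs_uri: str) -> None:
        # Normalise trailing slashes so "/data" and "/data/" are the same mount.
        self._mounts[alluxio_path.rstrip("/") or "/"] = ufs_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Translate an Alluxio path into the URI in the backing storage system."""
        # Longest matching prefix wins, so nested mounts override their parents.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                rest = path[len(prefix):].lstrip("/")
                base = self._mounts[prefix]
                return f"{base}/{rest}" if rest else base
        raise KeyError(f"no mount point covers {path}")


table = MountTable()
table.mount("/data", "s3n://bucket/directory")
table.mount("/users", "hdfs://host:port/users")

print(table.resolve("/data/reports"))
print(table.resolve("/users/alice"))
```

The application only ever sees paths under alluxio://host:port/; which storage system serves a path is decided by the table, which is what decouples data location from the application.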
Architecture
[Diagram: applications use the Alluxio interface; ALLUXIO reaches HDFS, S3, Swift, … through the UFS interface, using an HDFS adapter, an S3 adapter and a Swift adapter.]
41
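The adapter layering in the architecture diagram can be sketched as one abstract interface with a registry that picks an adapter by URI scheme. This is an illustration only: `UnderFileSystem`, `FakeAdapter` and `AdapterRegistry` are hypothetical names, the adapters are dict-backed fakes rather than real HDFS, S3 or Swift clients, and the file contents are made up.

```python
from abc import ABC, abstractmethod


class UnderFileSystem(ABC):
    """Hypothetical stand-in for the UFS interface named on the slide."""

    @abstractmethod
    def read(self, uri: str) -> bytes:
        """Return the bytes stored at the given URI."""


class FakeAdapter(UnderFileSystem):
    """Dict-backed adapter used in place of a real HDFS, S3 or Swift client."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, uri: str) -> bytes:
        return self._files[uri]


class AdapterRegistry:
    """Routes each request to the adapter registered for the URI scheme."""

    def __init__(self):
        self._adapters = {}

    def register(self, scheme: str, adapter: UnderFileSystem) -> None:
        self._adapters[scheme] = adapter

    def read(self, uri: str) -> bytes:
        scheme, sep, _ = uri.partition("://")
        if not sep or scheme not in self._adapters:
            raise ValueError(f"no adapter registered for {uri}")
        return self._adapters[scheme].read(uri)


registry = AdapterRegistry()
registry.register("hdfs", FakeAdapter({"hdfs://host:port/users/alice": b"alice's files"}))
registry.register("s3n", FakeAdapter({"s3n://bucket/directory/reports": b"quarterly report"}))

print(registry.read("hdfs://host:port/users/alice"))
print(registry.read("s3n://bucket/directory/reports"))
```

Supporting another storage system in this model means writing one more adapter and registering it; callers of `registry.read` stay unchanged.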
Alluxio Benefits
• Enable new workloads across storage systems
• Work with the framework of your choice
• Scale storage and compute independently
42
Resources
• Alluxio Project: http://www.alluxio.org
• Development: https://github.com/Alluxio/alluxio
• Meet Friends: http://www.meetup.com/Alluxio
• Alluxio Inc: http://www.alluxio.com
• Contact us: info@alluxio.com
43

Alluxio Presentation at Strata San Jose 2016

  • 1. Alluxio (formerly Tachyon): Unified Namespace and Tiered Storage Calvin Jia, Jiri Simsa
  • 2. One of the Things to Watch at Strata TechCrunch article: “… An interesting item that made the top terms list is ‘alluxio,’ which is the recently renamed Tachyon project. Alluxio is a virtual distributed storage system, and it has a memory-centric architecture that enables data sharing across clusters at memory speed. …” 2
  • 3. Who Are We? • Calvin Jia • SWE @ Alluxio, Inc. • #1 Alluxio contributor • Twitter: @JiaCalvin • Jiri Simsa • SWE @ Alluxio, Inc. • CMU Ph.D. & Google • Twitter: @jsimsa 3
  • 4. Alluxio Inc. • Founded by Alluxio creators and top committers • Formerly Tachyon Nexus, Inc. • $7.5 million Series A by Andreessen Horowitz • Committed to the Alluxio Open Source Project • Company Website: http://www.alluxio.com 4
  • 5. Outline • Alluxio Introduction • Tiered Storage • Unified Namespace 5
  • 6. ALLUXIO: Open Source Memory Speed Virtual Distributed Storage 6
  • 7. Memory Speed • Memory-centric architecture designed for memory I/O Virtual • Abstracts persistent storage from applications Distributed • Designed to scale with nothing but commodity hardware Open Source • One of the fastest growing project communities 7
  • 8. Contributor Growth • Over 200 Contributors – 3x growth over the last year 8
  • 9. Organizations • Over 50 Organizations 9
  • 11. Memory is Getting Faster 11
  • 12. Memory is Getting Cheaper 12
  • 13. Simple Examples • Data sharing between frameworks • Data resilience during application crashes • Consolidate memory usage and alleviate GC issues 13
  • 14. Spark Job Spark Memory block 1 block 3 Hadoop MR Job YARN HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process Data Sharing Between Frameworks Inter-process sharing slowed down by network and/or disk I/O 14
  • 15. Data Sharing Between Frameworks Spark Job Spark Memory Hadoop MR Job YARN HDFS / Amazon S3 block 1 block 3 block 2 block 4 HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 storage engine & execution engine same process Inter-process sharing can happen at memory speed 15
  • 16. Data Resilience during Crashes Spark Task Spark Memory block manager block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process Process crash requires network and/or disk I/O to re-read the data 16
  • 17. Data Resilience during Crashes Crash Spark Memory block manager block 1 block 3 HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process Process crash requires network and/or disk I/O to re-read the data 17
  • 18. HDFS / Amazon S3 Data Resilience during Crashes block 1 block 3 block 2 block 4 Crash storage engine & execution engine same process Process crash requires network and/or disk I/O to re-read the data 18
  • 19. Data Resilience during Crashes Spark Task Spark Memory block manager storage engine & execution engine same process HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Process crash only needs memory I/O to re-read the data 19
  • 20. Data Resilience during Crashes Crash storage engine & execution engine same process Process crash only needs memory I/O to re-read the data HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 20
  • 21. HDFS / Amazon S3 Consolidating Memory Spark Job1 Spark Memory block 1 block 3 Spark Job2 Spark Memory block 3 block 1 block 1 block 3 block 2 block 4 storage engine & execution engine same process Data duplicated at memory-level 21
  • 22. Consolidating Memory Spark Job1 Spark mem Spark Job2 Spark mem HDFS / Amazon S3 block 1 block 3 block 2 block 4 storage engine & execution engine same process HDFS disk block 1 block 3 block 2 block 4 Alluxio In-Memory block 1 block 3 block 4 Data not duplicated at memory-level 22
  • 23. Case Study: Barclays Making the Impossible Possible with Tachyon: Accelerate Spark Jobs from Hours to Seconds • Application: SparkSQL + Spark RDDs • Alluxio Storage Layer: MEM • Backend Storage: None • Result: Speeding up Spark jobs from hours to seconds 23
  • 24. Common Questions – Memory-speed sharing among distributed applications: HDFS-compatible interface – GC overhead introduced by in-memory caching: off-heap memory management – Data set could be larger than available memory: tiered storage 24
  • 25. Outline • Alluxio Introduction • Tiered Storage • Unified Namespace 25
  • 26. Motivation • Memory resources are still constrained • Alluxio data management logic is not limited to memory • Storage resources available on compute clusters 26
  • 28. Tiered Storage • Extends Alluxio with support for SSD and/or HDD storage • Different tiers have different characteristics – Keep hot data in fast but limited storage – Keep warm data in slower but abundant storage • Workers manage their own storage • Data allocation and eviction are driven by application access 28
  • 29. Tiered Storage Architecture Machine Type 1 Compute Client Alluxio Master Memory, SSD, HDD Machine Type 2 Compute Client Alluxio Worker Memory, SSD, HDD 29
  • 30. Tiered Storage Architecture Machine Type 2 Compute Client • Alluxio Client Alluxio Worker • Tiered Block Store • Evictor • Allocator Memory, SSD, HDD 30
  • 31. Automatic Data Migration • Data can be evicted to lower layers if it is “cooling down” • Data can be promoted to upper layers if it is “warming up” Evict stale data to lower tier Promote hot data to upper tier 31
  • 32. Pluggable Policies • Policies can be customized to suit workloads • Defaults provided for general scenarios • Advanced users can optimize with additional knowledge – For example: Optimize for iterations 32
  • 33. Case Study: Baidu Baidu Queries Data 30 Times Faster with Alluxio • Application: Spark • Alluxio Storage: MEM + HDD • Backend Storage: Baidu’s File System • 200+ nodes deployment, 2PB+ managed space • Result: Speeding up data querying by 30x 33
  • 34. Outline • About Alluxio • Tiered Storage • Unified Namespace 34
  • 38. Motivation • At large organizations, data spans many storage systems (object storage, network / distributed file systems, DBs) • Application logic needs to integrate with different types of storage systems • Data needs to be moved around to work around application limitations • In-house storage layers are built to address limitations of legacy storage systems 38
  • 39. Transparent Naming • Applications can transparently and efficiently interact with remote storage through Alluxio. • Applications do not need to use different APIs for interacting with different storage systems. alluxio://host:port/ data users reports sales alice bob s3n://bucket/directory data users reports sales alice bob Alluxio Storage System 39
  • 40. Single Namespace • Applications can read and write different storage systems. • Decouples data location from application alluxio://host:port/ data users reports sales alice bob hdfs://host:port/ users alice bob s3n://bucket/directory reports sales Alluxio Storage System A Storage System B 40
  • 41. Architecture Alluxio Interface UFS Interface HDFS S3 Swift … S3 adapter Swift adapter HDFS adapter ALLUXIO 41
  • 42. Alluxio Benefits • Enable new workloads across storage systems • Work with the framework of your choice • Scale storage and compute independently 42
  • 43. Resources • Alluxio Project: http://www.alluxio.org • Development: https://github.com/Alluxio/alluxio • Meet Friends: http://www.meetup.com/Alluxio • Alluxio Inc: http://www.alluxio.com • Contact us: info@alluxio.com 43
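
The tiered storage slides (28 to 32) describe workers that keep hot data in a fast tier, evict data that is cooling down to a lower tier, and promote data that is warming up. A minimal Python sketch of that idea follows. It is not Alluxio's implementation or API: the class names `Tier` and `TieredBlockStore`, the capacities, and the choice of LRU as the eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class Tier:
    """One storage tier (hypothetical model): a name, a capacity, and its blocks."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> size, least recently used first

    def used(self):
        return sum(self.blocks.values())


class TieredBlockStore:
    """Toy block store: LRU eviction downwards, promotion to the top tier on read."""

    def __init__(self, tiers):
        self.tiers = tiers  # ordered fastest first, e.g. MEM, SSD, HDD

    def _make_room(self, level, size):
        tier = self.tiers[level]
        if size > tier.capacity:
            raise ValueError("block does not fit in tier " + tier.name)
        while tier.used() + size > tier.capacity:
            # Evict the least recently used block to the next lower tier.
            victim, victim_size = tier.blocks.popitem(last=False)
            if level + 1 < len(self.tiers):
                self._place(level + 1, victim, victim_size)
            # Otherwise the block falls out of this toy store entirely.

    def _place(self, level, block_id, size):
        self._make_room(level, size)
        self.tiers[level].blocks[block_id] = size

    def write(self, block_id, size):
        """New data is allocated in the fastest tier."""
        self._place(0, block_id, size)

    def read(self, block_id):
        """Return the tier that served the read, then promote the block."""
        for tier in self.tiers:
            if block_id in tier.blocks:
                size = tier.blocks.pop(block_id)
                self._place(0, block_id, size)
                return tier.name
        return None

    def location(self, block_id):
        for tier in self.tiers:
            if block_id in tier.blocks:
                return tier.name
        return None


store = TieredBlockStore([Tier("MEM", 2), Tier("SSD", 2), Tier("HDD", 4)])
for block in ("block1", "block2", "block3"):
    store.write(block, 1)
```

After the three writes, `block1` has cooled down into the SSD tier; reading it promotes it back to MEM and pushes the least recently used MEM block down. A different policy (the "Pluggable Policies" slide) would replace the LRU choice inside `_make_room`.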
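
The unified namespace slides (39 to 41) show one `alluxio://` tree whose subdirectories are backed by different storage systems. One way to picture this is a mount table that maps a path in the single namespace to a URI in the underlying store by longest-prefix match. The sketch below assumes that model; `MountTable`, its methods, and the exact mount points are hypothetical and are not Alluxio's actual API.

```python
class MountTable:
    """Toy mount table: namespace path prefix -> under-storage URI (hypothetical)."""

    def __init__(self):
        self._mounts = {}

    def mount(self, namespace_path, ufs_uri):
        key = namespace_path.rstrip("/") or "/"
        self._mounts[key] = ufs_uri.rstrip("/")

    def resolve(self, path):
        """Translate a namespace path into the URI of the backing storage system."""
        best = None
        for mount_point in self._mounts:
            prefix = "/" if mount_point == "/" else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                # Prefer the most specific (longest) mount point.
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError("no mount covers " + path)
        rest = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return base + "/" + rest if rest else base


# Mount points assumed from the "Single Namespace" slide layout.
table = MountTable()
table.mount("/data/users", "hdfs://host:port/users")
table.mount("/data/reports", "s3n://bucket/directory/reports")

print(table.resolve("/data/users/alice"))
print(table.resolve("/data/reports/sales"))
```

An application only ever sees paths under `/data`; which storage system holds the bytes is decided by the table, which is the decoupling of data location from application logic that the slides describe.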

Editor's Notes

  1. Good afternoon everyone, and welcome to the Alluxio features talk. We will give an introduction to Alluxio and specifically go over two fundamental features in Alluxio. By the end of the talk, you should have a good idea as to why we believe Alluxio qualifies as a “Data Innovation”. First, could I get a show of hands of who has already attended the Alluxio talk earlier today? Great, you guys will have a lot more insight if you watch the recording of that talk after this one.
  2. I want to start off by introducing us. I’m Calvin, the top contributor to the project. I’ve been working on the Alluxio project for a little over 3 years now, and I’m currently a software engineer at Alluxio, Inc. Joining me for this talk is my colleague, Jiri. He’s also a software engineer at Alluxio, Inc. and has experience working at Google as well as a PhD from CMU. Both of our Twitter accounts are here if you want to follow us for the latest news about the project.
  3. I mentioned we are both working at Alluxio Inc, which is a company dedicated to growing and building the Alluxio open source project. We were formerly known as Tachyon Nexus and are backed by A16Z. If you are interested in learning more about us, our company site is alluxio.com. And of course, if we’ve impressed you enough and you want to work with us, we are hiring!
  4. Now let’s dive into the talk. There will be three sections, the first of which is an introduction to Alluxio.
  5. Alluxio – Open Source Memory Speed Virtual Distributed Storage. That’s a lot of adjectives, is probably the first thing you thought. The second might be, Hey that sounds really familiar, isn’t that Tachyon? Much like how the company was originally Tachyon Nexus, the Tachyon project has recently become Alluxio with the 1.0 release.
  6. More importantly, you are probably wondering what all those adjectives mean. Let’s start with Open Source: this means the system’s source code is available for anyone to download, look at, or contribute to. We have a large community working together on the project and are growing at a rapid pace. Memory speed refers to the system architecture, which is designed to take advantage of the growing amounts of memory in machines. Virtual describes the abstraction Alluxio provides to storage systems and applications, essentially allowing the two layers to be separate from each other. And finally, distributed refers to the fact that Alluxio can scale to many machines as long as you have more commodity hardware to throw at it.
  7. Here is a more visual representation of where Alluxio sits and its function in the big data stack. Above Alluxio are many compute frameworks, such as MapReduce and Spark. These are connected by Alluxio to various storage systems, which do not necessarily need to be file systems. However, Alluxio is more than just a connection layer; it provides great benefits by acting as a storage system which provides a view of all your data but only holds what is necessary.
  8. The previous diagram implied that Alluxio is something new in the stack, not a replacement for anything. Why would a new layer emerge or be useful? Don’t applications just communicate directly with storage? To answer this question, we need to take a look at technology trends, in particular, memory. Memory is awesome: it’s super performant and allows workloads to run at blazing speeds. In the past decade, we’ve seen exponential growth in RAM throughput and steadily declining costs. And it’s not just Alluxio that has recognized this direction; many compute frameworks have embraced the idea of being memory-centric to achieve impressive results.
  9. Adding to the exponential throughput improvements of memory I/O are the steadily declining costs. The two factors create the perfect situation for commodity technologies to be seriously designed with memory in mind.
  10. I’ll go through some simple examples of the high-level point I mentioned: data sharing between frameworks, data resilience during application crashes, and consolidating memory usage and reducing GC.
  11. Video recommendation system similarity Top list