© Eckerson Group 2021 Twitter: @eckersongroup www.eckerson.com
Simplifying and Accelerating Data Access
for AI/ML Model Training
Kevin Petrie
Vice President of Research
Sridhar Venkatesh
SVP of Product
The Rise of Generative AI
After lots of training, language models generate strings of words
that become logical sentences and paragraphs
• A neural network whose nodes
share inputs and outputs
• Learns, summarizes, and
generates content
• Creates textual answers to
natural language questions
Source: The Economist
WHAT IS A LANGUAGE MODEL?
The Risk: Robots Get Things Wrong Too
Data teams must inspect, validate, and govern language model outputs
RISKS OF LANGUAGE MODELS
DATA QUALITY
Inaccuracies due to inaccurate/insufficient inputs, lack of context
EXPLAINABILITY
Vague/unknown sources or reasoning
PRIVACY
Exposure or theft due to user tracking
INTELLECTUAL PROPERTY
Liability for mishandled trademarks, copyrights, etc.
FAIRNESS
Perpetuation of bias in training data
Time to Get Domain Specific
Domain-specific, “small” language models reduce risk and boost productivity
by providing more governed and specialized outputs
• Enriched, detailed user
prompts
• Fine-tuned training on
enterprise data
• Augmented outputs; e.g.,
from multiple models
ENTER THE SMALL LANGUAGE MODEL
[Chart: Small Language Model (SLM) and Large Language Model (LLM) plotted on two axes, Generic to Specialty and Less Governed to More Governed]
Small Language Models Will Drive the GenAI Boom
30% of data practitioners are building or training their own language models now. 20% more plan to do so*
*Source: Active LinkedIn survey of 55 respondents to date
“We believe in a world where
everyone is empowered to build
and train their own models,
imbued with their own opinions
and viewpoints.”
- Naveen Rao, Co-Founder and CEO,
MosaicML
Data Processing for Language Models
Data teams must design and build new pipelines to feed their domain-specific data into language models
NEW DATA PIPELINE
1. TEXT: Assemble unstructured text from various files (e.g., “We hold these truths to be self-evident…”)
2. TOKENS: Convert words and punctuation marks to tokens (e.g., We, hold, these, truths, to, …)
3. VECTORS: Use embeddings to convert tokens into numerical vectors that describe their semantics (e.g., [.45, 6.3, .99] [7.6, .04, 19] [84, .13, 1.6])
4. VECTOR DB: Load, organize, and index these vectors in a vector database
5. LANGUAGE MODEL: Use a language model to search and query the vectors while responding to real-time user prompts (QUERY ONE, QUERY TWO)
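The pipeline steps above can be sketched in a few lines of Python. This is a toy illustration only, not the presenters' implementation: the hash-based `embed_token` function is a stand-in for a learned embedding model, `VectorIndex` is a hypothetical in-memory substitute for a real vector database, and `DIM` is an arbitrary width. The final step (a language model answering prompts over the retrieved vectors) is left out; the sketch stops at similarity search.

```python
import hashlib
import math
import re

DIM = 8  # hypothetical embedding width, chosen only for illustration


def tokenize(text):
    """Step 2: split text into lowercase word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def embed_token(token):
    """Step 3 stand-in: derive a deterministic pseudo-vector from a hash.

    A real pipeline would call a trained embedding model here.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:DIM]]


def embed_text(text):
    """Average the token vectors to get one vector per text chunk."""
    vectors = [embed_token(tok) for tok in tokenize(text)]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Step 4 stand-in: a list of (text, vector) rows searched by brute force."""

    def __init__(self):
        self.rows = []

    def add(self, text):
        self.rows.append((text, embed_text(text)))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        query_vector = embed_text(query)
        ranked = sorted(self.rows,
                        key=lambda row: cosine(query_vector, row[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    index = VectorIndex()
    index.add("We hold these truths to be self-evident")
    index.add("GPUs are scarce and expensive")
    print(tokenize("We hold these truths."))
    print(index.search("We hold these truths to be self-evident", k=1))
```

Because the vectors come from hashes rather than a trained model, similarity here reflects shared tokens, not meaning. Swapping `embed_token` for a real embedding call is the change that would make the search semantic.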
ON PREMISES | HYBRID | CLOUD | MULTI CLOUD
GENERATIVE AI EMBEDDED
CATALOG | GOVERN | OBSERVE | INTEGRATE | MASTER
STRUCTURED DATA (DB TABLES) | SEMI STRUCTURED (LOGS, CLICKSTREAMS, SENSORS…) | UNSTRUCTURED (TEXT, IMAGES…)
ANALYTICS | OPERATIONS
As companies embed generative AI into their workflows, they must manage
and process multi-structured data in a more holistic and efficient manner
The New Generative AI Data Stack
AI/ML Initiatives Need Fast and Simple Data Access
AI/ML initiatives require companies to balance, optimize, and secure workloads
across distributed datasets and compute resources
• Data access. View and process data wherever
it resides
• Performance. Retrieve data with low
latency/high throughput
• Portability. Run applications wherever suitable
compute resides
• Cost visibility. Monitor and control compute
cycles
• Multi tenancy. Isolate application compute to
safeguard performance
• Security. Restrict data access to minimize risk
of breaches
REQUIREMENTS
The Data Access Layer: Architecture
The data access layer continuously adjusts workloads, storage, and compute
• Namespace. Unified interface for
all data access
• APIs. Dynamic communication
between applications and storage
• Caching. Tier data by priority:
memory, SSDs, object store
• Metadata. Centralize descriptions
of data objects and resources
• Security. Authenticate users,
authorize access, log actions
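The caching bullet (tier data by priority: memory, SSDs, object store) can be illustrated with a small read-through cache. This is a minimal sketch under stated assumptions, not how any particular product works: `TieredReader` is a hypothetical class, plain dictionaries stand in for the SSD tier and the remote object store, and the eviction policy is simple least-recently-used.

```python
from collections import OrderedDict


class TieredReader:
    """Read-through cache: check memory, then SSD, then the object store."""

    def __init__(self, object_store, memory_capacity=2, ssd_capacity=4):
        self.object_store = object_store   # dict standing in for remote storage
        self.memory = OrderedDict()        # hottest tier, kept in LRU order
        self.ssd = OrderedDict()           # warm tier, kept in LRU order
        self.memory_capacity = memory_capacity
        self.ssd_capacity = ssd_capacity
        self.remote_reads = 0              # how often the slow tier was hit

    @staticmethod
    def _put(tier, capacity, key, value):
        """Insert into a tier; return the evicted (key, value) pair, if any."""
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > capacity:
            return tier.popitem(last=False)  # drop the least recently used
        return None

    def read(self, key):
        """Return (value, tier_name) and promote the object to memory."""
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key], "memory"
        if key in self.ssd:
            value = self.ssd.pop(key)
            source = "ssd"
        else:
            value = self.object_store[key]
            self.remote_reads += 1
            source = "object store"
        # Promote to memory; whatever memory evicts is demoted to SSD.
        evicted = self._put(self.memory, self.memory_capacity, key, value)
        if evicted is not None:
            # An SSD eviction is simply dropped: the object store keeps the copy.
            self._put(self.ssd, self.ssd_capacity, *evicted)
        return value, source


if __name__ == "__main__":
    store = {"a": b"1", "b": b"2", "c": b"3"}
    reader = TieredReader(store)
    for name in ["a", "a", "b", "c", "a"]:
        print(name, reader.read(name)[1])
    print("remote reads:", reader.remote_reads)
```

The design choice worth noting is that every read goes through one interface, so the caller never needs to know which tier served the data. That is the same idea as the unified namespace in the first bullet.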
Use Cases
AI initiatives have a range of use cases that require fast and simple data access
DATA CENTER CLOUD 1 CLOUD 2
ANALYTICS & AI IN A HYBRID ENVIRONMENT
ANALYTICS & AI ACROSS CLOUDS
WORKLOAD BURSTS
PROJECT EXPANSIONS
MIGRATIONS
COST OPTIMIZATION
Guiding Principles
Data leaders should evaluate where and how to implement a data access layer
to support generative AI initiatives
FIND THE BOTTLENECK
DECIDE WHETHER TO BUILD OR BUY
PLAN FOR GROWTH
Questions?
I’m listening!
Retooling the enterprise data infrastructure
Legacy data centers can’t keep up
High Performance
Computing
Specialized
Hardware
Varied
Workloads
“We're seeing incredible orders to retool the world's data centers… a 10-year transition to basically recycle or reclaim the world's data centers and build it out as accelerated computing.”
- Jensen Huang, Nvidia CEO
Challenges as you try to scale
“GPUs are this year’s toilet paper.”
- Wall Street Journal
GPUs are
scarce
GPUs are
expensive
Low GPU
Utilization
Business Pressures | Complex & Costly Solutions
• Faster model development times
• Increased freshness
• Higher accuracy and traceability
• Rapidly growing datasets
• Extensive data engineering managing data copies
• Specialized storage
• Out of control cloud and infra costs
Alluxio Data Platform
High Performance data access, unified global view
1. Faster Time-to-Market
50%
Hundreds of thousands of dollars saved annually compared to previous deployment.
2-3X Model Training Performance
Cost Reduction, Performance Boost
International B2C with a multi-cloud, cross-region AI platform, serving LLMs and training models from object storage. They optimized their AI platform with Alluxio to speed data delivery to training clusters and facilitate faster model deployment in latency sensitive production use cases.
Models Deployed in Minutes vs Days
Faster model deployment times
2. Higher GPU Utilization
“In a cloud environment, where GPU hardware is paid for as a function of time, you need
fast, performant, reliable, and cost effective data for your model training pipelines to keep
your GPU utilization close to 99%.”
20-30%
Average reported GPU utilization based on direct access from remote storage
GPU Utilization accessing commodity storage
GPU Utilization accessing Alluxio
Alluxio serves high throughput data to K8s training workloads.
90%
GPU utilization from Alluxio serving data pulled from object storage. An increase from 50% utilization via s3fs-fuse.
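The link between utilization and spend can be made concrete with a small calculation. When GPUs are billed by time, the cost of each hour of useful GPU work is the hourly rate divided by utilization. The sketch below applies that to the utilization figures quoted on this slide; the hourly rate is a hypothetical number chosen only for illustration, and `cost_per_useful_gpu_hour` is an illustrative helper, not part of any tool.

```python
def cost_per_useful_gpu_hour(hourly_rate, utilization):
    """Effective cost of one hour of actual GPU work.

    hourly_rate: price paid per GPU-hour (any currency).
    utilization: fraction of billed time the GPU is busy, in (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return hourly_rate / utilization


if __name__ == "__main__":
    HOURLY_RATE = 3.00  # hypothetical price per GPU-hour, for illustration only
    scenarios = [
        ("remote storage, low end (20%)", 0.20),
        ("remote storage, high end (30%)", 0.30),
        ("s3fs-fuse (50%)", 0.50),
        ("with caching layer (90%)", 0.90),
    ]
    for label, utilization in scenarios:
        cost = cost_per_useful_gpu_hour(HOURLY_RATE, utilization)
        print(f"{label}: {cost:.2f} per useful GPU-hour")
```

Under this simple model, moving from 50% to 90% utilization cuts the cost of useful work by a little under half, whatever the hourly rate is. It ignores other factors such as the cost of the caching layer itself.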
3. Reduction in Personnel
Increase in Productivity
Data scientists send requests to AI platform teams. Platform teams set up individual data pipelines.
With Alluxio, data scientists just access their data. Alluxio consolidates many pipelines into an access layer.
[Diagram labels: Pre-Processed Data, Data Management, Pipeline or Scheduler, Training Clusters]
4. Reduction in Infrastructure Spend
Alluxio optimizes data platforms to increase efficiency
Data Engineering Pipelines
Data workflows improved by on-demand access from Alluxio cache
S3 Egress and API Fees
Fees significantly reduced via granular caching and data reuse
High Performance Computing
Replaceable with low-cost hardware at comparable performance
Reduced or Eliminated Network Congestion
Network congestion reduced by serving files locally
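The point about data reuse can be shown with a rough request count. Training typically reads the same objects once per epoch, so without a cache every epoch hits the object store again. The sketch below is a simplified model under stated assumptions: the first epoch starts with a cold cache, later epochs hit the cache at a fixed ratio, and each object costs one GET request. The function name and all numbers are hypothetical.

```python
def remote_get_requests(num_objects, epochs, hit_ratio_after_first_epoch):
    """Estimate GET requests sent to the object store over a training run.

    num_objects: objects read in each epoch.
    epochs: number of passes over the dataset (at least 1).
    hit_ratio_after_first_epoch: fraction of reads served from cache
        in epochs 2..N, between 0.0 (no cache) and 1.0 (everything cached).
    """
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    if not 0.0 <= hit_ratio_after_first_epoch <= 1.0:
        raise ValueError("hit ratio must be between 0.0 and 1.0")
    first_epoch = num_objects  # cold cache: every read goes remote
    later_epochs = (epochs - 1) * num_objects * (1.0 - hit_ratio_after_first_epoch)
    return first_epoch + round(later_epochs)


if __name__ == "__main__":
    objects, epochs = 1000, 10  # hypothetical dataset size and epoch count
    for ratio in (0.0, 0.9, 1.0):
        total = remote_get_requests(objects, epochs, ratio)
        print(f"hit ratio {ratio:.1f}: {total} remote GET requests")
```

Request and egress fees scale with the remote request count, so the saving grows with the number of epochs and with how much of the dataset fits in the cache.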
5. Cloud Vendor Leverage
Multi-cloud strategies with cost-effective benefits
Respond to Limited GPU Availability
Demand for GPUs has exploded
Organizations use Alluxio to supply high performance data access
to remote GPU clusters wherever they find capacity.
Increase Cloud Agility
Competing CSPs may provide attractive discounts
Alluxio empowers organizations to capitalize on hardware discounts or cost-effective storage in real time. Users access data wherever it resides.
Avoid Vendor Lock-In
Negotiate with CSPs from a stronger position
Single-cloud deployments are convenient, but they can become an obstacle in negotiations. Alluxio facilitates hybrid and multi-cloud deployments.
Social Media
Twitter.com/alluxio
Linkedin.com/alluxio
Website
www.alluxio.io
Slack
http://slackin.alluxio.io/
Q&A

More Related Content

Similar to Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/ML Model Training

IRJET - Secure Data Sharing in Cloud Computing using Revocable Storage Id...
IRJET -  	  Secure Data Sharing in Cloud Computing using Revocable Storage Id...IRJET -  	  Secure Data Sharing in Cloud Computing using Revocable Storage Id...
IRJET - Secure Data Sharing in Cloud Computing using Revocable Storage Id...IRJET Journal
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagedbpublications
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Denodo
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeBig Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeDenodo
 
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014Amazon Web Services
 
Insurtech, Cloud and Cybersecurity - Chartered Insurance Institute
Insurtech, Cloud and Cybersecurity -  Chartered Insurance InstituteInsurtech, Cloud and Cybersecurity -  Chartered Insurance Institute
Insurtech, Cloud and Cybersecurity - Chartered Insurance InstituteHenrique Centieiro
 
THE SURVEY ON REFERENCE MODEL FOR OPEN STORAGE SYSTEMS INTERCONNECTION MASS S...
THE SURVEY ON REFERENCE MODEL FOR OPEN STORAGE SYSTEMS INTERCONNECTION MASS S...THE SURVEY ON REFERENCE MODEL FOR OPEN STORAGE SYSTEMS INTERCONNECTION MASS S...
THE SURVEY ON REFERENCE MODEL FOR OPEN STORAGE SYSTEMS INTERCONNECTION MASS S...IRJET Journal
 
IRJET - Efficient and Verifiable Queries over Encrypted Data in Cloud
 IRJET - Efficient and Verifiable Queries over Encrypted Data in Cloud IRJET - Efficient and Verifiable Queries over Encrypted Data in Cloud
IRJET - Efficient and Verifiable Queries over Encrypted Data in CloudIRJET Journal
 
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at ScaleInfrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at ScaleRobb Boyd
 
Cloudera federal summit
Cloudera federal summitCloudera federal summit
Cloudera federal summitMatt Carroll
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Denodo
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOSQAware GmbH
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptxWasm1953
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discoveryadamkraut
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloudredmondpulver
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxGautamPopli1
 
BEST FINAL YEAR PROJECT IEEE 2015 BY SPECTRUM SOLUTIONS PONDICHERRY
BEST FINAL YEAR PROJECT IEEE 2015 BY SPECTRUM SOLUTIONS PONDICHERRYBEST FINAL YEAR PROJECT IEEE 2015 BY SPECTRUM SOLUTIONS PONDICHERRY
BEST FINAL YEAR PROJECT IEEE 2015 BY SPECTRUM SOLUTIONS PONDICHERRYRaushan Kumar Singh
 

Similar to Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/ML Model Training (20)

IRJET - Secure Data Sharing in Cloud Computing using Revocable Storage Id...
IRJET -  	  Secure Data Sharing in Cloud Computing using Revocable Storage Id...IRJET -  	  Secure Data Sharing in Cloud Computing using Revocable Storage Id...
IRJET - Secure Data Sharing in Cloud Computing using Revocable Storage Id...
 
Privacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storagePrivacy preserving public auditing for secured cloud storage
Privacy preserving public auditing for secured cloud storage
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeBig Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
 
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
(ENT211) Migrating the US Government to the Cloud | AWS re:Invent 2014
 
Insurtech, Cloud and Cybersecurity - Chartered Insurance Institute
Insurtech, Cloud and Cybersecurity -  Chartered Insurance InstituteInsurtech, Cloud and Cybersecurity -  Chartered Insurance Institute
Insurtech, Cloud and Cybersecurity - Chartered Insurance Institute
 
THE SURVEY ON REFERENCE MODEL FOR OPEN STORAGE SYSTEMS INTERCONNECTION MASS S...
THE SURVEY ON REFERENCE MODEL FOR OPEN STORAGE SYSTEMS INTERCONNECTION MASS S...THE SURVEY ON REFERENCE MODEL FOR OPEN STORAGE SYSTEMS INTERCONNECTION MASS S...
THE SURVEY ON REFERENCE MODEL FOR OPEN STORAGE SYSTEMS INTERCONNECTION MASS S...
 
IRJET - Efficient and Verifiable Queries over Encrypted Data in Cloud
 IRJET - Efficient and Verifiable Queries over Encrypted Data in Cloud IRJET - Efficient and Verifiable Queries over Encrypted Data in Cloud
IRJET - Efficient and Verifiable Queries over Encrypted Data in Cloud
 
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at ScaleInfrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
Infrastructure Solutions for Deploying AI/ML/DL Workloads at Scale
 
Cloudera federal summit
Cloudera federal summitCloudera federal summit
Cloudera federal summit
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)
 
Building ML Pipelines with DCOS
Building ML Pipelines with DCOSBuilding ML Pipelines with DCOS
Building ML Pipelines with DCOS
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discovery
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloud
 
Breed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptxBreed data scientists_ A Presentation.pptx
Breed data scientists_ A Presentation.pptx
 
BEST FINAL YEAR PROJECT IEEE 2015 BY SPECTRUM SOLUTIONS PONDICHERRY
BEST FINAL YEAR PROJECT IEEE 2015 BY SPECTRUM SOLUTIONS PONDICHERRYBEST FINAL YEAR PROJECT IEEE 2015 BY SPECTRUM SOLUTIONS PONDICHERRY
BEST FINAL YEAR PROJECT IEEE 2015 BY SPECTRUM SOLUTIONS PONDICHERRY
 

More from Alluxio, Inc.

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioAlluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...Alluxio, Inc.
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...Alluxio, Inc.
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAlluxio, Inc.
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAlluxio, Inc.
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio, Inc.
 

More from Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Alluxio + Eckerson Webinar | Simplifying and Accelerating Data Access for AI/ML Model Training

  • 1. © Eckerson Group 2021 Twitter: @eckersongroup www.eckerson.com Simplifying and Accelerating Data Access for AI/ML Model Training Kevin Petrie Vice President of Research Sridhar Venkatesh SVP of Product
  • 2. The Rise of Generative AI. After lots of training, language models generate strings of words that become logical sentences and paragraphs. What is a language model?
      • A neural network whose nodes share inputs and outputs
      • Learns, summarizes, and generates content
      • Creates textual answers to natural language questions
    Source: The Economist
  • 3. The Risk: Robots Get Things Wrong Too. Data teams must inspect, validate, and govern language model outputs. Risks of language models:
      • Data quality: inaccuracies due to inaccurate/insufficient inputs, lack of context
      • Explainability: vague/unknown sources or reasoning
      • Privacy: exposure or theft due to user tracking
      • Intellectual property: liability for mishandled trademarks, copyrights, etc.
      • Fairness: perpetuation of bias in training data
  • 4. Time to Get Domain Specific. Domain-specific, “small” language models reduce risk and boost productivity by providing more governed and specialized outputs. Enter the small language model:
      • Enriched, detailed user prompts
      • Fine-tuned training on enterprise data
      • Augmented outputs; e.g., from multiple models
    Chart: Small Language Model (SLM) versus Large Language Model (LLM), on axes running from less governed to more governed and from generic to specialty.
  • 5. Small Language Models Will Drive the GenAI Boom. 30% of data practitioners are building or training their own language models now. 20% more plan to do so.* “We believe in a world where everyone is empowered to build and train their own models, imbued with their own opinions and viewpoints.” - Naveen Rao, Co-Founder and CEO, MosaicML
    *Source: Active LinkedIn survey of 55 respondents to date
  • 6. Data Processing for Language Models. Data teams must design and build new pipelines to feed their domain-specific data into language models. The new data pipeline:
      1. Assemble unstructured text from various files (diagram example: “We hold these truths to be self-evident…”)
      2. Convert words and punctuation marks to tokens (diagram example: We, hold, these, truths, to, …)
      3. Use embeddings to convert tokens into numerical vectors that describe their semantics (diagram example: [.45, 6.3, .99], [7.6, .04, 19], [84, .13, 1.6])
      4. Load, organize, and index these vectors in a vector database
      5. Use a language model to search and query the vectors while responding to real-time user prompts (diagram example: query one, query two)
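The five steps on slide 6 can be sketched end to end in a few lines of Python. This is only an illustration under loud assumptions: `tokenize`, `embed`, and `ToyVectorIndex` are hypothetical names, the "embedding" is a hash-based stand-in that carries no real semantics, the index is a plain in-memory list rather than a vector database, and the final lookup is a nearest-neighbor search rather than a language model.

```python
import hashlib
import math
import re


def tokenize(text):
    """Step 2: split text into lowercase word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def embed(token, dim=8):
    """Step 3 stand-in: a deterministic unit vector derived from a hash.

    A real pipeline would call an embedding model here; this placeholder
    does NOT capture meaning, it only gives each token a stable vector.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorIndex:
    """Step 4 stand-in: an in-memory list instead of a vector database."""

    def __init__(self):
        self.items = []  # list of (token, vector) pairs

    def add(self, token, vector):
        self.items.append((token, vector))

    def nearest(self, vector, k=1):
        # Vectors are unit length, so the dot product is cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(vector, stored)), token)
            for token, stored in self.items
        ]
        scored.sort(reverse=True)
        return [token for _, token in scored[:k]]


# Step 1: assemble text (here, a single hard-coded sentence).
text = "We hold these truths to be self-evident."

index = ToyVectorIndex()
for tok in tokenize(text):
    index.add(tok, embed(tok))

# Step 5 stand-in: look up the stored token closest to a query token.
print(index.nearest(embed("truths"), k=1))
```

Swapping `embed` for a real embedding model and `ToyVectorIndex` for a vector database would keep the same shape: tokens in, vectors stored, similarity search at query time.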
  • 7. The New Generative AI Data Stack. As companies embed generative AI into their workflows, they must manage and process multi-structured data in a more holistic and efficient manner. Diagram labels:
      • On premises | hybrid | cloud | multi cloud
      • Generative AI embedded
      • Catalog, govern, observe, integrate, master
      • Structured data (DB tables); semi-structured (logs, clickstreams, sensors…); unstructured (text, images…)
      • Analytics, operations
  • 8. AI/ML Initiatives Need Fast and Simple Data Access. AI/ML initiatives require companies to balance, optimize, and secure workloads across distributed datasets and compute resources. Requirements:
      • Data access. View and process data wherever it resides
      • Performance. Retrieve data with low latency/high throughput
      • Portability. Run applications wherever suitable compute resides
      • Cost visibility. Monitor and control compute cycles
      • Multi tenancy. Isolate application compute to safeguard performance
      • Security. Restrict data access to minimize risk of breaches
  • 9. The Data Access Layer: Architecture. The data access layer continuously adjusts workloads, storage, and compute.
      • Namespace. Unified interface for all data access
      • APIs. Dynamic communication between applications and storage
      • Caching. Tier data by priority: memory, SSDs, object store
      • Metadata. Centralize descriptions of data objects and resources
      • Security. Authenticate users, authorize access, log actions
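The caching bullet on slide 9 (tier data by priority across memory, SSDs, and object store) can be illustrated with a small read-through cache. This is a minimal sketch, not how any particular product implements tiering: `TieredReader` is a hypothetical class, both cache tiers are ordinary in-process dictionaries, the object store is a plain `dict`, and eviction is simple least-recently-used.

```python
from collections import OrderedDict


class TieredReader:
    """Read-through cache over three tiers: memory, SSD, object store.

    All tiers are simulated with in-process dictionaries; the names and
    capacities are illustrative assumptions.
    """

    def __init__(self, object_store, memory_capacity=2, ssd_capacity=4):
        self.object_store = object_store  # slowest tier (stand-in for remote storage)
        self.memory = OrderedDict()       # fastest tier, LRU ordered
        self.ssd = OrderedDict()          # middle tier, LRU ordered
        self.memory_capacity = memory_capacity
        self.ssd_capacity = ssd_capacity
        self.hits = {"memory": 0, "ssd": 0, "object_store": 0}

    def read(self, path):
        if path in self.memory:
            self.memory.move_to_end(path)  # mark as most recently used
            self.hits["memory"] += 1
            return self.memory[path]
        if path in self.ssd:
            data = self.ssd.pop(path)      # will be promoted to memory
            self.hits["ssd"] += 1
        else:
            data = self.object_store[path]
            self.hits["object_store"] += 1
        self._promote(path, data)
        return data

    def _promote(self, path, data):
        self.memory[path] = data
        if len(self.memory) > self.memory_capacity:
            # Demote the least recently used memory entry to the SSD tier.
            old_path, old_data = self.memory.popitem(last=False)
            self.ssd[old_path] = old_data
            if len(self.ssd) > self.ssd_capacity:
                self.ssd.popitem(last=False)  # drop; still in object store


store = {"a": b"1", "b": b"2", "c": b"3"}
reader = TieredReader(store, memory_capacity=2, ssd_capacity=2)
for path in ["a", "a", "b", "c", "a"]:
    reader.read(path)
print(reader.hits)
```

The `hits` counter shows which tier served each read, which is the kind of signal a data access layer would use when deciding what to keep close to compute.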
  • 10. Use Cases. AI initiatives have a range of use cases that require fast and simple data access. Diagram: data center, cloud 1, cloud 2.
      • Analytics & AI in a hybrid environment
      • Analytics & AI across clouds
      • Workload bursts
      • Project expansions
      • Migrations
      • Cost optimization
  • 11. Guiding Principles. Data leaders should evaluate where and how to implement a data access layer to support generative AI initiatives.
      • Find the bottleneck
      • Decide whether to build or buy
      • Plan for growth
  • 12. Questions? I’m listening!
  • 13. Retooling the enterprise data infrastructure. Legacy data centers can’t keep up. High performance computing | Specialized hardware | Varied workloads. “We're seeing incredible orders to retool the world's data centers… a 10-year transition to basically recycle or reclaim the world's data centers and build it out as accelerated computing.” - Jensen Huang, Nvidia CEO
  • 14. Challenges as you try to scale. “GPUs are this year’s toilet paper.” - Wall Street Journal
      • GPUs are scarce
      • GPUs are expensive
      • Low GPU utilization
  • 15. Business Pressures | Complex & Costly Solutions.
      • GPUs are scarce; GPUs are expensive; low GPU utilization
      • Faster model development times; increased freshness; higher accuracy and traceability; rapidly growing datasets
      • Extensive data engineering managing data copies; specialized storage; out-of-control cloud and infra costs
  • 16. Alluxio Data Platform. High-performance data access, unified global view.
  • 17. 1. Faster Time-to-Market. Cost reduction, performance boost. International B2C with a multi-cloud, cross-region AI platform, serving LLMs and training models from object storage. They optimized their AI platform with Alluxio to speed data delivery to training clusters and facilitate faster model deployment in latency-sensitive production use cases.
      • Metrics shown: 50%; hundreds of thousands of dollars saved annually compared to previous deployment; 2-3X model training performance
      • Faster model deployment times: models deployed in minutes vs days
  • 18. 2. Higher GPU Utilization. “In a cloud environment, where GPU hardware is paid for as a function of time, you need fast, performant, reliable, and cost effective data for your model training pipelines to keep your GPU utilization close to 99%.”
      • 20-30%: average reported GPU utilization based on direct access from remote storage
      • Chart: GPU utilization accessing commodity storage vs GPU utilization accessing Alluxio
      • Alluxio serves high-throughput data to K8s training workloads: 90% GPU utilization from Alluxio serving data pulled from object storage, an increase from 50% utilization via s3fs-fuse
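Because GPU time is billed by the hour whether or not the GPU is busy, the utilization figures on slide 18 translate directly into cost per hour of useful work. The arithmetic below is a rough sketch: the function names are hypothetical, the hourly price of 10.0 is a made-up placeholder, and only the 50% and 90% utilization figures come from the slide.

```python
def cost_per_useful_gpu_hour(hourly_price, utilization):
    """Price paid for each hour in which the GPU is actually doing work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return hourly_price / utilization


def savings_fraction(util_before, util_after):
    """Fractional drop in cost per useful GPU-hour when utilization rises."""
    return 1 - util_before / util_after


HOURLY_PRICE = 10.0  # hypothetical price per GPU-hour, for illustration only

before = cost_per_useful_gpu_hour(HOURLY_PRICE, 0.50)  # utilization via s3fs-fuse
after = cost_per_useful_gpu_hour(HOURLY_PRICE, 0.90)   # utilization with Alluxio
print(before, after, savings_fraction(0.50, 0.90))
```

Under these assumptions, moving from 50% to 90% utilization cuts the cost of each useful GPU-hour by 1 - 0.5/0.9, or roughly 44%, regardless of the actual hourly price.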
  • 19. 3. Reduction in Personnel. Increase in Productivity.
      • Without Alluxio: data scientists send requests to AI platform teams. Platform teams set up individual data pipelines.
      • With Alluxio, data scientists just access their data. Alluxio consolidates many pipelines into an access layer.
      • Diagram labels: pre-processed data, data management, pipeline or scheduler, training clusters
  • 20. 4. Reduction in Infrastructure Spend. Alluxio optimizes data platforms to increase efficiency.
      • Data engineering pipelines: data workflows improved by on-demand access from Alluxio cache
      • S3 egress and API fees: fees significantly reduced via granular caching and data reuse
      • High performance computing: replaceable with low-cost hardware at comparable performance
      • Reduced or eliminated network congestion: network congestion reduced by serving files locally
  • 21. 5. Cloud Vendor Leverage. Multi-cloud strategies with cost-effective benefits.
      • Respond to limited GPU availability. Demand for GPUs has exploded. Organizations use Alluxio to supply high-performance data access to remote GPU clusters wherever they find capacity.
      • Increase cloud agility. Competing CSPs may provide attractive discounts. Alluxio empowers organizations to capitalize on hardware discounts or cost-effective storage in real time. Users access data wherever it resides.
      • Avoid vendor lock-in. Negotiate with CSPs from a stronger position. Single-cloud deployments are convenient, but that may become an obstacle in negotiations. Alluxio facilitates hybrid and multi-cloud.