Data Infra Meetup
Jan. 25, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Jing Zhao (Principal Engineer, @Uber)
Uber operates one of the largest data lakes in the industry, storing exabytes of data. In this talk, we will introduce the evolution of our data storage architecture and delve into several key initiatives from the past few years.
Specifically, we will introduce:
- Our on-prem HDFS cluster scalability challenges and how we solved them
- Our efficiency optimizations that significantly reduced the storage overhead and unit cost without compromising reliability and performance
- The challenges we are facing during the ongoing Cloud migration and our solutions
6. HDFS Router-based Federation
● R/W routers + read-only routers
● Rolled out to Uber’s production since 2019
● Greatly improved HDFS scalability
● Distributing traffic to 30 HDFS clusters
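The routing idea above can be sketched as a mount table that maps path prefixes to downstream clusters, in the spirit of HDFS Router-based Federation. The mount points and cluster names below are illustrative, not Uber's actual layout:

```python
# Hypothetical mount-table resolution, sketching how a router layer can
# fan client traffic out across many HDFS clusters. All entries invented.
MOUNT_TABLE = {
    "/user": "hdfs-cluster-01",
    "/warehouse": "hdfs-cluster-02",
    "/warehouse/secure": "hdfs-cluster-03",
}

def resolve_cluster(path: str) -> str:
    """Longest-prefix match of a client path against the mount table."""
    best = ""
    for mount in MOUNT_TABLE:
        if (path == mount or path.startswith(mount + "/")) and len(mount) > len(best):
            best = mount
    if not best:
        raise ValueError(f"no mount entry covers {path}")
    return MOUNT_TABLE[best]

print(resolve_cluster("/warehouse/secure/tbl1"))  # hdfs-cluster-03
```

Because clients only see the logical namespace, clusters can be added or rebalanced behind the routers without client-side changes.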
7. Containerization and Automation
● Containerized across data plane and control plane
○ Including NameNodes with 300+ GB heap size
● Fully automated cluster management
○ Managing 11K nodes
○ NameNodes + JournalNodes (NN + JN)
9. HDFS Erasure Coding
(Architecture diagram: Hadoop 2.x clients reach both HDFS hot clusters and Hadoop 3 EC clusters through the HDFS Router and an EC access proxy; a data correctness scanner, a replicated data detector, and an offline EC converter handle data transfer and RPC for offline conversion from replicated to EC data.)
● 50% storage saving with Reed–Solomon(6, 3)
● EC access proxy
○ Seamless access for Hadoop 2.x clients
○ Avoids Hadoop version upgrade
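The "50% saving" figure follows directly from the storage-overhead arithmetic: 3-way replication writes 3 raw bytes per logical byte, while RS(6, 3) stripes 6 data cells plus 3 parity cells, i.e. 1.5x. A quick check:

```python
# Storage-overhead arithmetic behind the 50% saving claim for RS(6, 3).
def overhead_replication(replicas: int) -> float:
    # N-way replication stores N raw bytes per logical byte.
    return float(replicas)

def overhead_rs(data: int, parity: int) -> float:
    # Reed-Solomon(data, parity) stores (data + parity) cells per data cells.
    return (data + parity) / data

rep = overhead_replication(3)   # 3.0x raw storage
ec = overhead_rs(6, 3)          # 1.5x raw storage
saving = 1 - ec / rep           # fraction of raw storage avoided
print(f"replication {rep}x, RS(6,3) {ec}x, saving {saving:.0%}")
```

Note that RS(6, 3) also tolerates any 3 lost cells per stripe, matching the 3-replica loss tolerance while halving the raw footprint.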
10. Adopting High-Density HDD in HDFS
● Capacity per host: 4 TB × 24 → 16 TB × 35
● Efficiency: >50% HW cost reduction
● Challenges
○ DataNode IO performance
○ HDFS reliability (blast radius)
● Opportunities
○ Traffic focuses on a small group of extremely hot blocks
○ Top 10K blocks attracted >90% of read traffic
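The DataNode IO challenge falls out of the density arithmetic: per-host capacity grows nearly 6x while the spindle count grows only ~1.5x, so IOPS per TB drops sharply. A rough back-of-envelope (the per-spindle IOPS figure is a generic 7200 RPM assumption, not from the talk):

```python
# Why high-density HDDs stress DataNode IO: capacity grows much faster
# than spindle count, so random-read capacity per TB shrinks.
old_capacity_tb = 4 * 24     # 96 TB across 24 spindles
new_capacity_tb = 16 * 35    # 560 TB across 35 spindles

IOPS_PER_SPINDLE = 150       # rough 7200 RPM HDD figure (assumption)
old_iops_per_tb = 24 * IOPS_PER_SPINDLE / old_capacity_tb   # 37.5
new_iops_per_tb = 35 * IOPS_PER_SPINDLE / new_capacity_tb   # ~9.4

print(f"{new_capacity_tb / old_capacity_tb:.1f}x denser, "
      f"IOPS/TB: {old_iops_per_tb:.1f} -> {new_iops_per_tb:.1f}")
```

That ~4x drop in IOPS per TB is exactly what makes the hot-block skew an opportunity: if the top 10K blocks attract >90% of reads, a small fast cache can absorb most of the IO the HDDs can no longer serve.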
11. DataNode Local Cache
● Build a local cache within the DataNode
○ 4 TB NVMe SSD disk
○ Based on DataNode-local traffic
● Leverage Alluxio for cache management
○ Page-level cache
○ 1 MB default page size matches the traffic pattern
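A page-level cache keys entries by (block, page index) so that reads anywhere within a 1 MB page hit the same cached entry. A toy LRU sketch of that lookup structure (the real cache is managed by Alluxio's page store; this class is purely illustrative):

```python
from collections import OrderedDict

PAGE_SIZE = 1 << 20  # 1 MiB pages, matching the default mentioned above

class PageCache:
    """Toy LRU page cache keyed by (block_id, page_index)."""
    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # (block_id, page_index) -> bytes

    def get(self, block_id: int, offset: int):
        # Any offset inside a page maps to the same cache entry.
        key = (block_id, offset // PAGE_SIZE)
        page = self.pages.get(key)
        if page is not None:
            self.pages.move_to_end(key)  # refresh LRU position
        return page

    def put(self, block_id: int, offset: int, page: bytes):
        key = (block_id, offset // PAGE_SIZE)
        self.pages[key] = page
        self.pages.move_to_end(key)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used
```

Caching at page rather than whole-block granularity means a 256 MB HDFS block with one hot 1 MB region costs 1 MB of SSD, not 256 MB.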
12. Cloud Migration
2023 ~ Present
● Replacing HDFS with cloud object stores
● Hybrid Cloud and Multi-Cloud Architectures
13. Cloud Object Storage
● Migrating the batch data processing stack to Google® Cloud Platform (GCP)
● Replacing HDFS with Google® Cloud Storage (GCS)
● Logical namespace to abstract out internal bucket layout
● Performance optimizations
14. Performance and Functional Optimizations
● IO capacity limits
○ Traffic balancing and bucket pre-splits
● Write throughput
○ gVNIC adoption for aggregated throughput improvement: 20 Gbps → 32 Gbps
○ Parallel composite uploads for single-writer throughput improvement
● Read/listing latencies
○ gRPC APIs for better performance consistency
○ Presto: local SSD cache
○ Hive/Spark: parallel listing for partitioned data
○ Hudi: performance improvements with 0.14 features
● Rename
○ Failure handling and Python library enhancements
○ Spark optimized file output committer
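The parallel-listing item can be sketched generically: instead of one sequential walk over a partitioned table, each partition directory is listed by its own worker, which hides per-request latency against object storage. This is an illustrative local-filesystem stand-in, not the Hive/Spark implementation:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def list_partition(path: str) -> list:
    """List the files inside one partition directory."""
    return [e.path for e in os.scandir(path) if e.is_file()]

def parallel_list(table_root: str, workers: int = 8) -> list:
    """List all partitions of a table concurrently."""
    partitions = [e.path for e in os.scandir(table_root) if e.is_dir()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(list_partition, partitions)
    return sorted(f for files in results for f in files)

# Demo against a temp dir standing in for a date-partitioned bucket layout.
root = tempfile.mkdtemp()
for day in ("dt=2024-01-01", "dt=2024-01-02"):
    os.makedirs(os.path.join(root, day))
    open(os.path.join(root, day, "part-0000.parquet"), "w").close()
print(len(parallel_list(root)))  # 2
```

Against an object store each `list` call is a network round trip, so fanning out across partitions can cut planning time roughly by the worker count.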
15. Hybrid Cloud Architecture (WIP)
● One logical DataLake on unified data storage
○ Across on-prem HDFS and cloud object storage
○ Logical paths to abstract out internal details
● Optimizations for
○ Ingress/egress traffic cost
○ Data storage cost
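The logical-path idea amounts to a placement layer: clients address one logical DataLake path, and a resolver decides whether the bytes live on on-prem HDFS or in a cloud bucket. A minimal sketch, with all prefixes, namespaces, and bucket names invented for illustration:

```python
# Hypothetical logical-to-physical path resolution for a hybrid DataLake.
# First matching prefix wins here; a real resolver would use longest-prefix.
PLACEMENTS = {
    "/datalake/raw": "hdfs://onprem-ns1",        # stays on-prem
    "/datalake/warehouse": "gs://example-lake-07",  # lives in GCS
}

def to_physical(logical_path: str) -> str:
    for prefix, backend in PLACEMENTS.items():
        if logical_path == prefix or logical_path.startswith(prefix + "/"):
            return backend + logical_path
    raise ValueError(f"unmapped logical path: {logical_path}")

print(to_physical("/datalake/raw/events/2024/01/25"))
# hdfs://onprem-ns1/datalake/raw/events/2024/01/25
```

Keeping the placement map out of client code is what allows data to be moved between on-prem and cloud, e.g. to optimize egress or storage cost, without rewriting pipelines.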
16. Tables and Blobs: Unified Multi-Cloud Storage (Future)
● Tables and Blobs
● Multi-cloud architecture
○ Google Cloud Platform (GCP)
○ Oracle® Cloud Infrastructure (OCI)
● Data orchestration and caching
17. "Apache®, Apache Hadoop®, Hadoop®, and Apache Spark® are either registered trademarks or trademarks of the Apache
Software Foundation® in the United States and/or other countries. No endorsement by The Apache Software Foundation® is
implied by the use of these marks."
"Google®, Google Cloud Platform®, and Google Cloud Storage® are either registered trademarks or trademarks of Google LLC in
the United States and/or other countries. No endorsement by Google LLC is implied by the use of these marks."
"Oracle® is a registered trademark of Oracle Corporation. No endorsement by Oracle Corporation is implied by the use of the mark."
"Presto® is a registered trademark of LF Projects, LLC. No endorsement by LF Projects, LLC is implied by the use of the mark."