Hadoop & Cloud Storage: Object Store Integration in Production
1.
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop & Cloud Storage: Object Store Integration in Production
Chris Nauroth, Rajesh Balamohan
Hadoop Summit 2016
2.
About Us
Rajesh Balamohan, rbalamohan@hortonworks.com, Twitter: @rajeshbalamohan
– Apache Tez committer, PMC member
– Mainly working on performance in Tez
– Has been using Hadoop since 2009
Chris Nauroth, cnauroth@hortonworks.com, Twitter: @cnauroth
– Apache Hadoop committer, PMC member, and Apache Software Foundation member
– Working on HDFS and alternative file systems such as WASB and S3A
– Hadoop user since 2010
Steve Loughran, stevel@hortonworks.com, Twitter: @steveloughran
– Apache Hadoop committer, PMC member, and Apache Software Foundation member
– Hadoop deployment since 2008, especially cloud integration; Filesystem Spec author
– Working on: Apache Slider, Spark + cloud integration, Hadoop + cloud
3.
Agenda
⬢ Hadoop/Cloud Storage Integration Use Cases
⬢ Hadoop-compatible File System Architecture
⬢ Recent Enhancements in S3A FileSystem Connector
⬢ Hive Access Patterns
⬢ Performance Improvements and TPC-DS Benchmarks with Hive-TestBench
⬢ Next Steps for S3A and other Object Stores
⬢ Q & A
4.
Why Hadoop in the Cloud?
5.
Hadoop Cloud Storage Utilization Evolution
[Diagram: three stages, each showing an HDFS Application with Input and Output; stage labels include Backup/Restore, Copy, and tmp. Goal: evolution towards cloud storage as the primary Data Lake.]
6.
What is the Problem?
Cloud object stores are designed for:
⬢ Scale
⬢ Cost
⬢ Geographic distribution
⬢ Availability
⬢ Cloud app writers often modify apps to deal with cloud storage semantics and limitations
Challenges: Hadoop apps should work on HDFS or cloud storage transparently
⬢ Eventual consistency
⬢ Performance: storage is separated from compute
⬢ Cloud storage is not designed for file-like access patterns
⬢ Limitations in APIs (e.g. rename)
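To make the rename limitation concrete, here is a toy model, not any connector's code, assuming an object store that exposes only flat keys with PUT, COPY, DELETE and LIST. HDFS performs a directory rename as a single NameNode metadata operation; on a flat key space the same rename becomes one copy plus one delete per object, so its cost grows with the number of files and a failure part-way through leaves objects at both locations. The class and function names are hypothetical.

```python
class ToyObjectStore:
    """Hypothetical flat key/value object store: no real directories, no rename."""

    def __init__(self):
        self.objects = {}      # key -> bytes
        self.requests = 0      # number of simulated HTTP requests issued

    def put(self, key, data):
        self.requests += 1
        self.objects[key] = data

    def copy(self, src, dst):
        self.requests += 1
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.requests += 1
        del self.objects[key]

    def list_prefix(self, prefix):
        self.requests += 1
        return sorted(k for k in self.objects if k.startswith(prefix))


def rename_directory(store, src_dir, dst_dir):
    """Emulate a file-system directory rename on top of the flat key space.

    Every object under src_dir is copied and then deleted, one at a time,
    so the operation costs O(number of objects) requests and is not atomic.
    """
    src_prefix = src_dir.rstrip("/") + "/"
    dst_prefix = dst_dir.rstrip("/") + "/"
    for key in store.list_prefix(src_prefix):
        new_key = dst_prefix + key[len(src_prefix):]
        store.copy(key, new_key)
        store.delete(key)
```

Renaming a directory holding three objects in this model costs seven requests (one LIST, three COPY, three DELETE), and the count keeps growing linearly with the number of files.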
7.
Goal and Approach
Goals
⬢ Integrate with unique functionality of each cloud
⬢ Optimize each cloud's object store connector
⬢ Optimize upper layers for cloud object stores
Overall Approach
⬢ Consistency in the face of eventual consistency (use a secondary metadata store)
⬢ Performance in the connector (e.g. lazy seek)
⬢ Upper layer improvements (Hive, ORC, Tez, etc.)
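The idea of using a secondary metadata store to mask eventual consistency can be sketched as a small overlay. This is a simplified model under stated assumptions. The store applies writes immediately, but its LIST results come from a stale view. The client keeps its own strongly consistent record of the paths it has created or deleted and merges that record into every listing. All names are hypothetical, and the sketch does not reflect the design of any specific implementation.

```python
class EventuallyConsistentStore:
    """Hypothetical store: writes apply at once, but LIST reads a stale view."""

    def __init__(self):
        self.objects = {}        # authoritative contents
        self.listed = set()      # keys LIST reports; lags until sync()

    def put(self, key, data):
        self.objects[key] = data

    def delete(self, key):
        self.objects.pop(key, None)

    def list_prefix(self, prefix):
        return {k for k in self.listed if k.startswith(prefix)}

    def sync(self):
        """Simulate the listing eventually converging on the real contents."""
        self.listed = set(self.objects)


class MetadataStoreClient:
    """Hypothetical client that overlays its own consistent record on LIST."""

    def __init__(self, store):
        self.store = store
        self.created = set()     # paths written through this client
        self.deleted = set()     # paths deleted through this client

    def put(self, key, data):
        self.store.put(key, data)
        self.created.add(key)
        self.deleted.discard(key)

    def delete(self, key):
        self.store.delete(key)
        self.deleted.add(key)
        self.created.discard(key)

    def list_prefix(self, prefix):
        listed = self.store.list_prefix(prefix)
        recent = {k for k in self.created if k.startswith(prefix)}
        # Add writes LIST has not caught up with; hide deletes it still shows.
        return sorted((listed | recent) - self.deleted)
```

Here the raw store would still list a deleted file and omit a new one, while the client's merged listing reflects both changes immediately. A real metadata store would also need to be shared across clients and pruned over time; this sketch does neither.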
8.
Hadoop-compatible File System Architecture
9.
Hadoop-compatible File System Architecture
⬢ Applications
– File system interactions are coded to a file system-agnostic abstraction layer.
• FileSystem class: traditional API
• FileContext/AbstractFileSystem classes: newer API providing a split between client API and provider API
– Can be retargeted to a different file system by configuration changes (not code changes).
• Caveat: different FileSystem implementations may offer a limited feature set.
• Example: only HDFS and WASB can run HBase.
⬢ File System Abstraction Layer
– Defines the interface of common file system operations: create, open, rename, etc.
– Supports additional mix-in interfaces to indicate implementation of optional features.
– Semantics of each operation are documented in a formal specification, derived from HDFS behavior.
⬢ File System Implementation Layer
– Each file system provides a set of concrete classes implementing the interface.
– A common set of file system contract tests executes against each implementation to prove its adherence to the specified semantics.
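The three layers can be sketched in a few lines. This is a minimal Python model, not Hadoop's Java API. An abstract base class stands in for the abstraction layer. The implementation is chosen from a configuration mapping keyed by URI scheme, loosely mirroring Hadoop's `fs.<scheme>.impl` properties, with class objects standing in for class-name strings. One shared contract test runs against every implementation. The class names, the `memfs`/`flatfs` schemes and the rename semantics checked here are assumptions made for the sketch.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Abstraction layer: the operations applications are coded against."""

    @abstractmethod
    def create(self, path, data): ...

    @abstractmethod
    def open(self, path): ...

    @abstractmethod
    def rename(self, src, dst): ...

    @abstractmethod
    def exists(self, path): ...


class MemFileSystem(FileSystem):
    """Hypothetical implementation with a native, single-step rename."""

    def __init__(self):
        self.files = {}

    def create(self, path, data):
        self.files[path] = data

    def open(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]

    def rename(self, src, dst):
        if src not in self.files or dst in self.files:
            return False
        self.files[dst] = self.files.pop(src)
        return True

    def exists(self, path):
        return path in self.files


class FlatFileSystem(MemFileSystem):
    """Hypothetical object-store-like implementation: rename is copy, then delete."""

    def rename(self, src, dst):
        if src not in self.files or dst in self.files:
            return False
        self.files[dst] = self.files[src]   # copy
        del self.files[src]                 # then delete
        return True


def get_filesystem(uri, conf):
    """Pick the implementation from configuration, keyed by URI scheme."""
    key = f"fs.{urlparse(uri).scheme}.impl"
    if key not in conf:
        raise ValueError(f"no implementation configured under {key!r}")
    return conf[key]()


def contract_test_rename(fs):
    """Shared contract test: every implementation must satisfy these semantics."""
    fs.create("/a", b"hello")
    assert fs.rename("/a", "/b") is True
    assert not fs.exists("/a") and fs.open("/b") == b"hello"
    assert fs.rename("/missing", "/c") is False
```

Switching an application from `memfs://` to `flatfs://` changes only the configuration entry it resolves. Both implementations must pass the same contract test even though their internals differ.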
10.
Cloud Storage Connectors
Azure: WASB
● Strongly consistent
● Good performance
● Well-tested on applications (incl. HBase)
Azure: ADL
● Strongly consistent
● Tuned for big data analytics workloads
Amazon Web Services: S3A
● Eventually consistent; consistency work in progress by Hortonworks
● Performance improvements in progress
● Active development in Apache
Amazon Web Services: EMRFS
● Proprietary connector used in EMR
● Optional strong consistency for a cost
Google Cloud Platform: GCS
● Multiple configurable consistency policies
● Currently Google open source
● Good performance
● Work under way for contribution to Apache
11.
Case Study: S3A Functionality and Performance
12.
Authentication
⬢ Basic
– AWS Access Key ID and Secret Access Key in Hadoop configuration files
– Hadoop Credential Provider API to avoid storing secrets in world-readable configuration files
⬢ EC2 Metadata
– Reads credentials that AWS publishes directly into EC2 VM instances
– More secure, because secrets do not need to be distributed externally
⬢ AWS Environment Variables
– Less secure, but potentially easier to integrate with some applications
⬢ Session Credentials
– Temporary security credentials issued by the AWS Security Token Service (STS)
– Fixed lifetime limits the impact of a credential leak
⬢ Anonymous Login
– Easy read-only access to public buckets for early prototyping
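Supporting several credential sources usually means trying them in a fixed order and using the first one that yields credentials. The sketch below shows that chain pattern. The configuration keys fs.s3a.access.key and fs.s3a.secret.key and the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are real names, but the precedence order here (configuration, then environment, then EC2 metadata) is an assumption for illustration, session credentials and anonymous login are omitted, and the metadata lookup is a hypothetical callable rather than a real HTTP call.

```python
def from_config(conf):
    """Basic: keys stored in the Hadoop configuration."""
    key, secret = conf.get("fs.s3a.access.key"), conf.get("fs.s3a.secret.key")
    return (key, secret) if key and secret else None


def from_environment(env):
    """AWS environment variables."""
    key, secret = env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")
    return (key, secret) if key and secret else None


def from_instance_metadata(fetch):
    """EC2 metadata: `fetch` is a hypothetical stand-in for the metadata service."""
    try:
        return fetch()
    except OSError:
        return None  # not running on EC2, or the service is unreachable


def resolve_credentials(conf, env, fetch_metadata):
    """Return (provider_name, credentials) from the first provider that succeeds."""
    chain = [
        ("config", lambda: from_config(conf)),
        ("environment", lambda: from_environment(env)),
        ("ec2-metadata", lambda: from_instance_metadata(fetch_metadata)),
    ]
    for name, provider in chain:
        creds = provider()
        if creds:
            return name, creds
    raise PermissionError("no AWS credentials found")
```

Passing the configuration and environment in as plain mappings keeps the chain easy to test; in practice the environment argument would be os.environ.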
13.
Encryption
⬢ S3 Server-Side Encryption
– Encryption of data at rest in S3
– Supports the SSE-S3 option: each object is encrypted with a unique key using the AES-256 cipher
– Now covered in the S3A automated test suites
– Support for additional options (SSE-KMS and SSE-C) under development
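With SSE-S3, the client does not encrypt anything itself: it asks S3 to encrypt the object at rest by sending the x-amz-server-side-encryption: AES256 header on the upload request. The sketch below shows that translation from an S3A-style setting (fs.s3a.server-side-encryption-algorithm) to request headers. The function itself is hypothetical, and it rejects any other algorithm because only SSE-S3 is covered here.

```python
SSE_CONFIG_KEY = "fs.s3a.server-side-encryption-algorithm"


def put_object_headers(conf, content_length):
    """Build headers for an S3 PUT Object request, adding SSE-S3 when configured (sketch)."""
    headers = {"Content-Length": str(content_length)}
    algorithm = conf.get(SSE_CONFIG_KEY, "")
    if algorithm == "AES256":
        # SSE-S3: S3 encrypts the object at rest with a key that S3 manages.
        headers["x-amz-server-side-encryption"] = "AES256"
    elif algorithm:
        raise ValueError("unsupported server-side encryption algorithm: %r" % algorithm)
    return headers
```

Because the encryption happens server-side, reads need no extra settings: S3 decrypts transparently for callers who are authorized to read the object.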
14.
Supportability
⬢ Documentation
– Backfill missing documentation, and include documentation with new enhancements
– To be published to hadoop.apache.org with the Apache Hadoop 2.8.0 release
– Meanwhile, the raw content is visible on GitHub:
• https://github.com/apache/hadoop/blob/branch-2.8/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
⬢ Error Reporting
– Identify common user errors and provide more descriptive error messages
– S3 HTTP error codes are examined and translated to specific error types
⬢ Instrumentation
– Internal metrics covering a wide range of metadata and data operations
– Already proven helpful in flagging a potential performance regression in a patch
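Translating HTTP error codes into specific error types lets callers react to "not found" differently from "access denied" instead of parsing a generic failure. S3A does this in Java; the Python sketch below only mirrors the idea. The particular mapping (401/403 to an access error, 404 to file-not-found, 416 to end-of-file, everything else to a generic I/O error) and the function name are assumptions for illustration, not a copy of the S3A source.

```python
def translate_s3_error(operation, path, status_code, error_code):
    """Turn a raw S3 HTTP error into a specific, descriptive exception (sketch)."""
    message = "%s on %s: HTTP %d (%s)" % (operation, path, status_code, error_code)
    if status_code in (401, 403):
        return PermissionError(message + "; check credentials and bucket policy")
    if status_code == 404:
        return FileNotFoundError(message)
    if status_code == 416:
        return EOFError(message + "; attempted to read past the end of the object")
    return OSError(message)  # fallback: keep the original details in the message
```

Keeping the operation, path, and S3 error code in every message is what makes the result descriptive; the exception type is what lets calling code handle each case programmatically.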
15.
Performance Improvements
⬢ Lazy Seek
– Earlier implementation
• Reopened the file on every seek call, aborting the connection on every reopen
• Positional read was expensive (seek, read, seek)
– Current implementation
• Seek is a no-op call
• The real seek is performed only when data is actually needed
⬢ Connection Abort Problem
– Backward seeks caused connection aborts
– Recent modifications to S3AFileSystem fix this and add support for sequential and random reads, selected with:
• fs.s3a.experimental.input.fadvise
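Lazy seek and the sequential/random read policies can be modeled with a toy stream that counts how many GET requests it opens. This is a minimal sketch, not the S3AInputStream code: seek() only records a target, read() reopens the stream only when the open one cannot serve the request (a backward seek, or a read past the open range), and the "sequential" and "random" policy names mirror the fadvise idea. RemoteObject, the readahead parameter, and the range sizes are hypothetical.

```python
class RemoteObject:
    """Toy S3 object; `opens` counts ranged GET requests (each one a new HTTP stream)."""

    def __init__(self, data):
        self.data = data
        self.opens = 0

    def open_range(self, start, end):
        self.opens += 1
        return start, end


class LazySeekInputStream:
    """Sketch of lazy seek: seek() only records a position; read() (re)opens
    the stream, and only when the open stream cannot serve the request."""

    def __init__(self, obj, policy="sequential", readahead=4):
        self.obj = obj
        self.policy = policy        # "sequential" or "random"
        self.readahead = readahead  # minimum range requested under the random policy
        self.pos = 0                # position the caller asked for
        self.stream_pos = None      # position of the open stream; None when closed
        self.stream_end = 0         # exclusive end of the open stream's byte range

    def seek(self, pos):
        self.pos = pos              # no network traffic here

    def _reopen(self, length):
        size = len(self.obj.data)
        if self.policy == "random":
            end = min(size, self.pos + max(length, self.readahead))  # short ranged GET
        else:
            end = size                                               # read to end of object
        self.stream_pos, self.stream_end = self.obj.open_range(self.pos, end)

    def read(self, length):
        want_end = min(self.pos + length, len(self.obj.data))
        usable = (self.stream_pos is not None
                  and self.stream_pos <= self.pos     # an HTTP stream cannot rewind
                  and want_end <= self.stream_end)    # request fits in the open range
        if not usable:
            self._reopen(length)
        # A forward skip inside the open range just drains bytes; no new request.
        data = self.obj.data[self.pos:want_end]
        self.pos = want_end
        self.stream_pos = want_end
        return data
```

Repeated seeks with no read in between cost nothing. Under the sequential policy one long GET serves forward reads and skips, while under the random policy each GET covers only a short range, so the reopen that a backward seek forces abandons far less unread data.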
16.
Hive Access Patterns
⬢ ETL and Admin Activities
– Ingesting datasets / creating tables
– Cleansing / transforming data
– Analyzing tables, computing column statistics
– MSCK to repair partition-related metadata
⬢ Read
– Running queries
⬢ Write
– Storing output
17.
Hive - MSCK Improvements
⬢ MSCK repairs the metastore for partitioned datasets
– Scans the table path to identify missing partitions (expensive in S3)
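Finding missing partitions amounts to a set difference between the partition directories present in storage and those registered in the metastore. Walking an S3 "directory tree" costs one listing request per level, so the sketch below instead derives partitions from a single flat list of object keys. It assumes the keys come from one recursive listing and that partitions follow the usual key=value path convention. The function names are hypothetical, and this is not a description of how Hive implements MSCK.

```python
def partitions_from_keys(table_prefix, keys):
    """Derive partition specs (e.g. 'dt=2016-06-01') from object keys under a table."""
    found = set()
    for key in keys:
        if not key.startswith(table_prefix):
            continue
        parts = key[len(table_prefix):].split("/")[:-1]  # drop the file name
        if parts and all("=" in p for p in parts):        # only key=value directories
            found.add("/".join(parts))
    return found


def missing_partitions(table_prefix, keys, metastore_partitions):
    """Partitions present in storage but unknown to the metastore."""
    return sorted(partitions_from_keys(table_prefix, keys) - set(metastore_partitions))
```

Files directly under the table path (such as a _SUCCESS marker) and keys belonging to other tables are ignored.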
18.
Hive - Analyze Column Statistics Improvements
⬢ Hive needs statistics to run queries efficiently
– Gathering table and column statistics can be expensive for partitioned datasets
19.
Performance Considerations When Running Hive Queries
⬢ Split Generation
– File formats such as ORC use a thread pool for split generation
⬢ ORC Footer Cache
– hive.orc.cache.stripe.details.size > 0
– Caches footer details, reducing data reads during split generation
⬢ Reduce S3A Reads on the Task Side
– hive.orc.splits.include.file.footer=true
– Sends ORC footer information in the split payload
– Reduces the amount of data read on the task side
20.
Performance Considerations When Running Hive Queries
⬢ Tez Split Grouping
– Hive uses Tez as its default execution engine
– Tez groups splits based on the min/max group size settings, location details, and so on
– S3A always reports “localhost” as its block location
– When the lengths of all splits fall below the minimum group size, Tez aggressively groups them into a single split. This causes problems with S3A, because a single task ends up performing all the reads sequentially.
– Fixed in recent releases
⬢ Container Launches
– S3A always reports “localhost” for block locations
– Recommended: set “yarn.scheduler.capacity.node-locality-delay=0”
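The single-split problem follows from clamping the target group size to a minimum. The sketch below is a deliberately simplified model of min/max grouping, not Tez's actual algorithm, which also weighs locality and cluster capacity. It computes a target size from the desired number of groups, clamps it to [min_group_size, max_group_size] (the roles played by the real settings tez.grouping.min-size and tez.grouping.max-size), and packs splits greedily. The function and parameter names are hypothetical.

```python
def group_splits(split_lengths, desired_groups, min_group_size, max_group_size):
    """Simplified split grouping: clamp a target group size, then pack greedily."""
    total = sum(split_lengths)
    target = total // max(desired_groups, 1)
    target = max(min_group_size, min(target, max_group_size))  # clamp to [min, max]
    groups, current, current_size = [], [], 0
    for length in split_lengths:
        if current and current_size + length > target:
            groups.append(current)          # close the current group
            current, current_size = [], 0
        current.append(length)
        current_size += length
    if current:
        groups.append(current)
    return groups
```

With forty 1 MB splits and a 50 MB minimum, the clamped target exceeds the total input, so every split lands in one group and a single task reads all forty files one after another. Lowering the minimum restores parallelism.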
21.
Hive-TestBench Benchmark Results
⬢ Hive-TestBench contains a subset of the TPC-DS queries (https://github.com/hortonworks/hive-testbench)
⬢ m4.4xlarge, 5 nodes
⬢ TPC-DS at 200 GB scale in S3
⬢ “HDP 2.3 + S3 in cloud” vs. “HDP 2.4 + S3 in cloud”
– Average speedup 2.5x
– Queries such as 15, 17, 25, 73, and 75 did not run on HDP 2.3 (they threw AWS timeout exceptions)
22.
Hive-TestBench Benchmark Results - LLAP
⬢ LLAP DAG runtime comparison with Hive
⬢ LLAP significantly reduces the amount of data read from S3, improving runtime
23.
Best Practices
⬢ Tune multipart settings
– fs.s3a.multipart.threshold (default: Integer.MAX_VALUE)
– fs.s3a.multipart.size (default: 100 MB)
– fs.s3a.connection.timeout (default: 200 seconds)
⬢ Disable node locality delay in YARN
– Set “yarn.scheduler.capacity.node-locality-delay=0” to avoid delays in container launches
⬢ Disable storage-based authorization in Hive
– hive.security.metastore.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveMetastoreAuthorizationProvider
– hive.metastore.pre.event.listeners= (set to an empty value)
⬢ Tune ORC threads to reduce split generation time
– hive.orc.compute.splits.num.threads (default: 10)
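To see what the two multipart settings control, it helps to compute how an upload would be split. The sketch below assumes that files smaller than the threshold go up as a single PUT and that larger files are cut into fixed-size parts. It also enforces two real S3 limits: every part except the last must be at least 5 MB, and an upload may have at most 10,000 parts. The function name is hypothetical.

```python
MB = 1024 * 1024
MIN_PART_SIZE = 5 * MB    # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_upload(file_size, multipart_threshold, part_size):
    """Return the sizes of the requests an upload would issue (sketch).

    Assumes files below the threshold use one PUT and larger files are split
    into fixed-size parts, with a smaller final part for any remainder.
    """
    if file_size < multipart_threshold:
        return [file_size]
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size below the S3 minimum of 5 MB")
    full, remainder = divmod(file_size, part_size)
    parts = [part_size] * full + ([remainder] if remainder else [])
    if len(parts) > MAX_PARTS:
        raise ValueError("%d parts exceeds the S3 limit of %d" % (len(parts), MAX_PARTS))
    return parts
```

Under this model, the default threshold of Integer.MAX_VALUE (2**31 - 1 bytes, about 2 GB) sends anything smaller than that as one request. Lowering the threshold makes large uploads go out in parts that can be sent in parallel and retried individually.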
24.
Next Steps for S3A and Other Object Stores
⬢ S3A Phase III
– https://issues.apache.org/jira/browse/HADOOP-13204
⬢ Output Committers
– Logical commit operation decoupled from rename (which is non-atomic and costly in object stores)
⬢ Object Store Abstraction Layer
– Avoid the impedance mismatch with the FileSystem API
– Provide specific APIs for better integration with object stores: saving, listing, copying
⬢ Ongoing Performance Improvement
– Less chatty call pattern for object listings
– Metadata caching to mask the latency of remote object store calls
⬢ Consistency
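Rename is non-atomic and costly on an object store because there are no real directories: "renaming" a directory means copying every object under the old prefix and then deleting it. The sketch below models this on a flat dict of keys. Each step's cost grows with the size of the object being copied, and an interruption leaves a mix of old and new keys visible. The fail_after parameter is hypothetical and exists only to simulate a crash. Committing job output by rename inherits both problems, which is the motivation for decoupling the logical commit from rename.

```python
def rename_directory(store, src_prefix, dst_prefix, fail_after=None):
    """Emulate a directory rename on a flat key/value object store (sketch).

    Each object is copied to its new key and then deleted; `fail_after`
    simulates a crash after that many objects have been moved.
    """
    moved = 0
    for key in sorted(k for k in store if k.startswith(src_prefix)):  # snapshot of keys
        if fail_after is not None and moved == fail_after:
            raise RuntimeError("interrupted after %d objects" % moved)
        store[dst_prefix + key[len(src_prefix):]] = store[key]  # COPY: cost grows with data size
        del store[key]                                           # DELETE the source object
        moved += 1
    return moved
```

Because the keys are collected into a sorted list before the loop starts, modifying the store while iterating is safe in this sketch.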
25.
Summary
⬢ Evolution towards cloud storage
⬢ Hadoop-compatible File System Architecture fosters integration with cloud storage
⬢ Integration with multiple cloud providers available: Azure, AWS, Google
⬢ Recent enhancements in S3A
⬢ Hive usage and TPC-DS benchmarks show significant S3A performance improvements
⬢ More coming soon for S3A and other object stores
26.
Q & A
Thank You!