BIG DATA ANALYTICS
BUSINESS INTELLIGENCE
INFORMATION MANAGEMENT
PERFORMANCE MANAGEMENT
© Copyright 2015 – Keyrus
DIVING INTO WEBLOG DATA WITH SAS ON HADOOP
Lisa Truyers, Data Scientist Consultant at Keyrus
March 24, 2016
Project summary
WHO HAS EVER TRIED TO OPEN A 1 GB FILE ON A COMPUTER?
AGENDA
• What is Hadoop?
• Project summary
• Components of the Hadoop-SAS framework
• Setup to load data
• Benchmarks
• Lessons learned
What is Hadoop?
PROS
• Open-source software framework
• Storage and large-scale data processing
• Easy and economic scaling
• Both structured and unstructured data
• Low-cost commodity hardware
• Starts multiple copies of the same task for the same block of data
51% OF COMPANIES ARE THINKING ABOUT INTEGRATING HADOOP BY 2016
Philip Russom, TDWI Best Practices Report: Integrating Hadoop into Business
CONS
• Management and high-availability capabilities are just starting to emerge
• Data security is fragmented
• MapReduce is very batch-oriented
• No easy-to-use, full-featured tools for data integration, data cleansing, governance and metadata
• Skilled professionals are scarce
MANAGE THE DATA AND USE ANALYTICS TO QUICKLY IDENTIFY PREVIOUSLY UNKNOWN INSIGHTS: ACCESS THE DIFFERENT TOOLS OF SAS
WHAT ARE COMPANIES DOING WITH HADOOP?
The percentages mentioned here cover the whole world, not only Europe.

Use case                                                    Percentage
Data warehouse extensions                                   46 %
Data exploration and discovery                              46 %
Data staging for data warehousing and data integration      39 %
Data lake                                                   39 %
Queryable archive for non-traditional data                  36 %
Computational platform and sandbox for advanced analytics   33 %
WHY IS HADOOP (NOT) IMPORTANT?
“Cost savings. Linear scalability. Evaluate ‘the hype’ practically. Complement BI.”
BI architect, telecom, Europe
“Reduces cost of data. New ability to query big data sets. Supply chain improvements. Predictive analytics.”
Vice president, food and beverage, Asia
“Our existing infrastructure cannot handle the tenfold increase in data volumes.”
Data strategy manager, hospitality, US
“It’s important to realize the potential of big data and to explore new business opportunities.”
Data specialist, consulting, Asia
INTRODUCTION
Project summary
1. Discover web traffic data
• The sheer volume of data makes it impossible to analyse at the moment
• Prove the added value of a combined Hadoop-SAS environment
2. Lead generation
• More business-oriented: scoring a neural network model takes one hour every day
• Goal: reduce this scoring time
HADOOP COMPONENTS
Components of the Hadoop-SAS framework
[Diagram of the combined stack]
Hadoop: HBase, Pig, Hive & HCatalog, MapReduce, HDFS, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS, YARN
SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS® Enterprise Guide®
SAS COMPONENTS
Components of the Hadoop-SAS framework
[Diagram of the combined stack]
SAS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® Enterprise Guide®
Hadoop: HBase, Pig, Hive & HCatalog, MapReduce, HDFS, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS, YARN
FULL PROCESS
Setup to load data
Process   Files   Description
A         Day     Partitioned, non-parsed
B         Hour    Partitioned, non-parsed
C         Day     Partitioned, parsed
D         Hour    Partitioned, parsed
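
The difference between day-files and hour-files can be pictured as a difference in partition key. The slides do not show the actual partition layout, so the following is only a minimal sketch: the function name `partition_key` and the Hive-style `dt=` / `hr=` directory names are assumptions made for illustration.

```python
from datetime import datetime


def partition_key(ts: datetime, granularity: str) -> str:
    """Return a Hive-style partition path for a weblog timestamp.

    The dt=/hr= naming is an assumption for illustration, not the
    layout used in the project.
    """
    day = ts.strftime("dt=%Y-%m-%d")
    if granularity == "day":
        return day
    if granularity == "hour":
        return f"{day}/hr={ts:%H}"
    raise ValueError(f"unknown granularity: {granularity!r}")


hit = datetime(2016, 3, 24, 13, 45, 10)
print(partition_key(hit, "day"))   # dt=2016-03-24
print(partition_key(hit, "hour"))  # dt=2016-03-24/hr=13
```

Hour-level partitions give smaller units to load and query, at the cost of 24 times as many partitions per day.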
PROCESS C
Setup to load data
1. Delete Hive table
2. Transfer to Hadoop
3. Parse data
4. Merge
5. Loop
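
The order of the Process C steps can be sketched as a simple driver loop. This is only an illustration of the control flow: the function `run_process_c`, the file names, and the assumption that the loop runs once per day-file are all hypothetical, and each step is a placeholder that records what would happen instead of calling Hive, HDFS or SAS.

```python
from typing import Callable, List


def run_process_c(day_files: List[str],
                  log: Callable[[str], None] = print) -> List[str]:
    """Walk through the Process C steps once per day-file.

    Every step is a placeholder; the real jobs are not shown in the slides.
    """
    steps: List[str] = []

    def record(step: str) -> None:
        steps.append(step)
        log(step)

    for day_file in day_files:                        # 5. Loop
        record(f"delete Hive table for {day_file}")   # 1. Delete Hive table
        record(f"transfer {day_file} to Hadoop")      # 2. Transfer to Hadoop
        record(f"parse {day_file}")                   # 3. Parse data
        record(f"merge {day_file} into the result")   # 4. Merge
    return steps


# Hypothetical file names, for illustration only.
run_process_c(["weblog_2016-03-21.log", "weblog_2016-03-22.log"])
```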
HADOOP COMPARED TO SERVER
Benchmarks

Task                        Server               Hadoop
Query test on one day       35 seconds           35 seconds
Parsing data for one day    15 minutes           15 minutes
Parsing data for one week   4 hours 30 minutes   53 minutes

MORE TIME NEEDED FOR EXTRA BENCHMARKS
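
To put the one-week parsing figures side by side, the arithmetic below works out the ratio and the time saved. It uses only the two durations quoted on the slide; the variable names are illustrative.

```python
from datetime import timedelta

# Durations for parsing one week of data, as quoted on the slide.
server_week = timedelta(hours=4, minutes=30)
hadoop_week = timedelta(minutes=53)

speedup = server_week / hadoop_week   # timedelta / timedelta gives a float
saved = server_week - hadoop_week

print(f"speed-up: {speedup:.1f}x")    # speed-up: 5.1x
print(f"time saved: {saved}")         # time saved: 3:37:00
```

For a single day both environments are level, so in these measurements the gain only shows up once the volume grows to a week.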
Lessons learned
Teamwork is key
• Set up the Hadoop cluster with Hadoop experts
• Install SAS with experts from the company
SAS ON HADOOP
• In SAS, take your time to set the correct variable length
• Choose the capacity of the cluster rationally
• Create benchmarks on both environments (server vs. Hadoop) early on, so that a good comparison can be made and the correct decision taken
• Data must be large enough on Hadoop to see a difference
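
The advice on variable length matters because SAS character variables are fixed-width: in an uncompressed data set every row stores the full declared length, however short the actual value. The rough calculation below illustrates the effect. The row count and the fitted length of 200 are hypothetical, and 32,767 bytes is used because it is the maximum length of a SAS character variable; none of these numbers come from the project.

```python
def column_bytes(rows: int, declared_length: int) -> int:
    """Bytes used by one fixed-width character column (uncompressed)."""
    return rows * declared_length


ROWS = 10_000_000    # hypothetical number of weblog records
MAX_LEN = 32_767     # maximum length of a SAS character variable
FITTED_LEN = 200     # hypothetical length that fits the longest value

for label, length in (("maximum", MAX_LEN), ("fitted", FITTED_LEN)):
    gib = column_bytes(ROWS, length) / 1024**3
    print(f"{label:>8}: {gib:8.1f} GiB for one column")
```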
THANK YOU FOR YOUR ATTENTION
To contact us
www.keyrus.com
contact@keyrus.com

100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

BI congres 2016-2: Diving into weblog data with SAS on Hadoop - Lisa Truyers - Keyrus

  • 1. BIG DATA ANALYTICS BUSINESS INTELLIGENCE INFORMATION MANAGEMENT PERFORMANCE MANAGEMENT
  • 2. DIVING INTO WEBLOG DATA WITH SAS ON HADOOP. Lisa Truyers, Data Scientist Consultant at Keyrus. March 24, 2016. © Copyright 2015 – Keyrus
  • 3. Project summary. WHO HAS EVER TRIED TO OPEN A 1 GB FILE ON A COMPUTER?
  • 4. AGENDA. What is Hadoop? / Project summary / Components of the Hadoop-SAS framework / Setup to load data / Benchmarks / Lessons learned
  • 5. What is Hadoop? PROS: open-source software framework; storage and large-scale data processing; easy and economical scaling; handles both structured and unstructured data; runs on low-cost commodity hardware; starts multiple copies of the same task for the same block of data. 51% of companies are considering integrating Hadoop by 2016 (Philip Russom, TDWI Best Practices Report: Integrating Hadoop into Business).
  • 6. What is Hadoop? CONS: management and high-availability capabilities are just starting to emerge; data security is fragmented; MapReduce is very batch-oriented; no easy-to-use, full-featured tools for data integration, data cleansing, governance and metadata; skilled professionals are scarce. Manage the data and use analytics to quickly identify previously unknown insights: access the different tools of SAS.
  • 7. What is Hadoop? WHAT ARE COMPANIES DOING WITH HADOOP? The percentages cover the whole world, not only Europe. Data warehouse extensions: 46%; data exploration and discovery: 46%; data staging for data warehousing and data integration: 39%; data lake: 39%; queryable archive for non-traditional data: 36%; computational platform and sandbox for advanced analytics: 33%.
  • 8. What is Hadoop? WHY IS HADOOP (NOT) IMPORTANT? “Cost savings. Linear scalability. Evaluate ‘the hype’ practically. Complement BI.” (BI architect, telecom, Europe) “Reduces cost of data. New ability to query big data sets. Supply chain improvements. Predictive analytics.” (Vice president, food and beverage, Asia) “Our existing infrastructure cannot handle the tenfold increase in data volumes.” (Data strategy manager, hospitality, US) “It’s important to realize the potential of big data and to explore new business opportunities.” (Data specialist, consulting, Asia)
  • 10. Project summary. INTRODUCTION. 1. Discover web traffic data: the sheer volume of data currently makes it impossible to analyse; the aim is to prove the added value of a combined Hadoop-SAS environment. 2. Lead generation (more business-oriented): scoring a neural network model takes one hour each day; the aim is to reduce this time.
  • 12. Components of the Hadoop-SAS framework. HADOOP COMPONENTS: HBase, Pig, Hive & HCatalog, MapReduce, HDFS, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS, YARN. SAS components shown alongside: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS® Enterprise Guide®.
  • 18. Components of the Hadoop-SAS framework. SAS COMPONENTS: Base SAS & SAS/ACCESS® to Hadoop™, SAS Metadata, SAS® LASR™ Analytic Server, SAS® High-Performance Analytic Procedures, SAS IMSTAT for Hadoop, SAS® Visual Analytics & Statistics, SAS® Enterprise Guide®. Hadoop components shown alongside: HBase, Pig, Hive & HCatalog, MapReduce, HDFS, Ambari, Oozie, Flume, Sqoop, NFS, WebHDFS, YARN.
  • 21. Setup to load data. FULL PROCESS. Day files: A = partitioned, non-parsed; C = partitioned, parsed. Hour files: B = partitioned, non-parsed; D = partitioned, parsed.
  • 22. Setup to load data
  • 23. Setup to load data. PROCESS C: delete Hive table; transfer to Hadoop; parse data; merge; loop.
  • 24. AGENDA. Project summary / What is Hadoop? / Components of the Hadoop-SAS framework / SAS-tools used in this project / Setup to load data / Benchmarks / Lessons learned
  • 25. Benchmarks. HADOOP COMPARED TO SERVER. Server: query test on one day: 35 seconds; parsing data for one day: 15 minutes; parsing one week: 4 hours 30 minutes. Hadoop: query test on one day: 35 seconds; parsing data for one day: 15 minutes; parsing one week: 53 minutes. More time is needed for extra benchmarks.
  • 27. Lessons learned. Teamwork is key: set up the Hadoop cluster with Hadoop experts; install SAS with experts from SAS (the company). SAS ON HADOOP: in SAS, take your time to set the correct variable lengths; size the cluster rationally; create benchmarks on both environments (server vs. Hadoop) early on, so that a good comparison can be made and the right decision taken; the data on Hadoop must be large enough to see a difference.
  • 28. THANK YOU FOR YOUR ATTENTION To contact us www.keyrus.com contact@keyrus.com
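
The benchmark figures on slide 25 can be turned into speed-up factors with a few lines of arithmetic. The sketch below is not part of the original deck: it only re-uses the timings reported on the slide, and the task labels and the `speedup` helper are names chosen here for illustration.

```python
from datetime import timedelta

# Timings as reported on the benchmark slide (slide 25).
server = {
    "query, one day": timedelta(seconds=35),
    "parsing, one day": timedelta(minutes=15),
    "parsing, one week": timedelta(hours=4, minutes=30),
}
hadoop = {
    "query, one day": timedelta(seconds=35),
    "parsing, one day": timedelta(minutes=15),
    "parsing, one week": timedelta(minutes=53),
}


def speedup(server_time: timedelta, hadoop_time: timedelta) -> float:
    """Return the ratio of server time to Hadoop time (hypothetical helper)."""
    return server_time / hadoop_time


speedups = {task: speedup(server[task], hadoop[task]) for task in server}

for task, factor in speedups.items():
    print(f"{task}: {factor:.1f}x")
```

On these numbers the one-day tasks show no difference (a factor of 1.0), while parsing one week is roughly five times faster on Hadoop (270 minutes against 53). That is consistent with the lesson on slide 27 that the data must be large enough on Hadoop to see a difference.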