SlideShare a Scribd company logo
1 of 18
Accelerating Data Warehouse
Modernization
Ajay Anand, VP Products, Kyvos Insights
Vineet Tyagi, CTO, Impetus
Our 40 Minutes
Today
• Drivers for Data Warehouse Modernization
• What is a Modern Data Warehouse
• Challenges for implementing a Modern Data
Warehouse
• Driving adoption and usage within the enterprise
• Measuring success factors and ROI
Data Warehouse Modernization – Drivers
Optimize Existing DW/BI Infrastructure of Create New
Capabilities
Handle Big Data and the 3 V’s
• Volume, Variety, Velocity
Integrate Multiple Data Silos
• ERP, CRM, HRM and others
Reduce Cost
• ETL process
• Analytical process
• Mainframe process
• Cloud feasibility for data analytics
Applying Science
• Unstructured data for enhancing analytics
• Data Science for advanced analytics
Reduce Time to Market by Faster Processing Analytics
Blueprint of a Modern Data Warehouse with Hadoop
The enterprise data warehouse (EDW) and Hadoop based warehouse would co-exist to allow the
enterprise to leverage the strengths of each architecture.
Landing and
ingestionStructured
Unstructured
External Social
Machine
Geospatial
Time Series
Streaming
Provisioning,Workflow, Monitoring and Security
Enterprise
Data Lake
Real-Time applications
Predictive
applications
Exploration &
discovery
Enterprise
applications
Traditional data
repositories
RDBMS MPP
Key Challenges for Modernization
“Through 2018, 90% of modernized warehouses will be useless
as they are overwhelmed with information assets captured for
uncertain use cases”
Key Challenges for Modernization
“Visual data-discovery, an important enabler of end user self-service,
will grow 2.5 x faster than the rest of the market, becoming by 2018 a
requirement for all enterprises.”
Making insights and data in the warehouse readily discoverable, accessible and
usable
Key Challenges for Modernization
Is the opposite of “Dumb” data
• hard to find
• hard to understand
• hard to combine
Data in the Lake has to be Smart
Rethink the information plumbing
• Supplement first , transform later
• Maximize ROI by protecting investments
Rethink ETL – Light weight data blending tools that
can allow for data wrangling when business cannot
wait
Key Challenges for Modernization
“By 2017, most business users and analysts in organizations will
have access to self-service tools to prepare data for analysis”
“Managed BI Self-Service Will Continue to Close
the Business and Technology Gap.”
Self Service BI over Hadoop
Using big data capabilities as
a “landing zone” before
determining what data should
be moved to the data
warehouse
PRE-PROCESSING
Moving infrequently
accessed data from data
warehouses into enterprise-
grade Hadoop
Moving associated workloads
to be serviced from Hadoop
OFFLOADING
Using big data capabilities to
explore and discover new
high value data from massive
amounts of raw data
EXPLORATION
Top 3 Tactics for Modernization
• Barriers to adoption: Complex, slow, needs expertise
Kyvos Solution: Build a BI Consumption Layer on your Data Lake
• Enable business users to explore data visually and interactively
• No waiting for reports
• Self service – no learning curve
• No need to move data out of Hadoop
• Eliminate scalability restrictions for BI
• Drill down to lowest levels of granularity
Bridging the Gap for Business User
BI Consumption Layer with OLAP on Hadoop
BI Consumption Layer – Secure, Scalable Access for All Users
• Fine-grained access control
• Row and column level security
• Integration with kerberos, LDAP,
Active Directory
• Integration with security frameworks
• Role based access control
• Support for third party encryption
tools
• Support for single sign-on
Excel Spotfire
ICE JAVA
APP
MDX
Clients
Other Transformations
(Java / Scala)
Hive
HDFS
Jacobian Transformation
(Scala)
Impala SQL Server/ SSAS Spark
Business Need
• Evaluate risk across all asset classes
• Deliver interactive access at massive
scale
• Interface with Spotfire and in-house apps
• Reduce time to market
Challenges
• DATA SILOS – Teradata, SQL Server, and HDFS
• BIG DATA
• Data too large to look at all asset classes across desired time period
• 700 M transactions per day
• WEEKS – time to get results
• SLOW - response time to queries
Use Case
Investment Bank Risk Analysis
Excel Spotfire
ICE JAVA
APP
MDX
Clients
Other Transformations
(Java / Scala)
HDFS
Jacobian Transformation
(Scala)
KYVOS Spark
Solution Highlights
• One OLAP / caching layer for all three
UI’s: Excel, Spotfire, In-house
• Consolidated view of all asset classes
• Drill down to trade level – never possible
before
Results Obtained
• 20-day trend of risk – not achievable with previous Hive or Impala
solutions
• Daily updates of cubes
• Reduced time to market: eliminated need to move data to SSAS
• Interactive response times for users, even at massive scale
• No learning curve: support for all business UI’s
Use Case
Investment Bank Risk Analysis
• Can it deal with the scalability and granularity needed?
• How does it perform with “cold” queries for ad-hoc analysis?
• How efficiently does it deal with “warm” or repeated queries?
• Can business users access data seamlessly with their BI tools?
• Can diverse data sets be transformed and combined with no coding?
• Can it deal with incremental data updates efficiently?
• Can it deal with concurrent access without significant degradation?
• Is it enterprise ready to support availability and security requirements?
Evaluating Criteria
• Reduction in time to market
• Reduction in development time
• Increased business user productivity
• Reduced latency – reduced number of “hops” or diverse systems
supported
• Reduced operational costs
• Top-line benefits of insights that were not possible before
Measuring ROI
Visit us at Booth 1105
ajay.anand@kyvosinsights.com
vineet.tyagi@impetus.com
Q&A
Accelerating Data Warehouse Modernization

More Related Content

What's hot

Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Simplilearn
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
The Path to Data and Analytics Modernization
The Path to Data and Analytics ModernizationThe Path to Data and Analytics Modernization
The Path to Data and Analytics ModernizationAnalytics8
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Big data introduction
Big data introductionBig data introduction
Big data introductionChirag Ahuja
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiTimothy Spann
 
Data Lake: A simple introduction
Data Lake: A simple introductionData Lake: A simple introduction
Data Lake: A simple introductionIBM Analytics
 
Legacy application modernization with microsoft azure
Legacy application modernization with microsoft azureLegacy application modernization with microsoft azure
Legacy application modernization with microsoft azureOptiSol Business Solutions
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Databaserockplace
 
The Role of Data Lakes in Healthcare
The Role of Data Lakes in HealthcareThe Role of Data Lakes in Healthcare
The Role of Data Lakes in HealthcarePerficient, Inc.
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 

What's hot (20)

Big data-ppt
Big data-pptBig data-ppt
Big data-ppt
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
Big Data Tutorial | What Is Big Data | Big Data Hadoop Tutorial For Beginners...
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
 
The Path to Data and Analytics Modernization
The Path to Data and Analytics ModernizationThe Path to Data and Analytics Modernization
The Path to Data and Analytics Modernization
 
Big data
Big dataBig data
Big data
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
IBM Cloud pak for data brochure
IBM Cloud pak for data   brochureIBM Cloud pak for data   brochure
IBM Cloud pak for data brochure
 
Big data introduction
Big data introductionBig data introduction
Big data introduction
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Big data
Big dataBig data
Big data
 
Drone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFiDrone Data Flowing Through Apache NiFi
Drone Data Flowing Through Apache NiFi
 
Data Lake: A simple introduction
Data Lake: A simple introductionData Lake: A simple introduction
Data Lake: A simple introduction
 
Legacy application modernization with microsoft azure
Legacy application modernization with microsoft azureLegacy application modernization with microsoft azure
Legacy application modernization with microsoft azure
 
Azure SQL Database
Azure SQL DatabaseAzure SQL Database
Azure SQL Database
 
The Role of Data Lakes in Healthcare
The Role of Data Lakes in HealthcareThe Role of Data Lakes in Healthcare
The Role of Data Lakes in Healthcare
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 

Viewers also liked

Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsDataWorks Summit/Hadoop Summit
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataDataWorks Summit/Hadoop Summit
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveDataWorks Summit/Hadoop Summit
 
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJIntro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJDaniel Madrigal
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it DataWorks Summit/Hadoop Summit
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success DataWorks Summit/Hadoop Summit
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondDataWorks Summit/Hadoop Summit
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaDataWorks Summit/Hadoop Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudDataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and Improvements
 
Real Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with SparkReal Time Machine Learning Visualization with Spark
Real Time Machine Learning Visualization with Spark
 
Machine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of DataMachine Learning for Any Size of Data, Any Type of Data
Machine Learning for Any Size of Data, Any Type of Data
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJIntro to Spark with Zeppelin Crash Course Hadoop Summit SJ
Intro to Spark with Zeppelin Crash Course Hadoop Summit SJ
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security Apache Ranger Hive Metastore Security
Apache Ranger Hive Metastore Security
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Big Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyondBig Data for Managers: From hadoop to streaming and beyond
Big Data for Managers: From hadoop to streaming and beyond
 
Beyond TCO
Beyond TCOBeyond TCO
Beyond TCO
 
Toward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFSToward Better Multi-Tenancy Support from HDFS
Toward Better Multi-Tenancy Support from HDFS
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
 
Extreme Analytics @ eBay
Extreme Analytics @ eBayExtreme Analytics @ eBay
Extreme Analytics @ eBay
 
Apache Hive ACID Project
Apache Hive ACID ProjectApache Hive ACID Project
Apache Hive ACID Project
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
 
From Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFiFrom Zero to Data Flow in Hours with Apache NiFi
From Zero to Data Flow in Hours with Apache NiFi
 
Producing Spark on YARN for ETL
Producing Spark on YARN for ETLProducing Spark on YARN for ETL
Producing Spark on YARN for ETL
 

Similar to Accelerating Data Warehouse Modernization

Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...MapR Technologies
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the OrganizationSeeling Cheung
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Cloudera, Inc.
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the CloudHow Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the CloudTyler Wishnoff
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Develop a Custom Data Solution Architecture with NorthBay
Develop a Custom Data Solution Architecture with NorthBayDevelop a Custom Data Solution Architecture with NorthBay
Develop a Custom Data Solution Architecture with NorthBayAmazon Web Services
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresKangaroot
 
IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data IBM
 
How to Quickly and Easily Draw Value from Big Data Sources_Q3 symposia(Moa)
How to Quickly and Easily Draw Value  from Big Data Sources_Q3 symposia(Moa)How to Quickly and Easily Draw Value  from Big Data Sources_Q3 symposia(Moa)
How to Quickly and Easily Draw Value from Big Data Sources_Q3 symposia(Moa)Moacyr Passador
 
Assessing New Database Capabilities – Multi-Model
Assessing New Database Capabilities – Multi-ModelAssessing New Database Capabilities – Multi-Model
Assessing New Database Capabilities – Multi-ModelDATAVERSITY
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which DataWorks Summit
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...Tyler Wishnoff
 

Similar to Accelerating Data Warehouse Modernization (20)

Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En...
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Retail & CPG
Retail & CPGRetail & CPG
Retail & CPG
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the CloudHow Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
How Analytics Teams Using SSAS Can Embrace Big Data and the Cloud
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Develop a Custom Data Solution Architecture with NorthBay
Develop a Custom Data Solution Architecture with NorthBayDevelop a Custom Data Solution Architecture with NorthBay
Develop a Custom Data Solution Architecture with NorthBay
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
 
IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data IBM Relay 2015: Open for Data
IBM Relay 2015: Open for Data
 
How to Quickly and Easily Draw Value from Big Data Sources_Q3 symposia(Moa)
How to Quickly and Easily Draw Value  from Big Data Sources_Q3 symposia(Moa)How to Quickly and Easily Draw Value  from Big Data Sources_Q3 symposia(Moa)
How to Quickly and Easily Draw Value from Big Data Sources_Q3 symposia(Moa)
 
Assessing New Database Capabilities – Multi-Model
Assessing New Database Capabilities – Multi-ModelAssessing New Database Capabilities – Multi-Model
Assessing New Database Capabilities – Multi-Model
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Hadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data WarehouseHadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data Warehouse
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Recently uploaded

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Recently uploaded (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Accelerating Data Warehouse Modernization

  • 1. Accelerating Data Warehouse Modernization Ajay Anand, VP Products, Kyvos Insights Vineet Tyagi, CTO, Impetus
  • 2. Our 40 Minutes Today • Drivers for Data Warehouse Modernization • What is a Modern Data Warehouse • Challenges for implementing a Modern Data Warehouse • Driving adoption and usage within the enterprise • Measuring success factors and ROI
  • 3. Data Warehouse Modernization – Drivers Optimize Existing DW/BI Infrastructure of Create New Capabilities Handle Big Data and the 3 V’s • Volume, Variety, Velocity Integrate Multiple Data Silos • ERP, CRM, HRM and others Reduce Cost • ETL process • Analytical process • Mainframe process • Cloud feasibility for data analytics Applying Science • Unstructured data for enhancing analytics • Data Science for advanced analytics Reduce Time to Market by Faster Processing Analytics
  • 4. Blueprint of a Modern Data Warehouse with Hadoop The enterprise data warehouse (EDW) and Hadoop based warehouse would co-exist to allow the enterprise to leverage the strengths of each architecture. Landing and ingestionStructured Unstructured External Social Machine Geospatial Time Series Streaming Provisioning,Workflow, Monitoring and Security Enterprise Data Lake Real-Time applications Predictive applications Exploration & discovery Enterprise applications Traditional data repositories RDBMS MPP
  • 5. Key Challenges for Modernization “Through 2018, 90% of modernized warehouses will be useless as they are overwhelmed with information assets captured for uncertain use cases”
  • 6. Key Challenges for Modernization “Visual data-discovery, an important enabler of end user self-service, will grow 2.5 x faster than the rest of the market, becoming by 2018 a requirement for all enterprises.” Making insights and data in the warehouse readily discoverable, accessible and usable
  • 7. Key Challenges for Modernization Is the opposite of “Dumb” data • hard to find • hard to understand • hard to combine Data in the Lake has to be Smart Rethink the information plumbing • Supplement first , transform later • Maximize ROI by protecting investments Rethink ETL – Light weight data blending tools that can allow for data wrangling when business cannot wait
  • 8. Key Challenges for Modernization “By 2017, most business users and analysts in organizations will have access to self-service tools to prepare data for analysis” “Managed BI Self-Service Will Continue to Close the Business and Technology Gap.” Self Service BI over Hadoop
  • 9. Using big data capabilities as a “landing zone” before determining what data should be moved to the data warehouse PRE-PROCESSING Moving infrequently accessed data from data warehouses into enterprise- grade Hadoop Moving associated workloads to be serviced from Hadoop OFFLOADING Using big data capabilities to explore and discover new high value data from massive amounts of raw data EXPLORATION Top 3 Tactics for Modernization
  • 10. • Barriers to adoption: Complex, slow, needs expertise Kyvos Solution: Build a BI Consumption Layer on your Data Lake • Enable business users to explore data visually and interactively • No waiting for reports • Self service – no learning curve • No need to move data out of Hadoop • Eliminate scalability restrictions for BI • Drill down to lowest levels of granularity Bridging the Gap for Business User
  • 11. BI Consumption Layer with OLAP on Hadoop
  • 12. BI Consumption Layer – Secure, Scalable Access for All Users • Fine-grained access control • Row and column level security • Integration with kerberos, LDAP, Active Directory • Integration with security frameworks • Role based access control • Support for third party encryption tools • Support for single sign-on
  • 13. Excel Spotfire ICE JAVA APP MDX Clients Other Transformations (Java / Scala) Hive HDFS Jacobian Transformation (Scala) Impala SQL Server/ SSAS Spark Business Need • Evaluate risk across all asset classes • Deliver interactive access at massive scale • Interface with Spotfire and in-house apps • Reduce time to market Challenges • DATA SILOS – Teradata, SQL Server, and HDFS • BIG DATA • Data too large to look at all asset classes across desired time period • 700 M transactions per day • WEEKS – time to get results • SLOW - response time to queries Use Case Investment Bank Risk Analysis
  • 14. Excel Spotfire ICE JAVA APP MDX Clients Other Transformations (Java / Scala) HDFS Jacobian Transformation (Scala) KYVOS Spark Solution Highlights • One OLAP / caching layer for all three UI’s: Excel, Spotfire, In-house • Consolidated view of all asset classes • Drill down to trade level – never possible before Results Obtained • 20-day trend of risk – not achievable with previous Hive or Impala solutions • Daily updates of cubes • Reduced time to market: eliminated need to move data to SSAS • Interactive response times for users, even at massive scale • No learning curve: support for all business UI’s Use Case Investment Bank Risk Analysis
  • 15. • Can it deal with the scalability and granularity needed? • How does it perform with “cold” queries for ad-hoc analysis? • How efficiently does it deal with “warm” or repeated queries? • Can business users access data seamlessly with their BI tools? • Can diverse data sets be transformed and combined with no coding? • Can it deal with incremental data updates efficiently? • Can it deal with concurrent access without significant degradation? • Is it enterprise ready to support availability and security requirements? Evaluating Criteria
  • 16. • Reduction in time to market • Reduction in development time • Increased business user productivity • Reduced latency – reduced number of “hops” or diverse systems supported • Reduced operational costs • Top-line benefits of insights that were not possible before Measuring ROI
  • 17. Visit us at Booth 1105 ajay.anand@kyvosinsights.com vineet.tyagi@impetus.com Q&A

Editor's Notes

  1. In a real world scenario, enterprise data warehouse (EDW) and Hadoop based warehouse would co-exist to allow the organization to leverage the strengths of each architecture to its advantage.
  2. Data Lake will have increasing amounts of data ingested at scale, if users don’t know is available, it will be useless They need to find it by different means, searches etc, with full governance, not canned queries. Discovery can happen of both data and its context and services When they find something they are interested in they should be immediately able to get it within the bounds of governance and work with it. Discoverability and accessibility have to go hand in hand, one builds on other but is not usable with other.
  3. Hard to find: Dumb data requires that we know the exact location of a particular piece of information we’re interested in. We may need to know a specific part number that acts as a primary key in a database or in a Hadoop cluster, or we may need to know particular internal IDs used to identify the same employee in three different systems. To cope with this, we wrap dumb data with basic keyword search or with canned queries—solutions that help us retrieve known data but don’t help us ask new questions or uncover new information. Hard to combine with other data: Dumb data is very provincial. It has identity and meaning within the confines of the particular silo in which it was created. Outside of that silo, however, dumb data is meaningless. An auto-incrementing integer key that uniquely identifies a customer within a CRM system is highly ambiguous when placed in the same context as data from a dozen other enterprise apps. A short text string such as “name” used to identify a particular data attribute within a key-value store such as MongoDB may collide with different attributes from other big data stores, databases, or spreadsheets when let loose in the wild. Hard to understand: Even once we find relevant information, we’re limited in our ability to understand dumb data as it is generally not well-described. Dumb data is described by database, table, and column names or document and key identifiers that are often short, opaque, and ambiguous outside of the context of a specific data store. For decades, we’ve been dealing with this by building software that has hardcoded knowledge of what data is in which column in which database table. Hardcoding this knowledge into every software layer from the query layer through to the business logic and all the way up to the user interface makes software very complex. Complex software is prone to bugs and is expensive and time-consuming to change, compromising our ability to deliver the most up-to-date and relevant data to business decision makers in a timely manner. Because most data is hard to find, combine with other data, and understand, its value ends up limited. The effort and cost needed to effectively use dumb data to drive business decisions is so high that we only use it for a few business problems, in particular those with static and predictable requirements. We might deploy a traditional BI tool to track and optimize widget sales by region, for instance, but we’re not able to apply the same analytic rigor to staffing client projects, understanding competitors’ strategies, providing proactive customer support, or any of hundreds of other day-to-day business activities that would benefit from a data-driven approach. If data is dumb, then big data is very dumb. With Hadoop and other big data infrastructure, we now have the tools to collect data at will in volumes and varieties not previously seen. However, this only exacerbates the challenges of finding, combining, and understanding the data we need at any given time. Add meaning The first step is to add meaning to your data by richly describing both the entities within your data and the relationships between them. Equally important to describing the meaning of data is where the meaning is described. Dumb data often has its meaning recorded via data dictionaries, service interface documents, relational database catalogs, or other out-of-band mechanisms. To make data smarter, don’t rely on the meaning of the data to be hardcoded within software; instead, link the meaning of the data directly to the data itself. There are several ways to describe data’s meaning, and the richer the description you choose, the smarter your data becomes. Data’s meaning can include: Controlled vocabularies that describe the acceptable values of an attribute Taxonomies that capture hierarchical relationships within the data Schemas that communicate data types, cardinalities, and min/max value ranges Ontologies that declaratively represent rich semantics of data such as transitive relationships, type intersections, or context-sensitive value restrictions There are two benefits of adding meaning to your data. First, software can respond appropriately (for example, by performing data validation or automatically choosing the right UI widget) to the meaning of different data sets without having to be customized for each. Second, rich data descriptions attached to data can empower business experts to manipulate data themselves without relying on scarce IT or data science personnel for every new dashboard, visualization, or analysis. Add context The Sisyphean pursuit of a generally unreachable “single version of the truth” belies the importance of context in making data smart enough to be discovered and understood by business decision makers. The lack of context makes data unreliable and hard to trust and decreases the chance that decision makers will rely on it. Just as meaning is traditionally captured separately from data, so, too, is contextual metadata usually divorced from the data it describes. To make data smarter, you must treat metadata as data. This means directly capturing and maintaining simple metadata such as the author or creation time of a piece of data, but it also means linking data to its full lineage, including the source of the data (e.g., a particular enterprise database, document, or social media posting) and any transformations on the data. Context can also include probability, confidence, and reliability metadata. Finally, data’s context might involve domain-specific attributes that limit the scope in which a particular piece of data is true, such as the time period for which a company had a particular ticker symbol or the position a patient was in when a blood pressure reading was obtained. By representing contextual metadata alongside the data itself, users can query, search, and visualize both at once. There’s no need to create separate, time-consuming data-load processes that select data from a particular time period or author. There’s no need to login to separate applications to verify the trustworthiness of data within a business intelligence dashboard. However, we work in an increasingly interconnected world, and we can’t ignore the edges of our big data clouds -- the points at which we exchange data with our supply chain partners, regulators, and customers. Adopting standards is critical to enabling reuse of data on these edges and to avoiding the overhead of classic point-to-point data translations that traditionally consume substantial resources when exchanging data with third parties. Smart data standards come in two varieties: Industry standards such as the Financial Industry Business Ontology (FIBO), CDISC in pharma, or HL7 in healthcare. These standards capture the meaning of data across an industry and ensure mutual understanding between organizations. Technology standards such as the semantic Web standards RDF, OWL, and SPARQL. These standards provide an agreed-upon way to model and describe the flexible, context-rich, data graphs that form the foundation of smart data.
  4. Self-Service Business Intelligence puts the power of analytics in the hands of end users to create their own reports and analysis of the data sets they want, on an as needed basis. The goal is to utilize data wrangling / blending and other capabilities to reduce IT’s involvement and expedite information to business users by delivering what Gartner refers to as “faster, more user-friendly and more relevant BI.” It is an evolutionary paradigm, does not indulge the IT vs/ Business divide , IT really gets to play the enabler here, reinforcing governance, structuring user autonomy, accounting for user differentiation, and transforming IT’s role from serving business to offering cross-functional support, it is important to realize that self-service BI should not be considered a replacement for traditional BI tools and warehousing. By utilizing a hybrid approach of centralized and decentralized models and restructuring the organization accordingly, self-service BI functions best as a supplement to the conventional methods in which data is accessed more expediently and put in the hands of those who need it most.
  5. Pre-Processing using big data capabilities as a “landing zone” before determining what data should be moved to the data warehouse Provide a landing zone for all data. Persist the data to provide a query able archive of cold data. Offloading moving infrequently accessed data from data warehouses into enterprise-grade Hadoop Moving associated workloads to be serviced from Hadoop Leverage Hadoop’s large-scale batch processing efficiencies to preprocess and transform data for the warehouse. Exploration Using big data capabilities to explore and discover new high value data from massive amounts of raw data and free up the data warehouse for more structured, deep analytics Enable an environment for ad hoc data discovery.