SlideShare a Scribd company logo
1 of 40
Download to read offline
THE GUIDE TO
BIG DATA
ANALYTICS
TABLE OF
CONTENTS 02 Why Big Data Analytics?
04 What is Big Data Analytics?
06 How has Big Data Analytics helped
companies?
17 How do I decide whether to buy or build?
21 If I build, what do I need?
25 How do I select the right Big Data Analytics
Solution for me?
27 Getting Successful with Big Data Analytics -
more than technology
38 Key Takeaways
39 Key Big Data Analytics Experts
01
Why
BIG DATA ANALYTICS?
Companies that use data to drive their business (in blue) perform
better than companies who do not.
WHY IS BIG DATA ANALYTICS SO IMPORTANT?
1. Data Drives Performance
Companies from all industries use big data analytics to:
• Increase revenue
• Decrease costs
• Increase productivity
2. Big Data Analytics Drives Results
$0
$43
$86
$129
$171
$214
$257
$300
12/31/09
03/31/10
06/30/10
09/30/10
12/31/10
03/31/11
06/30/11
09/30/11
12/31/11
03/31/12
06/30/12
09/30/12
12/31/12
03/21/13
Amazon vs Barnes & Noble
$0
$150
$300
$450
$600
$750
$900
12/31/09
03/31/10
06/30/10
09/30/10
12/31/10
03/31/11
06/30/11
09/30/11
12/31/11
03/31/12
06/30/12
09/30/12
12/31/12
03/21/13
Google vs Yahoo
$0
$10
$20
$30
$40
$50
$60
12/31/09
03/31/10
06/30/10
09/30/10
12/31/10
03/31/11
06/30/11
09/30/11
12/31/11
03/31/12
06/30/12
09/30/12
12/31/12
03/21/13
Cabela’s vs Big 5 Sporting Goods
2x 3x
12
weeks
to
3 days
INCREASE
in revenue
(Online Gaming
Company)
REDUCE TIME
to insight
INCREASE
in customer
acquisition
(Software Security
Company)
03
WHY BIG DATA ANALYTICS?
* Powered by Datameer
BIG DATA ANALYTICS?
What is
WHAT IS BIG DATA ANALYTICS AND
WHAT MAKES IT SO POWERFUL?
The Problem
05
WHAT IS BIG DATA ANALYTICS?
Before Hadoop, we had limited storage and compute, which led to a long and rigid analytics process (see below). First, IT goes through a lengthy process
(often known as ETL) to get every new data source ready to be stored. After getting the data ready, IT puts the data into a database or data warehouse, and
into a static data model. The problem with that approach is that IT designs the data model today with the knowledge of yesterday, and you have to hope that
it will be good enough for tomorrow. But nobody can predict the perfect schema. Then on top of that you put a business intelligence tool, which because of
the static schemas underneath, is optimized to answer KNOWN QUESTIONS.
TDWI says this 3-part process takes 18 months to implement or change. And on average it takes 3 months to integrate a new data source.
The business is telling us they cannot operate at this speed anymore. And there is a better way.
Data Warehouse Business IntelligenceETL
Raw Data Hadoop
Drag & Drop
Spreadsheet
18 months
less than 2 months
You need a solution with a different approach. First, you need a way to
bring in all the different data sources that is much faster than traditional
ETL. You need to be able load in raw data. That means loading all metadata
in the form it was generated — as a log file, database table, mainframe
copybook or a social media stream.
Second, you need a way to discover insights unencumbered by predefined
models. You need to be able to iterate with a widely used analysis tool,
such as a spreadsheet. Finally, you need a way to visualize those insights
and have those visualizations automatically updated. So as your business
changes, you can see and get ahead of those changes.
The Solution
Comprehensive
Integration
Schema Free
Analytics
Powerful
Visualizations
WHAT IS BIG DATA ANALYTICS AND
WHAT MAKES IT SO POWERFUL?
WHAT IS BIG DATA ANALYTICS?
06
07
BIG DATA ANALYTICS HELPED?
How has
08
BIG DATA ANALYTICS SUCCESSES
Optimize Funnel Conversion
Behavioral Analytics
Customer Segmentation
Predictive Support
Market Basket Analysis and
Pricing Optimization
Predict Security Threat
Fraud Detection
Increased customer conversion by 3x
2X Revenue
Decreased customer acquisition costs by 30%
Decreased customer churn
Reduced time to insight from 12 weeks to 3 days
Predict security threats within hours
Prevented $2B in potential fraud
Software Security
Online Gaming
Financial Services
Enterprise Storage
Retail
Software Security
Financial Services
Use Case Industry Benefit
HOW HAS BIG DATA ANALYTICS HELPED?
09
OPTIMIZE FUNNEL CONVERSION
HOW HAS BIG DATA ANALYTICS HELPED?
Identify roadblocks in
funnel conversion
Customer Conversion Increased by 60%
Generating traffic is good. But generating traffic
that leads to sales is better. Datameer has helped
companies identify which Google AdWords lead
To sales. By analyzing Google AdWords,
Salesforce, and Marketo data, this company was
able to track a lead from AdWord click through to
transaction. As a result, this company was able to
improve funnel conversion by 3x, leading to $20M
in additional revenue.
10
BEHAVIORAL ANALYTICS
HOW HAS BIG DATA ANALYTICS HELPED?
Improve game flow and
increase number of
paying customers
Increased revenue by 2x
User Profile
Game Event Logs
The game for gaming companies is to increase
customer acquisition, retention and
monetization. This means getting more users to
play, play more often and longer, and pay. First,
analysts use Datameer to identify common
characteristics of users. As a result, gaming
companies can target these users better with
the right advertising placement and content. To
increase retention, analysts use Datameer to
understand what gets a user to play longer. A
user who plays longer and interacts with other
players makes the overall gaming experience
better. To increase monetization, analysts use
Datameer to identify the group of users most
likely to pay based on common characteristics.
As a result of this analysis the company was
able to double their revenue to over $100M.
11
CUSTOMER SEGMENTATION
HOW HAS BIG DATA ANALYTICS HELPED?
Target promotions to reduce
customer acquisition costs
Customer acquisition costs 30%
$
Facebook Profiles
Transaction Data
With mushrooming customer acquisition costs, it
has become more important to target promotions
effectively. Now, information about a person
comes from social media sources in addition to
tradition transaction data. To analyze this data, this
customer used Datameer to correlate purchase
history, profile information and behavior on social
media sites. For example, they collected customer
profile data. Then they’d correlate this data with
transaction history and things the customers
“liked” on Facebook. In this way, the company
offered a special promotion to a person who liked
watching cooking programs and shopped
frequently at an organic foods store.
12
PREDICTIVE SUPPORT
HOW HAS BIG DATA ANALYTICS HELPED?
Identify operational failure
and address them before
they are reported
Reduced network failure by 30%
A couple of hours of downtime in a store or
production environment means lost revenue,
sometimes in the millions of dollars. The clues to
where downtime may occur are spread across
devices around a store or facility including WLAN
controllers, mobile devices, routers and firewall
devices. For this customer, these devices are used
to run operations such as tracking inventory. Each
network device generates enormous amounts of
machine-generated data. By using Datameer to
analyze all network data, this company was able to
detect potential network failures faster. As a result,
they were able to reduce the number of network
failures by 30%.
Store X
Store Y
Store Z
Historical
Inventory
Pricing
Transaction
13
MARKET BASKET ANALYSIS AND PRICING OPTIMIZATION
HOW HAS BIG DATA ANALYTICS HELPED?
Analyze pricing and
historical data to price
and advertise effectively
Report time from 12 weeks to 3 days
In retail, historical inventory, pricing and
transaction data are spread across multiple
devices and sources. Business users need to pull
together this information to understand
seasonality of products, come up with competitive
pricing, determine which platforms to support so
that their online users would have optimal
performance, and where to target ads. With
Datameer, these business users could do their
analysis in 3 days instead of 12 weeks with their
traditional tools and heavy IT involvement.
14
PREDICT SECURITY THREAT
HOW HAS BIG DATA ANALYTICS HELPED?
Identify where security
threats may occur
Predict security threats within hours
Feeds
Log Files
Firewall
The security landscape is always changing. So
changes in behavior can indicate where the next
attack may occur. For example, this company used
Datameer to follow a virus that started in Russia,
moved across Asia, to the US, and forcing
Windows upgrades in its path. By seeing where
the traffic was generated in particular geographic
areas, they could predict where the next security
threats would be. This enabled them to proactively
go after the security threats and reduce the risk of
a data breach, which on average costs
organizations $5.5M.
15
FRAUD DETECTION
HOW HAS BIG DATA ANALYTICS HELPED?
Identify potential fraud
Prevented $2B in fraud
Credit card fraud has changed. Instead of
stealing a credit card and using it to buy big ticket
items, some credit card thieves have become
more sophisticated. For example, they can now
making numerous, small transactions that are
seemingly benign. But if Joe is making 100 $5
margarita transactions at various locations,
something is wrong. By analyzing point of sale,
geolocation, authorization, and transaction data
with Datameer, this financial customer was able
to identify fraud patterns in historical data. This
analysis helped the firm identify $2B in fraud. By
applying the fraud model to new transactions, the
company was able to identify potential fraud and
proactively notify customers.
Point of Sale
Geo-location
Authorization
Transaction
16
WHETHER TO
BUY OR BUILD?
How do I
decide
QUESTIONS TO ASK
BUY VS. BUILD
We have done hundreds of implementation and found that in order to make a decision
for build vs. buy, the following questions are critical:
Questions to ask when considering the BUILD option:
How much time and money will it take to build a team that can
build the desired solution?
Does your organization have the time to hire the right people?
Can you afford to wait 1 year or more before you get insights?
Have you considered ongoing software maintenance costs?
Are you willing to wait 3 months to add a new data source?
Are you willing to manage 3 different teams (for integration,
analysis and visualization)?
How about all the communication time required to keep in
sync regarding changes in the project?
Questions to ask when considering the BUY option:
Have you defined your decision criteria?
(See Decision Criteria for selecting a big data analytics solution p.28)
Have you confirmed that the solution solves your
objectives and proof points?
Have you validated your choice? Have you done
reference calls?
Do you have the right people in place to deploy and use
this solution?
Will you need to train people?
What sort of project is my use case? Data application, discovery or reporting?
17
18
HOW LONG DOES IT TAKE TO BUILD A BIG DATA
ANALYTICS SOLUTION AS COMPARED TO BUYING ONE?
BUY VS. BUILD
Data loading
Data loading Analytics
Analyze data Visualize data Export data Secure data Monitoring
Transform data and
perform data quality
Scheduling and
dependencies
BUILD
BUY
14+ months
3 months
19
1
2
3Fewer steps in the process
Instead of hiring developers, ramping them up, defining,
developing, testing, deploying and then analyzing, you can go
directly to defining and analyzing.
Fewer technical requirements
Instead of manually building capabilities for data loading,
data parsing, data analytics, visualization, scheduling,
dependency management, data synchronization, monitoring
API, management UI, and security integration, you can have
it in one, purchased platform.
Prebuilt integration, analytics and
visualization functionality
ADVANTAGES OF BUYING VS. BUILDING
A BIG DATA ANALYTICS SOLUTION
BUY VS. BUILD
Seamless Data Integration
• Structured, semi and unstructured
• Pre-built connectors
• Connector plug-in API
Powerful Analytics
• Interactive spreadsheet UI
• Built-in analytic functions
• Macros and function plug-in API
Business Infographics
• Mash up anything, WYSIWYG
• Infographics and dashboards
• Visualization plug-in API
WHAT DO I NEED?
If I Build,
21
WHAT DO I NEED?
IF I BUILD...
Architecture Hardware
Functionality Process
22
Linux JVM Monitoring Hadoop Analytics
WHAT ARCHITECTURE DO I NEED?
IF I BUILD...
Software
Master/Secondary Slaves Network Application Server
Hardware
23
Data Loading — A software has to be developed to load
data from multiple, various data sources. This system
needs to deal with the distributed nature of Hadoop on
the one side and the non-distributed nature of the data
source. The system needs to deal with corrupted records
and need to provide monitoring services.
In your big data analytics solution, you will need to provide the following:
(see notes section for descriptions)
WHAT FUNCTIONALITY DO I NEED?
IF I BUILD...
Data Loading
Dependency Management — There are complex
dependencies that must be managed. For example,
certain data sets have to be loaded before certain
jobs in Hadoop can be run.
Dependency
Management
Data synchronization — Data often needs to be
pushed from Hadoop in to a data store like a
database or in-memory system.
Data
Synchronization
Monitoring API — Every aspect of a big data
analytics solution needs to be monitored. Things
that need to monitored include who has access to
the system, job health, performance, and data
throughput.
Monitoring API
Management UI — A management user interface
is critical for ease of configuration and monitoring.
Management UI
Security Integration — For security purposes, it is
important to be able to integrate with Kerberos and
LDAP.
Security
Integration
Data Parsing — Most data sources provide data in a
certain format that needs to be parsed into the
Hadoop system. For example, let’s consider parsing a
log file into records. Some formats are complicated to
parse like JSON where a record can be on many lines
of text and not just one line per record.
Data Parsing
Data Analytics — In order for data to be properly
analyzed, a big data analytics solution needs to
support rapid iterations.Data Analytics
Scheduling — All the items discussed above need to
be orchestrated and scheduled. Scheduling needs to
be easy to configure. In addition, the scheduling needs
have monitoring services to notify administrators of
jobs that fail.
Scheduling
Data Visualization — In order for an analyst to see
the insights, data needs to be visualized. Integrating
visualization is difficult because middleware needs to
be built to deliver the data out of Hadoop and into the
visualization layer.
Data
Visualization
24
It is very difficult to hire engineers
that are experienced with Hadoop.
Most already work for other big
companies and/or are very
expensive. Hadoop resources can
cost $300K/ year or more.
The first step is a design phase
and includes requirements
gathering (that will change later),
technical design, release and
resource planning.
In the implementation phase all
the “heavy lifting” development
work has to be done.
In this phase, as test
development is tested, test
engineers and a dedicated test
environment is required. This
often doubles hardware costs.
When building a system frequent
deployment into the production
system occurs. This means the
production system is frequently
down or another staging
environment is required.
When all design, develop, test and
deploy work is done then you can
finally start analyzing the data.
Most often you will find that people have
limited Hadoop experience, and you will need
to train them on the technology. This is a
significant investment in time and money. In
addition, practical Hadoop experience only
comes from running a Hadoop environment.
Hire Team
Train Team
on technology
Define Develop Test Deploy Analyze
In your big data analytics solution, you will need to provide the following:
WHAT PROCESS DO I NEED?
IF I BUILD...
How do I Select
THE RIGHT BIG DATA ANALYTICS
SOLUTION FOR ME?
26
Ease of use
Does the business need IT to Help?
Can a business analyst use the tool to do analysis?
Data Integration
Does system support native connectors to unstructured and semi-structured
data sources (e.g. email, log files, social, SaaS, machine data)? Does it
support flexible partitioning of the data so that it is easy to work with large
amounts of data? Does the solution support streaming of data so that users
will have the most current data? Does the solution provide data quality
functions so that the data can be quickly normalized and transformed?
Analytics
Does the solution provide an intuitive environment (e.g. spreadsheet) that
business users can quickly use? Does the solution include pre-built analytic
functions? Does the system provide a preview to validate analysis and show
data lineage for auditing data flows? Do data models have to be defined
before insights can be gained? Do analysts need to know what they want to
do before they have had a chance to look at the data? Or can analysts look at
the data, iterate, make the changes they need and analyze without involving
IT?
Visualizations
Does the solution support complete freeform visualization? Or is it just a
combination of reports and dashboards?
Integration with existing IT infrastructure
Does the solution support import from and export into other BI systems?
Administration
Does the system support flexible security integration with LDAP and
ActiveDirectory?
Extensibility
Does the solution support open API’s for custom data connections and
custom visualizations?
Architecture
Does the solution run natively on Hadoop? Is a separate cluster required
(ideally no separate cluster should be required)? Are there memory
constraints, or is the product limited by the availability of the memory within
the nodes? The ideal solution should have no memory constraints as that
limits the interactivity in analysis and leads to distortion of insights. Does the
solution provide a job planner and optimizer to ensure the lowest number of
MapReduce jobs is executed?
Vendor requirements
Is the system proven, does it have numerous major releases? How much is it in
enterprise use, has it analyze substantial data, (terabytes, petabytes or
exabytes)?
HOW TO SELECT THE RIGHT BIG DATA ANALYTICS SOLUTION
After working with hundreds of customers, we’ve found the following are the top
things to look for in a Big Data Analytics Solution.
DECISION CRITERIA FOR SELECTING
A BIG DATA ANALYTICS SOLUTION
Getting
Successful
WITH BIG DATA
ANALYTICS
28
Identify
Use Cases
Create
Hypothesis
Find Insight
Show
Value to
Management
Iterate
Sharpen
Insight
Measure
& ROI
GETTING SUCCESSFUL WITH BIG DATA ANALYTICS
As you go through the steps of creating a successful project plan, IT and the Business
must work hand-in-hand.
Here is an overview of the Business and IT best practices. We’ll go into the Business best practices in more detail first and then the IT
best practices.
CREATING A SUCCESSFUL PROJECT PLAN
Business
Identify
the Team
Size
Hardware Train
Identify/
Ingest Data
Deploy
Disaster
Recovery
IT
BEST PRACTICES
for the Business
30
BEST PRACTICES FOR THE BUSINESS
IDENTIFY THE USE CASES AND THE TEAM
Are you trying to:
1. Optimize the funnel?
2. Perform behavioral analytics?
3. Perform customer segmentation?
4. Predict support issues?
5. Analyze credit risk?
6. Perform market basket analysis and
optimize pricing?
7. Automate support?
8. Detect and prevent security threats?
9. Detect and prevent fraud?
Identify the Use Cases
Who is the:
1. Budget owner?
2. Project manager?
3. Data owner?
4. Subject matter expert?
5. Business analyst and users?
Identify the Team
1. Tackle low hanging fruit first for
a quick win to show fast ROI
2. Choose the project that will
provide the biggest potential
value
Data Analyst Best Practices
31
BEST PRACTICES FOR THE BUSINESS
BUDGET SO YOU CAN SELL INTERNALLY
To successfully sell internally, answer
the following questions:
What’s my total cost of ownership?
(Contact marketing@datameer.com for more information on this analysis).
Have I considered and budgeted for:
• People?
• Software Acquisition, Support?
• Hardware (Server, network)?
• Logistics (Hardware / people)?
• Operations/Data Center cost (power, cooling, etc.)?
It is critical to sell your project
internally not just at the beginning
of the project but continuously.
Especially as Big Data Analytics is new and top of mind for
many executives and management, selling internally will pay
off in the short and long term.
To sell internally, it will be important to:
Identify the sponsor
Build a business case
Define your metrics for success
Evangelize big data in business terms
Identify and communicate the risk of not doing big data
Show that the competition is already doing it
Start small, show the results fast
Communicate risks clearly
Demonstrate the use case through pilots
Involve the consumers of data, not just producers, early
32
• Big Data Analytics gets better and more useful the more you use it.
So iterating has been critical to long term success for our customers.
• Start small and build on your success
• Don’t boil the ocean
BEST PRACTICES FOR THE BUSINESS
BIG DATA ANALYTICS IS AN ITERATIVE PROCESS
Analyze
Create your
hypothesis
Act on insight and
measure ROI
Visualize
33
BEST PRACTICES for IT
Do I have an I/O or CPU intensive workload?
Do I really need to evaluate the Hadoop Distribution vendors?
Have I considered power consumption?
Have I considered cooling needs?
Have I considered the weight of hardware?
Does our data center support it?
Start with a small cluster. You can always buy more later.
Consider using a cloud provider like Amazon/Rackspace for fast provisioning
Time to insight matters more than time per query!
Hadoop does 3x data replication, no RAID needed
Hadoop stores intermediate results – leave twice as much storage headroom
as you want to analyze
IDENTIFY THE HARDWARE YOU NEED
BEST PRACTICES FOR IT
Master/Secondary Slaves
Network Application Server
• Fault Tolerant
• RAID
• Dual Network Power, etc.
• 4+ HDD
• 1 CPU core per HDD
• 2GB RAM/core
• 1GB fully unblocked network switches
• 10GB uplink for Top-of-Rack switches
• Dual Network Power, etc.
• 2 HDD’s
• 16GB RAM
34
Questions to ask: Best Practices:
1. What data do I have available?
2. What’s missing that I’d like to use?
3. What data elements can I use to link this data with existing data?
4. Will the data be pushed or pulled into Datameer and Hadoop?
5. What is the performance expectation?
6. What type and level of security is needed? Are you looking for data masking or anonymization?
7. What is the data retention policy?
8. What is the data management strategy?
Answering the following will help identify and aggregate the right data sources
in particular:
BEST PRACTICES FOR IT
IDENTIFY AND AGGREGATE YOUR DATA SOURCES
35
36
1. Convert your discovery into scheduled reports and run as needed
2. Measure your improvements
3. Continuously monitor results
4. Share and collaborate
Now that you have the insights, let’s automate the process and make creating
these impactful insights repeatable. To do that, the following are important to
incorporate into your plan.
BEST PRACTICES FOR IT
PUT YOUR PLAN IN ACTION & DEPLOY
Measure your
improvements
Convert your
discovery into
scheduled reports
Share and collaborate
Continuously
monitor results
37
1. Set up workshops instead of lectures
2. Educate and empower users on their own data and systems
3. Encourage users to think outside the box
4. Enable users to access external data sources
5. Inspire them to be creative with data discovery
Enabling your organization, across the different functions and business
stakeholders, can be challenging. The following things have helped with the
customers we worked with:
BEST PRACTICES FOR IT
ENABLEMENT
Land and expand. Start small and go after the low hanging fruit. Then build upon
those successes to go broader.
A successful Big Data Analytics solution makes it easier and faster to analyze broader sets of data. You need a solution that:
CONCLUSION
KEY TAKEAWAYS
1
2
Is faster because it has fewer steps in the process.
Instead of hiring developers, ramping them up, defining, developing,
testing, deploying and then analyzing, you can start with defining and
analyzing.
Has fewer technical requirements
Instead of manually building capabilities for data loading, data parsing,
data analytics, data visualization, scheduling, dependency
management, data synchronization, monitoring API, management UI,
and security integration, you can have it all in one platform. Buying the
right solution also means you don’t have force fit a new technology
into old architecture.
3 Prebuilt integration, analytics, dashboards
Data integration – Quickly ingesting data from many different types of data
sources is critical to the success of a big data project. Eliminating the IT
intensive ETL and modeling processes is critical to allowing analysts to be able
to meaningfully interact with data and to prevent long and expensive waits while
data is prepared. To achieve this, a solution should have a large set of native
connectors for the integration of all types of disparate data — structured,
unstructured and semi-structured (e.g. databases, file systems, social media,
mainframe, SaaS applications, XML, JSON).
Analytics – Analysts need a tool to work with the data, turn it into a form that
can be usable and then perform the analysis. The means having pre built
transformations to clean up the data so that is usable as well as prebuilt
analytic functions. These functions should include ones to support unstructured
data and sentiment analysis. This tool should also provide a preview to validate
analysis and show data lineage for auditing data flows.
Visualization – Beyond static dashboards, analysts need Infographics
where data visualizations have no built-in constraints. Users need to be able to
drag and top any widget, graphic, text, dashboard or info graphic element as
needed or desired.
38
39
STEFAN GROSCHUPF
CEO
FRANK HENZE
VP, Product Management
PETER VOSS
CTO
EDUARDO ROSAS
VP, Services
Stefan Groschupf is the co-founder
and CEO of Datameer, the only
end-to-end data integration, analysis
and visualization platform for big data
analytics on Hadoop.
Frank Henze is Vice President of
Product Management at Datameer with
over a decade of experience in building
enterprise software systems.
Peter Voss is CTO at Datameer with
extensive experience in software
engineering and architecture of
large-scale data processing.
Eduardo Rosas is VP of Services and
heads the implementation of Big Data
Analytics projects that triple customer
conversion rates, lead to over $20M in
additional revenue, and cut IT costs of
big data analytics projects in half.
KEN KRUGLER
President, Scale Unlimited
RON BODKIN
CEO, Think Big Analytics
JOE CASERTA
CEO, Caserta Concepts
PHIL SHELLEY
Former CTO, Sears
Consulting and training for big data
processing and web mining problems,
using Hadoop, Cascading, Cassandra
and Solr. Apache Software Foundation
member, committer for Tika open
source content extraction toolkit.
Ron founded Think Big to help
companies realize measurable value from
Big Data. Previously, Ron was VP
Engineering at Quantcast where he led
the data science and engineer teams that
pioneered the use of Hadoop and NoSQL
for batch and real-time decision making.
Joe Caserta is a veteran solution
provider, educator, and President of
Caserta Concepts, a specialized data
warehousing, business intelligence and big
data analytics consulting firm.
Dr. Shelley advises companies on
designing, delivering and operating
Hadoop-based solutions for Analytics,
Mainframe Migration and
massive-scale processing. He was
CTO at Sears Holdings Corporation
(SHC) and lead IT Operations
focusing on the modernization of IT
across the company.
You’re not alone!
Combining decades of experience in Hadoop, data management and business intelligence, there are people who have been working with customers around the
world, across industries such as Communications, Financial Services, Retail, Internet Security, Consumer Packaged Goods, Transportation, Logistics, and deep
into use cases such as fraud detection, behavioral analytics, customer segmentation, pricing optimization and credit risk analysis.
We are happy to help, so please reach out to the following experts with any questions or thoughts you might have. Also please send any questions, comments or
thoughts on this guide to khsu@datameer.com
RESOURCES
EXPERTS IN BIG DATA ANALYTICS

More Related Content

What's hot

Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
BIG DATA & DATA ANALYTICS
BIG  DATA & DATA  ANALYTICSBIG  DATA & DATA  ANALYTICS
BIG DATA & DATA ANALYTICSNAGARAJAGIDDE
 
Structured and Unstructured Big Data ebook
Structured and Unstructured Big Data ebookStructured and Unstructured Big Data ebook
Structured and Unstructured Big Data ebookEmcien Corporation
 
What is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsWhat is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsShilpaKrishna6
 
Explainable AI in Industry (FAT* 2020 Tutorial)
Explainable AI in Industry (FAT* 2020 Tutorial)Explainable AI in Industry (FAT* 2020 Tutorial)
Explainable AI in Industry (FAT* 2020 Tutorial)Krishnaram Kenthapadi
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Danish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML OpsDanish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML OpsNeo4j
 
Getting started with Tableau
Getting started with TableauGetting started with Tableau
Getting started with TableauParth Acharya
 
Sustainability data and analytics on cloud based M2M systems
Sustainability data and  analytics on cloud based M2M systemsSustainability data and  analytics on cloud based M2M systems
Sustainability data and analytics on cloud based M2M systemsThomsonTG1
 
Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learningIvo Andreev
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI dayMohammed Barakat
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learningbutest
 
Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The EdgeArun Kejariwal
 

What's hot (20)

Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Big data
Big dataBig data
Big data
 
BIG DATA & DATA ANALYTICS
BIG  DATA & DATA  ANALYTICSBIG  DATA & DATA  ANALYTICS
BIG DATA & DATA ANALYTICS
 
Structured and Unstructured Big Data ebook
Structured and Unstructured Big Data ebookStructured and Unstructured Big Data ebook
Structured and Unstructured Big Data ebook
 
What is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsWhat is big data ? | Big Data Applications
What is big data ? | Big Data Applications
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Explainable AI in Industry (FAT* 2020 Tutorial)
Explainable AI in Industry (FAT* 2020 Tutorial)Explainable AI in Industry (FAT* 2020 Tutorial)
Explainable AI in Industry (FAT* 2020 Tutorial)
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Danish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML OpsDanish Business Authority: Explainability and causality in relation to ML Ops
Danish Business Authority: Explainability and causality in relation to ML Ops
 
Getting started with Tableau
Getting started with TableauGetting started with Tableau
Getting started with Tableau
 
Sustainability data and analytics on cloud based M2M systems
Sustainability data and  analytics on cloud based M2M systemsSustainability data and  analytics on cloud based M2M systems
Sustainability data and analytics on cloud based M2M systems
 
Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learning
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Overview of Big data(ppt)
Overview of Big data(ppt)Overview of Big data(ppt)
Overview of Big data(ppt)
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Data science presentation 2nd CI day
Data science presentation 2nd CI dayData science presentation 2nd CI day
Data science presentation 2nd CI day
 
Machine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree LearningMachine Learning 3 - Decision Tree Learning
Machine Learning 3 - Decision Tree Learning
 
Ai in government
Ai in government Ai in government
Ai in government
 
Anomaly Detection At The Edge
Anomaly Detection At The EdgeAnomaly Detection At The Edge
Anomaly Detection At The Edge
 
3 data visualization
3 data visualization3 data visualization
3 data visualization
 

Viewers also liked

BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache HadoopOleksiy Krotov
 
mLearning planning tools and qrcodes
mLearning planning tools and qrcodesmLearning planning tools and qrcodes
mLearning planning tools and qrcodesInge de Waard
 
Gestión Básica de la información
Gestión Básica de la información Gestión Básica de la información
Gestión Básica de la información Iveth Andrea
 
Case study for incidence tracking system – Times group
Case study for incidence tracking system – Times groupCase study for incidence tracking system – Times group
Case study for incidence tracking system – Times groupMike Taylor
 
iPhone 6 Plus vs Amazon Fire Phone
iPhone 6 Plus vs Amazon Fire PhoneiPhone 6 Plus vs Amazon Fire Phone
iPhone 6 Plus vs Amazon Fire PhoneUBMCanon
 
Weather Underground - PWS, Data Ingestion and APIs
Weather Underground - PWS, Data Ingestion and APIsWeather Underground - PWS, Data Ingestion and APIs
Weather Underground - PWS, Data Ingestion and APIsRavi Yadav
 
Ts Pws Biogasupgrading Bremen 2010 V1
Ts Pws Biogasupgrading Bremen 2010 V1Ts Pws Biogasupgrading Bremen 2010 V1
Ts Pws Biogasupgrading Bremen 2010 V1rlems
 
Governing Big Data : Principles and practices
Governing Big Data : Principles and practicesGoverning Big Data : Principles and practices
Governing Big Data : Principles and practicesPiyush Malik
 
WRL - Investor Deck - July 2014
WRL - Investor Deck - July 2014WRL - Investor Deck - July 2014
WRL - Investor Deck - July 2014Sanjukt Saha
 
Ibm connections 5.0 installation step-by-step (windows and tds)
Ibm connections 5.0   installation step-by-step (windows and tds)Ibm connections 5.0   installation step-by-step (windows and tds)
Ibm connections 5.0 installation step-by-step (windows and tds)Fuangwith Sopharath
 
IBM Connections 4.5 CR2 Installation - From Zero To Social Hero - 2.02 - with...
IBM Connections 4.5 CR2 Installation - From Zero To Social Hero - 2.02 - with...IBM Connections 4.5 CR2 Installation - From Zero To Social Hero - 2.02 - with...
IBM Connections 4.5 CR2 Installation - From Zero To Social Hero - 2.02 - with...Frank Altenburg
 

Viewers also liked (20)

Big Data
Big DataBig Data
Big Data
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache Hadoop
 
mLearning planning tools and qrcodes
mLearning planning tools and qrcodesmLearning planning tools and qrcodes
mLearning planning tools and qrcodes
 
Gestión Básica de la información
Gestión Básica de la información Gestión Básica de la información
Gestión Básica de la información
 
Case study for incidence tracking system – Times group
Case study for incidence tracking system – Times groupCase study for incidence tracking system – Times group
Case study for incidence tracking system – Times group
 
iPhone 6 Plus vs Amazon Fire Phone
iPhone 6 Plus vs Amazon Fire PhoneiPhone 6 Plus vs Amazon Fire Phone
iPhone 6 Plus vs Amazon Fire Phone
 
LG User Guide Upgrade Tool
LG User Guide Upgrade ToolLG User Guide Upgrade Tool
LG User Guide Upgrade Tool
 
Weather Underground - PWS, Data Ingestion and APIs
Weather Underground - PWS, Data Ingestion and APIsWeather Underground - PWS, Data Ingestion and APIs
Weather Underground - PWS, Data Ingestion and APIs
 
Ts Pws Biogasupgrading Bremen 2010 V1
Ts Pws Biogasupgrading Bremen 2010 V1Ts Pws Biogasupgrading Bremen 2010 V1
Ts Pws Biogasupgrading Bremen 2010 V1
 
Governing Big Data : Principles and practices
Governing Big Data : Principles and practicesGoverning Big Data : Principles and practices
Governing Big Data : Principles and practices
 
WRL - Investor Deck - July 2014
WRL - Investor Deck - July 2014WRL - Investor Deck - July 2014
WRL - Investor Deck - July 2014
 
Ibm connections 5.0 installation step-by-step (windows and tds)
Ibm connections 5.0   installation step-by-step (windows and tds)Ibm connections 5.0   installation step-by-step (windows and tds)
Ibm connections 5.0 installation step-by-step (windows and tds)
 
Photoshop
PhotoshopPhotoshop
Photoshop
 
IBM Connections 4.5 CR2 Installation - From Zero To Social Hero - 2.02 - with...
IBM Connections 4.5 CR2 Installation - From Zero To Social Hero - 2.02 - with...IBM Connections 4.5 CR2 Installation - From Zero To Social Hero - 2.02 - with...
IBM Connections 4.5 CR2 Installation - From Zero To Social Hero - 2.02 - with...
 
Oracle's BigData solutions
Oracle's BigData solutionsOracle's BigData solutions
Oracle's BigData solutions
 
FAIR data overview
FAIR data overviewFAIR data overview
FAIR data overview
 
Preparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR PrinciplesPreparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR Principles
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 

Similar to Big data-analytics-ebook

Is Your Company Braced Up for handling Big Data
Is Your Company Braced Up for handling Big DataIs Your Company Braced Up for handling Big Data
Is Your Company Braced Up for handling Big Datahimanshu13jun
 
Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!Kavika Roy
 
Malaysia Presentation
Malaysia PresentationMalaysia Presentation
Malaysia PresentationAlan Royal
 
Analytics solution
Analytics solutionAnalytics solution
Analytics solutioncamssguide
 
How Companies Turn Data Into Business Value
 How Companies Turn Data Into Business Value How Companies Turn Data Into Business Value
How Companies Turn Data Into Business ValueJamie Hribal
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data scienceVipul Kalamkar
 
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...Dr. Cedric Alford
 
360i Report: Big Data
360i Report: Big Data360i Report: Big Data
360i Report: Big Data360i
 
Delivering customer value faster with Big Data analytics
Delivering customer value faster with Big Data analyticsDelivering customer value faster with Big Data analytics
Delivering customer value faster with Big Data analyticsThe Marketing Distillery
 
Data-Analytics-Resource-updated for analysis
Data-Analytics-Resource-updated for analysisData-Analytics-Resource-updated for analysis
Data-Analytics-Resource-updated for analysisBhavinGada5
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
Data set The Future of Big Data
Data set The Future of Big DataData set The Future of Big Data
Data set The Future of Big DataData-Set
 

Similar to Big data-analytics-ebook (20)

Big data-analytics-ebook
Big data-analytics-ebookBig data-analytics-ebook
Big data-analytics-ebook
 
Is Your Company Braced Up for handling Big Data
Is Your Company Braced Up for handling Big DataIs Your Company Braced Up for handling Big Data
Is Your Company Braced Up for handling Big Data
 
Unlocking big data
Unlocking big dataUnlocking big data
Unlocking big data
 
Bidata
BidataBidata
Bidata
 
Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!
 
Malaysia Presentation
Malaysia PresentationMalaysia Presentation
Malaysia Presentation
 
The big returns from Big Data
The big returns from Big DataThe big returns from Big Data
The big returns from Big Data
 
Analytics solution
Analytics solutionAnalytics solution
Analytics solution
 
How Companies Turn Data Into Business Value
 How Companies Turn Data Into Business Value How Companies Turn Data Into Business Value
How Companies Turn Data Into Business Value
 
Embracing data science
Embracing data scienceEmbracing data science
Embracing data science
 
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
 
360i Report: Big Data
360i Report: Big Data360i Report: Big Data
360i Report: Big Data
 
new.pptx
new.pptxnew.pptx
new.pptx
 
Delivering customer value faster with Big Data analytics
Delivering customer value faster with Big Data analyticsDelivering customer value faster with Big Data analytics
Delivering customer value faster with Big Data analytics
 
Data-Analytics-Resource-updated for analysis
Data-Analytics-Resource-updated for analysisData-Analytics-Resource-updated for analysis
Data-Analytics-Resource-updated for analysis
 
Big Data: Where is the Real Opportunity?
Big Data: Where is the Real Opportunity?Big Data: Where is the Real Opportunity?
Big Data: Where is the Real Opportunity?
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
Data set The Future of Big Data
Data set The Future of Big DataData set The Future of Big Data
Data set The Future of Big Data
 
Transforming Big Data into business value
Transforming Big Data into business valueTransforming Big Data into business value
Transforming Big Data into business value
 
6 Reasons to Use Data Analytics
6 Reasons to Use Data Analytics6 Reasons to Use Data Analytics
6 Reasons to Use Data Analytics
 

More from Shubhashish Biswas

More from Shubhashish Biswas (13)

Wphab2bintgxs101707 090314225022-phpapp01
Wphab2bintgxs101707 090314225022-phpapp01Wphab2bintgxs101707 090314225022-phpapp01
Wphab2bintgxs101707 090314225022-phpapp01
 
W acto07
W acto07W acto07
W acto07
 
Risk mgmt-analysis-wp-326822
Risk mgmt-analysis-wp-326822Risk mgmt-analysis-wp-326822
Risk mgmt-analysis-wp-326822
 
Predictivemodelingwhitepaper
PredictivemodelingwhitepaperPredictivemodelingwhitepaper
Predictivemodelingwhitepaper
 
Predictivemodelingwhitepaper 2
Predictivemodelingwhitepaper 2Predictivemodelingwhitepaper 2
Predictivemodelingwhitepaper 2
 
Predictive analytics-white-paper
Predictive analytics-white-paperPredictive analytics-white-paper
Predictive analytics-white-paper
 
Predictive analytics 2025_br
Predictive analytics 2025_brPredictive analytics 2025_br
Predictive analytics 2025_br
 
Hurwitz sas victory_report
Hurwitz sas victory_reportHurwitz sas victory_report
Hurwitz sas victory_report
 
Ebookblogv2 120116015321-phpapp01
Ebookblogv2 120116015321-phpapp01Ebookblogv2 120116015321-phpapp01
Ebookblogv2 120116015321-phpapp01
 
Dm
DmDm
Dm
 
Customer analytics for dummies
Customer analytics for dummiesCustomer analytics for dummies
Customer analytics for dummies
 
1 kwyfvb
1 kwyfvb1 kwyfvb
1 kwyfvb
 
V imp analytics appli ed
V imp analytics appli edV imp analytics appli ed
V imp analytics appli ed
 

Recently uploaded

[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Recently uploaded (20)

[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Big data-analytics-ebook

  • 1. THE GUIDE TO BIG DATA ANALYTICS
  • 2. TABLE OF CONTENTS 02 Why Big Data Analytics? 04 What is Big Data Analytics? 06 How has Big Data Analytics helped companies? 17 How do I decide whether to buy or build? 21 If I build, what do I need? 25 How do I select the right Big Data Analytics Solution for me? 27 Getting Successful with Big Data Analytics - more than technology 38 Key Takeaways 39 Key Big Data Analytics Experts 01
  • 4. Companies that use data to drive their business (in blue) perform better than companies who do not. WHY IS BIG DATA ANALYTICS SO IMPORTANT? 1. Data Drives Performance Companies from all industries use big data analytics to: • Increase revenue • Decrease costs • Increase productivity 2. Big Data Analytics Drives Results $0 $43 $86 $129 $171 $214 $257 $300 12/31/09 03/31/10 06/30/10 09/30/10 12/31/10 03/31/11 06/30/11 09/30/11 12/31/11 03/31/12 06/30/12 09/30/12 12/31/12 03/21/13 Amazon vs Barnes & Noble $0 $150 $300 $450 $600 $750 $900 12/31/09 03/31/10 06/30/10 09/30/10 12/31/10 03/31/11 06/30/11 09/30/11 12/31/11 03/31/12 06/30/12 09/30/12 12/31/12 03/21/13 Google vs Yahoo $0 $10 $20 $30 $40 $50 $60 12/31/09 03/31/10 06/30/10 09/30/10 12/31/10 03/31/11 06/30/11 09/30/11 12/31/11 03/31/12 06/30/12 09/30/12 12/31/12 03/21/13 Cabela’s vs Big 5 Sporting Goods 2x 3x 12 weeks to 3 days INCREASE in revenue (Online Gaming Company) REDUCE TIME to insight INCREASE in customer acquisition (Software Security Company) 03 WHY BIG DATA ANALYTICS? * Powered by Datameer
  • 6. WHAT IS BIG DATA ANALYTICS AND WHAT MAKES IT SO POWERFUL? The Problem 05 WHAT IS BIG DATA ANALYTICS? Before Hadoop, we had limited storage and compute, which led to a long and rigid analytics process (see below). First, IT goes through a lengthy process (often known as ETL) to get every new data source ready to be stored. After getting the data ready, IT puts the data into a database or data warehouse, and into a static data model. The problem with that approach is that IT designs the data model today with the knowledge of yesterday, and you have to hope that it will be good enough for tomorrow. But nobody can predict the perfect schema. Then on top of that you put a business intelligence tool, which because of the static schemas underneath, is optimized to answer KNOWN QUESTIONS. TDWI says this 3-part process takes 18 months to implement or change. And on average it takes 3 months to integrate a new data source. The business is telling us they cannot operate at this speed anymore. And there is a better way. Data Warehouse Business IntelligenceETL Raw Data Hadoop Drag & Drop Spreadsheet 18 months less than 2 months
  • 7. You need a solution with a different approach. First, you need a way to bring in all the different data sources that is much faster than traditional ETL. You need to be able load in raw data. That means loading all metadata in the form it was generated — as a log file, database table, mainframe copybook or a social media stream. Second, you need a way to discover insights unencumbered by predefined models. You need to be able to iterate with a widely used analysis tool, such as a spreadsheet. Finally, you need a way to visualize those insights and have those visualizations automatically updated. So as your business changes, you can see and get ahead of those changes. The Solution Comprehensive Integration Schema Free Analytics Powerful Visualizations WHAT IS BIG DATA ANALYTICS AND WHAT MAKES IT SO POWERFUL? WHAT IS BIG DATA ANALYTICS? 06
  • 8. 07 BIG DATA ANALYTICS HELPED? How has
  • 9. 08 BIG DATA ANALYTICS SUCCESSES Optimize Funnel Conversion Behavioral Analytics Customer Segmentation Predictive Support Market Basket Analysis and Pricing Optimization Predict Security Threat Fraud Detection Increased customer conversion by 3x 2X Revenue Decreased customer acquisition costs by 30% Decreased customer churn Reduced time to insight from 12 weeks to 3 days Predict security threats within hours Prevented $2B in potential fraud Software Security Online Gaming Financial Services Enterprise Storage Retail Software Security Financial Services Use Case Industry Benefit HOW HAS BIG DATA ANALYTICS HELPED?
  • 10. 09 OPTIMIZE FUNNEL CONVERSION HOW HAS BIG DATA ANALYTICS HELPED? Identify roadblocks in funnel conversion Customer Conversion Increased by 60% Generating traffic is good. But generating traffic that leads to sales is better. Datameer has helped companies identify which Google AdWords lead To sales. By analyzing Google AdWords, Salesforce, and Marketo data, this company was able to track a lead from AdWord click through to transaction. As a result, this company was able to improve funnel conversion by 3x, leading to $20M in additional revenue.
  • 11. 10 BEHAVIORAL ANALYTICS HOW HAS BIG DATA ANALYTICS HELPED? Improve game flow and increase number of paying customers Increased revenue by 2x User Profile Game Event Logs The game for gaming companies is to increase customer acquisition, retention and monetization. This means getting more users to play, play more often and longer, and pay. First, analysts use Datameer to identify common characteristics of users. As a result, gaming companies can target these users better with the right advertising placement and content. To increase retention, analysts use Datameer to understand what gets a user to play longer. A user who plays longer and interacts with other players makes the overall gaming experience better. To increase monetization, analysts use Datameer to identify the group of users most likely to pay based on common characteristics. As a result of this analysis the company was able to double their revenue to over $100M.
  • 12. 11 CUSTOMER SEGMENTATION HOW HAS BIG DATA ANALYTICS HELPED? Target promotions to reduce customer acquisition costs Customer acquisition costs 30% $ Facebook Profiles Transaction Data With mushrooming customer acquisition costs, it has become more important to target promotions effectively. Now, information about a person comes from social media sources in addition to tradition transaction data. To analyze this data, this customer used Datameer to correlate purchase history, profile information and behavior on social media sites. For example, they collected customer profile data. Then they’d correlate this data with transaction history and things the customers “liked” on Facebook. In this way, the company offered a special promotion to a person who liked watching cooking programs and shopped frequently at an organic foods store.
  • 13. 12 PREDICTIVE SUPPORT HOW HAS BIG DATA ANALYTICS HELPED? Identify operational failure and address them before they are reported Reduced network failure by 30% A couple of hours of downtime in a store or production environment means lost revenue, sometimes in the millions of dollars. The clues to where downtime may occur are spread across devices around a store or facility including WLAN controllers, mobile devices, routers and firewall devices. For this customer, these devices are used to run operations such as tracking inventory. Each network device generates enormous amounts of machine-generated data. By using Datameer to analyze all network data, this company was able to detect potential network failures faster. As a result, they were able to reduce the number of network failures by 30%. Store X Store Y Store Z
  • 14. Historical Inventory Pricing Transaction 13 MARKET BASKET ANALYSIS AND PRICING OPTIMIZATION HOW HAS BIG DATA ANALYTICS HELPED? Analyze pricing and historical data to price and advertise effectively Report time from 12 weeks to 3 days In retail, historical inventory, pricing and transaction data are spread across multiple devices and sources. Business users need to pull together this information to understand seasonality of products, come up with competitive pricing, determine which platforms to support so that their online users would have optimal performance, and where to target ads. With Datameer, these business users could do their analysis in 3 days instead of 12 weeks with their traditional tools and heavy IT involvement.
  • 15. 14 PREDICT SECURITY THREAT HOW HAS BIG DATA ANALYTICS HELPED? Identify where security threats may occur Predict security threats within hours Feeds Log Files Firewall The security landscape is always changing. So changes in behavior can indicate where the next attack may occur. For example, this company used Datameer to follow a virus that started in Russia, moved across Asia, to the US, and forcing Windows upgrades in its path. By seeing where the traffic was generated in particular geographic areas, they could predict where the next security threats would be. This enabled them to proactively go after the security threats and reduce the risk of a data breach, which on average costs organizations $5.5M.
  • 16. 15 FRAUD DETECTION HOW HAS BIG DATA ANALYTICS HELPED? Identify potential fraud Prevented $2B in fraud Credit card fraud has changed. Instead of stealing a credit card and using it to buy big ticket items, some credit card thieves have become more sophisticated. For example, they can now making numerous, small transactions that are seemingly benign. But if Joe is making 100 $5 margarita transactions at various locations, something is wrong. By analyzing point of sale, geolocation, authorization, and transaction data with Datameer, this financial customer was able to identify fraud patterns in historical data. This analysis helped the firm identify $2B in fraud. By applying the fraud model to new transactions, the company was able to identify potential fraud and proactively notify customers. Point of Sale Geo-location Authorization Transaction
  • 17. 16 WHETHER TO BUY OR BUILD? How do I decide
  • 18. QUESTIONS TO ASK BUY VS. BUILD We have done hundreds of implementation and found that in order to make a decision for build vs. buy, the following questions are critical: Questions to ask when considering the BUILD option: How much time and money will it take to build a team that can build the desired solution? Does your organization have the time to hire the right people? Can you afford to wait 1 year or more before you get insights? Have you considered ongoing software maintenance costs? Are you willing to wait 3 months to add a new data source? Are you willing to manage 3 different teams (for integration, analysis and visualization)? How about all the communication time required to keep in sync regarding changes in the project? Questions to ask when considering the BUY option: Have you defined your decision criteria? (See Decision Criteria for selecting a big data analytics solution p.28) Have you confirmed that the solution solves your objectives and proof points? Have you validated your choice? Have you done reference calls? Do you have the right people in place to deploy and use this solution? Will you need to train people? What sort of project is my use case? Data application, discovery or reporting? 17
  • 19. 18 HOW LONG DOES IT TAKE TO BUILD A BIG DATA ANALYTICS SOLUTION AS COMPARED TO BUYING ONE? BUY VS. BUILD Data loading Data loading Analytics Analyze data Visualize data Export data Secure data Monitoring Transform data and perform data quality Scheduling and dependencies BUILD BUY 14+ months 3 months
  • 20. 19 1 2 3Fewer steps in the process Instead of hiring developers, ramping them up, defining, developing, testing, deploying and then analyzing, you can go directly to defining and analyzing. Fewer technical requirements Instead of manually building capabilities for data loading, data parsing, data analytics, visualization, scheduling, dependency management, data synchronization, monitoring API, management UI, and security integration, you can have it in one, purchased platform. Prebuilt integration, analytics and visualization functionality ADVANTAGES OF BUYING VS. BUILDING A BIG DATA ANALYTICS SOLUTION BUY VS. BUILD Seamless Data Integration • Structured, semi and unstructured • Pre-built connectors • Connector plug-in API Powerful Analytics • Interactive spreadsheet UI • Built-in analytic functions • Macros and function plug-in API Business Infographics • Mash up anything, WYSIWYG • Infographics and dashboards • Visualization plug-in API
  • 21. WHAT DO I NEED? If I Build,
  • 22. 21 WHAT DO I NEED? IF I BUILD... Architecture Hardware Functionality Process
  • 23. 22 Linux JVM Monitoring Hadoop Analytics WHAT ARCHITECTURE DO I NEED? IF I BUILD... Software Master/Secondary Slaves Network Application Server Hardware
  • 24. 23 Data Loading — A software has to be developed to load data from multiple, various data sources. This system needs to deal with the distributed nature of Hadoop on the one side and the non-distributed nature of the data source. The system needs to deal with corrupted records and need to provide monitoring services. In your big data analytics solution, you will need to provide the following: (see notes section for descriptions) WHAT FUNCTIONALITY DO I NEED? IF I BUILD... Data Loading Dependency Management — There are complex dependencies that must be managed. For example, certain data sets have to be loaded before certain jobs in Hadoop can be run. Dependency Management Data synchronization — Data often needs to be pushed from Hadoop in to a data store like a database or in-memory system. Data Synchronization Monitoring API — Every aspect of a big data analytics solution needs to be monitored. Things that need to monitored include who has access to the system, job health, performance, and data throughput. Monitoring API Management UI — A management user interface is critical for ease of configuration and monitoring. Management UI Security Integration — For security purposes, it is important to be able to integrate with Kerberos and LDAP. Security Integration Data Parsing — Most data sources provide data in a certain format that needs to be parsed into the Hadoop system. For example, let’s consider parsing a log file into records. Some formats are complicated to parse like JSON where a record can be on many lines of text and not just one line per record. Data Parsing Data Analytics — In order for data to be properly analyzed, a big data analytics solution needs to support rapid iterations.Data Analytics Scheduling — All the items discussed above need to be orchestrated and scheduled. Scheduling needs to be easy to configure. In addition, the scheduling needs have monitoring services to notify administrators of jobs that fail. Scheduling Data Visualization — In order for an analyst to see the insights, data needs to be visualized. Integrating visualization is difficult because middleware needs to be built to deliver the data out of Hadoop and into the visualization layer. Data Visualization
  • 25. 24 It is very difficult to hire engineers that are experienced with Hadoop. Most already work for other big companies and/or are very expensive. Hadoop resources can cost $300K/ year or more. The first step is a design phase and includes requirements gathering (that will change later), technical design, release and resource planning. In the implementation phase all the “heavy lifting” development work has to be done. In this phase, as test development is tested, test engineers and a dedicated test environment is required. This often doubles hardware costs. When building a system frequent deployment into the production system occurs. This means the production system is frequently down or another staging environment is required. When all design, develop, test and deploy work is done then you can finally start analyzing the data. Most often you will find that people have limited Hadoop experience, and you will need to train them on the technology. This is a significant investment in time and money. In addition, practical Hadoop experience only comes from running a Hadoop environment. Hire Team Train Team on technology Define Develop Test Deploy Analyze In your big data analytics solution, you will need to provide the following: WHAT PROCESS DO I NEED? IF I BUILD...
  • 26. How do I Select THE RIGHT BIG DATA ANALYTICS SOLUTION FOR ME?
  • 27. 26 Ease of use Does the business need IT to Help? Can a business analyst use the tool to do analysis? Data Integration Does system support native connectors to unstructured and semi-structured data sources (e.g. email, log files, social, SaaS, machine data)? Does it support flexible partitioning of the data so that it is easy to work with large amounts of data? Does the solution support streaming of data so that users will have the most current data? Does the solution provide data quality functions so that the data can be quickly normalized and transformed? Analytics Does the solution provide an intuitive environment (e.g. spreadsheet) that business users can quickly use? Does the solution include pre-built analytic functions? Does the system provide a preview to validate analysis and show data lineage for auditing data flows? Do data models have to be defined before insights can be gained? Do analysts need to know what they want to do before they have had a chance to look at the data? Or can analysts look at the data, iterate, make the changes they need and analyze without involving IT? Visualizations Does the solution support complete freeform visualization? Or is it just a combination of reports and dashboards? Integration with existing IT infrastructure Does the solution support import from and export into other BI systems? Administration Does the system support flexible security integration with LDAP and ActiveDirectory? Extensibility Does the solution support open API’s for custom data connections and custom visualizations? Architecture Does the solution run natively on Hadoop? Is a separate cluster required (ideally no separate cluster should be required)? Are there memory constraints, or is the product limited by the availability of the memory within the nodes? The ideal solution should have no memory constraints as that limits the interactivity in analysis and leads to distortion of insights. Does the solution provide a job planner and optimizer to ensure the lowest number of MapReduce jobs is executed? Vendor requirements Is the system proven, does it have numerous major releases? How much is it in enterprise use, has it analyze substantial data, (terabytes, petabytes or exabytes)? HOW TO SELECT THE RIGHT BIG DATA ANALYTICS SOLUTION After working with hundreds of customers, we’ve found the following are the top things to look for in a Big Data Analytics Solution. DECISION CRITERIA FOR SELECTING A BIG DATA ANALYTICS SOLUTION
  • 29. 28 Identify Use Cases Create Hypothesis Find Insight Show Value to Management Iterate Sharpen Insight Measure & ROI GETTING SUCCESSFUL WITH BIG DATA ANALYTICS As you go through the steps of creating a successful project plan, IT and the Business must work hand-in-hand. Here is an overview of the Business and IT best practices. We’ll go into the Business best practices in more detail first and then the IT best practices. CREATING A SUCCESSFUL PROJECT PLAN Business Identify the Team Size Hardware Train Identify/ Ingest Data Deploy Disaster Recovery IT
  • 31. 30 BEST PRACTICES FOR THE BUSINESS IDENTIFY THE USE CASES AND THE TEAM Are you trying to: 1. Optimize the funnel? 2. Perform behavioral analytics? 3. Perform customer segmentation? 4. Predict support issues? 5. Analyze credit risk? 6. Perform market basket analysis and optimize pricing? 7. Automate support? 8. Detect and prevent security threats? 9. Detect and prevent fraud? Identify the Use Cases Who is the: 1. Budget owner? 2. Project manager? 3. Data owner? 4. Subject matter expert? 5. Business analyst and users? Identify the Team 1. Tackle low hanging fruit first for a quick win to show fast ROI 2. Choose the project that will provide the biggest potential value Data Analyst Best Practices
  • 32. 31 BEST PRACTICES FOR THE BUSINESS BUDGET SO YOU CAN SELL INTERNALLY To successfully sell internally, answer the following questions: What’s my total cost of ownership? (Contact marketing@datameer.com for more information on this analysis). Have I considered and budgeted for: • People? • Software Acquisition, Support? • Hardware (Server, network)? • Logistics (Hardware / people)? • Operations/Data Center cost (power, cooling, etc.)? It is critical to sell your project internally not just at the beginning of the project but continuously. Especially as Big Data Analytics is new and top of mind for many executives and management, selling internally will pay off in the short and long term. To sell internally, it will be important to: Identify the sponsor Build a business case Define your metrics for success Evangelize big data in business terms Identify and communicate the risk of not doing big data Show that the competition is already doing it Start small, show the results fast Communicate risks clearly Demonstrate the use case through pilots Involve the consumers of data, not just producers, early
  • 33. 32 • Big Data Analytics gets better and more useful the more you use it. So iterating has been critical to long term success for our customers. • Start small and build on your success • Don’t boil the ocean BEST PRACTICES FOR THE BUSINESS BIG DATA ANALYTICS IS AN ITERATIVE PROCESS Analyze Create your hypothesis Act on insight and measure ROI Visualize
  • 35. Do I have an I/O or CPU intensive workload? Do I really need to evaluate the Hadoop Distribution vendors? Have I considered power consumption? Have I considered cooling needs? Have I considered the weight of hardware? Does our data center support it? Start with a small cluster. You can always buy more later. Consider using a cloud provider like Amazon/Rackspace for fast provisioning Time to insight matters more than time per query! Hadoop does 3x data replication, no RAID needed Hadoop stores intermediate results – leave twice as much storage headroom as you want to analyze IDENTIFY THE HARDWARE YOU NEED BEST PRACTICES FOR IT Master/Secondary Slaves Network Application Server • Fault Tolerant • RAID • Dual Network Power, etc. • 4+ HDD • 1 CPU core per HDD • 2GB RAM/core • 1GB fully unblocked network switches • 10GB uplink for Top-of-Rack switches • Dual Network Power, etc. • 2 HDD’s • 16GB RAM 34 Questions to ask: Best Practices:
  • 36. 1. What data do I have available? 2. What’s missing that I’d like to use? 3. What data elements can I use to link this data with existing data? 4. Will the data be pushed or pulled into Datameer and Hadoop? 5. What is the performance expectation? 6. What type and level of security is needed? Are you looking for data masking or anonymization? 7. What is the data retention policy? 8. What is the data management strategy? Answering the following will help identify and aggregate the right data sources in particular: BEST PRACTICES FOR IT IDENTIFY AND AGGREGATE YOUR DATA SOURCES 35
  • 37. 36 1. Convert your discovery into scheduled reports and run as needed 2. Measure your improvements 3. Continuously monitor results 4. Share and collaborate Now that you have the insights, let’s automate the process and make creating these impactful insights repeatable. To do that, the following are important to incorporate into your plan. BEST PRACTICES FOR IT PUT YOUR PLAN IN ACTION & DEPLOY Measure your improvements Convert your discovery into scheduled reports Share and collaborate Continuously monitor results
  • 38. 37 1. Set up workshops instead of lectures 2. Educate and empower users on their own data and systems 3. Encourage users to think outside the box 4. Enable users to access external data sources 5. Inspire them to be creative with data discovery Enabling your organization, across the different functions and business stakeholders, can be challenging. The following things have helped with the customers we worked with: BEST PRACTICES FOR IT ENABLEMENT
  • 39. Land and expand. Start small and go after the low hanging fruit. Then build upon those successes to go broader. A successful Big Data Analytics solution makes it easier and faster to analyze broader sets of data. You need a solution that: CONCLUSION KEY TAKEAWAYS 1 2 Is faster because it has fewer steps in the process. Instead of hiring developers, ramping them up, defining, developing, testing, deploying and then analyzing, you can start with defining and analyzing. Has fewer technical requirements Instead of manually building capabilities for data loading, data parsing, data analytics, data visualization, scheduling, dependency management, data synchronization, monitoring API, management UI, and security integration, you can have it all in one platform. Buying the right solution also means you don’t have force fit a new technology into old architecture. 3 Prebuilt integration, analytics, dashboards Data integration – Quickly ingesting data from many different types of data sources is critical to the success of a big data project. Eliminating the IT intensive ETL and modeling processes is critical to allowing analysts to be able to meaningfully interact with data and to prevent long and expensive waits while data is prepared. To achieve this, a solution should have a large set of native connectors for the integration of all types of disparate data — structured, unstructured and semi-structured (e.g. databases, file systems, social media, mainframe, SaaS applications, XML, JSON). Analytics – Analysts need a tool to work with the data, turn it into a form that can be usable and then perform the analysis. The means having pre built transformations to clean up the data so that is usable as well as prebuilt analytic functions. These functions should include ones to support unstructured data and sentiment analysis. This tool should also provide a preview to validate analysis and show data lineage for auditing data flows. Visualization – Beyond static dashboards, analysts need Infographics where data visualizations have no built-in constraints. Users need to be able to drag and top any widget, graphic, text, dashboard or info graphic element as needed or desired. 38
  • 40. 39 STEFAN GROSCHUPF CEO FRANK HENZE VP, Product Management PETER VOSS CTO EDUARDO ROSAS VP, Services Stefan Groschupf is the co-founder and CEO of Datameer, the only end-to-end data integration, analysis and visualization platform for big data analytics on Hadoop. Frank Henze is Vice President of Product Management at Datameer with over a decade of experience in building enterprise software systems. Peter Voss is CTO at Datameer with extensive experience in software engineering and architecture of large-scale data processing. Eduardo Rosas is VP of Services and heads the implementation of Big Data Analytics projects that triple customer conversion rates, lead to over $20M in additional revenue, and cut IT costs of big data analytics projects in half. KEN KRUGLER President, Scale Unlimited RON BODKIN CEO, Think Big Analytics JOE CASERTA CEO, Caserta Concepts PHIL SHELLEY Former CTO, Sears Consulting and training for big data processing and web mining problems, using Hadoop, Cascading, Cassandra and Solr. Apache Software Foundation member, committer for Tika open source content extraction toolkit. Ron founded Think Big to help companies realize measurable value from Big Data. Previously, Ron was VP Engineering at Quantcast where he led the data science and engineer teams that pioneered the use of Hadoop and NoSQL for batch and real-time decision making. Joe Caserta is a veteran solution provider, educator, and President of Caserta Concepts, a specialized data warehousing, business intelligence and big data analytics consulting firm. Dr. Shelley advises companies on designing, delivering and operating Hadoop-based solutions for Analytics, Mainframe Migration and massive-scale processing. He was CTO at Sears Holdings Corporation (SHC) and lead IT Operations focusing on the modernization of IT across the company. You’re not alone! Combining decades of experience in Hadoop, data management and business intelligence, there are people who have been working with customers around the world, across industries such as Communications, Financial Services, Retail, Internet Security, Consumer Packaged Goods, Transportation, Logistics, and deep into use cases such as fraud detection, behavioral analytics, customer segmentation, pricing optimization and credit risk analysis. We are happy to help, so please reach out to the following experts with any questions or thoughts you might have. Also please send any questions, comments or thoughts on this guide to khsu@datameer.com RESOURCES EXPERTS IN BIG DATA ANALYTICS