Jump start into 2013 by exploring how Big Data can transform your business. Listen to Infochimps Director of Product, Tim Gasper, cover the leading use cases for 2013, sharing where the data comes from, how the systems are architected and most importantly, how they drive business insights for data-driven decisions.
9. #5usecases
more data than ever
• CRM/customer support
• POS/purchases
• ERP/accounting
• email/documents/collab.
• BI & data warehouse
• system & network logs
• web logs/clickstream
• google analytics/omniture
• facebook/twitter
• yelp/foursquare/google
• experian/epsilon/acxiom
• mobile devices
• sensors
• product reviews
• google search results
• + more
many terabytes of data, sometimes many petabytes
10. #5usecases
BIG DATA
• volume • scalable
• velocity • intelligent
• variety • agnostic
• variability • holistic
16. #5usecases
customer risk analysis
comprehensive data picture
• build comprehensive data picture of customer-side risk
• publish a consolidated set of attributes for analysis
• add additional context, both internal and external
parse and aggregate data from different sources
• credit and debit cards, product payments, deposits and savings
• banking activity, browsing behavior, call logs, e-mails and chats
merge data into a single view
• a “fuzzy join” among data sources
• structure and normalize attributes
• sentiment analysis, pattern recognition
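A "fuzzy join" like the one named on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the record fields (`name`, `card_balance`, `calls_last_30d`), the 0.85 threshold, and the helper names are hypothetical, and the standard-library `difflib.SequenceMatcher` stands in for whatever matching technique a production system would use.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace before comparing."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def fuzzy_join(left, right, key="name", threshold=0.85):
    """Pair each left record with its best-matching right record, if similar enough."""
    matches = []
    for l_rec in left:
        best, best_score = None, 0.0
        for r_rec in right:
            score = SequenceMatcher(
                None, normalize(l_rec[key]), normalize(r_rec[key])
            ).ratio()
            if score > best_score:
                best, best_score = r_rec, score
        if best is not None and best_score >= threshold:
            # Merge into a single view; left values win on key collisions.
            matches.append({**best, **l_rec})
    return matches


# Hypothetical records from two sources that spell the same customer differently.
card_accounts = [{"name": "Jon A. Smith", "card_balance": 1200}]
call_logs = [
    {"name": "jon a smith", "calls_last_30d": 4},
    {"name": "Jane Doe", "calls_last_30d": 1},
]

single_view = fuzzy_join(card_accounts, call_logs)
```

Comparing every pair is quadratic, so at real data volumes the comparison would usually be restricted to candidate groups (for example, records sharing a postal code) before scoring.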
18. #5usecases
surveillance & fraud detection
activity records in a central repository
• centralized logging across all execution platforms
• structured and raw log data from multiple applications
pattern recognition to detect anomalies/harmful behavior
• feature set and timeline vector are very dynamic
• “schema on read” provides flexibility for analysis
data is primarily served and processed in HDFS with MapReduce
• data filtering and projection in Pig and Hive
• statistical modeling of data sets in R or SAS
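The "schema on read" idea above can be illustrated with a small sketch: raw log lines are kept untouched, and each analysis applies its own schema at read time. The log format, field names, and the `read_with_schema` helper are assumptions made for illustration, not part of the system described on the slide.

```python
import json

# Raw events are stored as-is; the fields here are made up for illustration.
RAW_LOG = [
    '{"ts": "2013-01-07T10:00:00", "app": "trading", "user": "u1", "amount": 120.0}',
    '{"ts": "2013-01-07T10:01:00", "app": "trading", "user": "u1"}',
    "not json at all",
    '{"ts": "2013-01-07T10:02:00", "app": "payments", "user": "u2", "amount": "95.5"}',
]


def read_with_schema(lines, schema):
    """Apply a schema while reading: keep only the requested fields, cast them,
    and skip lines that cannot be parsed or lack a requested field."""
    for line in lines:
        try:
            record = json.loads(line)
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            continue


# Two analyses read the same raw data through different schemas.
amounts = list(read_with_schema(RAW_LOG, {"user": str, "amount": float}))
activity = list(read_with_schema(RAW_LOG, {"app": str, "ts": str}))
```

Because nothing is discarded at write time, a new feature set only needs a new schema dictionary, not a reload of the data.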
22. #5usecases
brand & sentiment analysis
the internet generates a lot of chatter about brands
• understanding what’s said is key to protecting brand value
• facebook & twitter generate a flood of data for large brands
capturing and processing direct feedback
• better engagement and alerting via sentiment analysis
• integration with other customer service systems
hadoop handles the diverse data types and processing
• sources of data changing and semantics continuously evolving
• sophistication of algorithms is iteratively improving
23. #5usecases
large media conglomerate
Social Media / News, Blogs, etc. / Traditional Media -> ingest data -> influence, gender, topic extraction, etc. -> application (search & real-time sentiment, trend analysis)
26. #5usecases
customer churn analysis
understanding customer behavior and preferences
• rapidly test and build behavioral model of customer
• combine disparate data sources (transactional, social, etc.)
structure and analyze with Hadoop
• traversing usage and social graphs
• pattern identification and recognition to find indicators
feature extraction to find root causes
• defining attributes and modeling statistical significance
• combinations and sequence of attributes + actions factor in
27. #5usecases
customer loyalty
comparison shopping is making retail hyper-competitive
• discount programs, e-mail correspondence entice shoppers
• brand loyalty means attention to detail and service
customer lifecycle is more than purchases
• browsing and online data used to capture customer attention
• loyalty programs bridge the gap between purchases
reach into online channels
• online engagement is personalized just as in store
• connecting online and in store shows customer awareness
28. #5usecases
customer segmentation
Demographics, Geography, Web Data, etc. / Point Of Sale Purchase Data -> ingest data -> shopping pattern recognition -> customer insight reports
30. #5usecases
targeted offers
the checkout lane is everywhere
• cookies track users through ad impressions
• purchasing behavior is time sensitive
logs collected online and offline
• data is ingested incrementally
• process happens at a variety of time scales
data logged into HBase and primary store
• some events naturally associate, others require deeper analysis
• insights implemented via application logic
31. #5usecases
recommendations & forecasting
collect and serve personalization information
• wide variety of constantly changing data sources
• data guaranteed to be messy
data ingestion includes collection of raw data
• filtering and fixing of poorly formatted data
• normalization and matching across data sources
analysis looks for reliable attributes and groupings
• interpretation (e.g. gender by name)
• aggregation across likely matching identifiers
• identify possible predicted attributes or preferences
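The filtering, normalization, and aggregation steps on this slide might look roughly like the sketch below. It assumes e-mail address is the matching identifier and that the first non-empty value for a field wins; the field names and helper functions are hypothetical.

```python
import re
from collections import defaultdict


def normalize_email(raw):
    """Trim and lowercase; return None for values that do not look like an e-mail."""
    if not raw:
        return None
    email = raw.strip().lower()
    return email if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) else None


def aggregate_by_identifier(records):
    """Group attributes from all sources under the normalized identifier."""
    profiles = defaultdict(dict)
    for rec in records:
        email = normalize_email(rec.get("email"))
        if email is None:
            # Poorly formatted record: dropped here; a real pipeline might
            # repair or quarantine it instead.
            continue
        for field, value in rec.items():
            if field != "email" and value not in (None, ""):
                profiles[email].setdefault(field, value)  # first non-empty value wins
    return dict(profiles)


# Hypothetical messy records from different sources.
records = [
    {"email": " Ana@Example.com ", "favorite_category": "jeans"},
    {"email": "ana@example.com", "city": "Austin", "favorite_category": ""},
    {"email": "not-an-email", "city": "Dallas"},
]
profiles = aggregate_by_identifier(records)
```

The merged profile is what later steps would use to infer predicted attributes or preferences.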
32. #5usecases
major apparel brand
Clickstream Data from Online Storefront -> ingest data -> behavioral cluster analysis -> targeted discounts, pre-defined web content and deals
40. #5usecases
big data exploration & visualization
41. #5usecases
popular online deal site
Retail Site Web Logs -> ingest data -> SQL analysis with Hive & Hue -> business command center, BI dashboarding
42. #5usecases
learn more >>
sales@infochimps.com
1-855-328-2386
Request a Demo:
http://infochimps.com/demo
Editor's Notes
We are a big data cloud services provider for the enterprise. We bundle together all the analytics infrastructure you need, like Hadoop, real-time analytics, and powerful databases, and provide the hosting, support, and expertise, so that you can focus on analytics and on driving those business use cases and apps, not on wrangling complex systems.
I represent… a business person at an enterprise / a technical person at an enterprise / a consultant / a vendor / other
and 94% in the top 10
So let’s dig into it. Big data is a pretty easy idea to explain: we produce data, all the time, constantly, and we produce a lot of it. Data centers now take up 1.3% of global energy usage, as much as the entire continent of Australia. So we have some similarly big challenges and even bigger opportunities.
On the left of this slide I’ve listed just a few of the kinds of data sources that might be available to an agency, should they choose to ingest them. Everything from their own clients’ customer databases, to streams of tweets from Twitter, to Google search results and even forum posts, can be ingested in the pursuit of building something that generates insights for their clients.
Best explained by describing other use cases, like the GAP. Copying Flip and Tim so they can benefit from the use case.
When we stood up a Hortonworks cluster at PARC for the GAP, we architected a system whereby we could combine real-time (Esper) with batch (Hortonworks) to essentially make GAP.com become both "interactive and intelligent".
This was done by analyzing click-stream log data in real time to determine your behavior and, based on what you were doing at that very instant, serving up personalized content to each individual user, influencing them in real time. So based on your current activity (you interacted with the website), we acted to customize your experience, intelligently. Where Hadoop came in was to build the "population-based behavioral" clusters, which allowed us to pre-define which content to serve up for you if and when you followed a certain real-time sequence.
For example, click-stream analysis in Hadoop determined that when a large, statistically significant group did the following:
Homepage -> Jeans section -> Skinny jeans -> Long-sleeve shirts
they were 90% likely to buy both jeans and shirts together.
Whereas, if you did the following:
Homepage -> Long-sleeve shirts -> Jeans -> Skinny jeans
you only bought the shirt, UNLESS there was at least a 20% discount associated with it.
These are two different clusters, determined through complex Hadoop analysis over a long period of time.
So when you surf the site in real time, you see the following interactive behavior happen:
Cluster 1: Homepage -> Jeans -> Skinny -> Recommendation to go to Long-sleeve shirts -> Long-sleeve shirts -> Purchase with NO Discount
Cluster 2: Homepage -> Long-sleeve shirts -> Recommendation to go to Jeans -> Skinny jeans -> 20% discount offered in real-time -> Purchase
This is an interactive and intelligent web and e-commerce application which is 100% data-driven.
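The real-time half of the GAP example in this note can be sketched as a lookup against rules that the batch analysis has already produced. Everything concrete here is an assumption for illustration: the page identifiers, the rule table, and the `next_action` function are hypothetical, and the actual system used Esper rather than plain Python.

```python
# Rules assumed to come out of the batch (Hadoop) cluster analysis.
# Keys are the pages visited so far in a session; values are what to show next.
CLUSTER_RULES = {
    # Cluster 1: jeans first, so recommend shirts with no discount.
    ("homepage", "jeans", "skinny-jeans"): {
        "cluster": 1,
        "recommend": "long-sleeve-shirts",
        "discount_pct": 0,
    },
    # Cluster 2: shirts first, so steer the visitor toward jeans...
    ("homepage", "long-sleeve-shirts"): {
        "cluster": 2,
        "recommend": "jeans",
        "discount_pct": 0,
    },
    # ...and offer the discount once they reach skinny jeans.
    ("homepage", "long-sleeve-shirts", "jeans", "skinny-jeans"): {
        "cluster": 2,
        "recommend": None,
        "discount_pct": 20,
    },
}


def next_action(session_path):
    """Return the pre-defined action for the current click path, or None."""
    return CLUSTER_RULES.get(tuple(session_path))


# A visitor following the Cluster 2 sequence.
session = ["homepage", "long-sleeve-shirts"]
first = next_action(session)
session += ["jeans", "skinny-jeans"]
second = next_action(session)
```

The split mirrors the note: the expensive clustering runs in batch over a long history, while the per-click decision is a cheap lookup.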
I invite you to let us know what your use case is, and we can help you evaluate which tools and architecture are appropriate to solve it. Now we are open to questions!