1
Snowplow and
Cascalog
METAIL - YOUR ONLINE FITTING ROOM
Presentation by Rob Boland, Lead Data Architect
2
Introduction
• Introduction to Metail – who we are, why we use Snowplow
• How the Lambda Architecture has influenced our Data
Architecture
• Where Cascalog fits in at Metail and why it works well with
Snowplow
• Example of where we’ve used Cascalog and how it works
• Looker forward to the future
3
Every body is unique and
should be celebrated
4
YOUR ONLINE FITTING ROOM
5
• Sign up with just a few clicks
• See how the clothes look on you
• Build layered outfits
• Get size recommendation
http://trymetail.com/collections/metail
6
Product portfolio: Data services
1. Customer shape & size data can now aid brands’ buying & selling decisions
2. Body shape & outfitting data -> crowd-sourced outfit recommendations
UNDERSTANDING SHAPE PROFILE OF CUSTOMERS
Do we need to create new collections to cater for clusters of different shapes?
HOW SHAPE VARIES BY SIZE
Do we need to change the fit profile by size to accommodate different shapes?
7
KPI Analysis –
Can we prove it actually works?
Metric                         Definition
Return on Investment           [(VPV uplift * All Visits) - Investment] / Investment
Net Sales Revenue              Value of retained items in bin
Value per Visitor (VPV)        Net Sales Revenue / Visitors
Visits (sessions)              Set of activities with <= 30 minutes between consecutive events
User Conversion                Orders / Visitors
Adoption Rate                  Number of users who use Metail / Number of users shown Metail
Average Order Value            Median value of all orders tracked in the time period
Return Rate                    Number of items returned / Number of items purchased
Average Retained Order Value   Median value of all orders tracked in the time period after removing returned items
AB set-up: 50/50 split test
Managed by: Metail through their AB test platform
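The definitions in the table can be written down directly as small functions. The following is a minimal Python sketch, not Metail's code: the function and parameter names are assumptions chosen for illustration, and the numbers in the usage lines are made up.

```python
from statistics import median


def value_per_visitor(net_sales_revenue, visitors):
    # Net Sales Revenue / Visitors
    return net_sales_revenue / visitors


def user_conversion(orders, visitors):
    # Orders / Visitors
    return orders / visitors


def adoption_rate(users_who_used_metail, users_shown_metail):
    # Number of users who use Metail / Number of users shown Metail
    return users_who_used_metail / users_shown_metail


def return_rate(items_returned, items_purchased):
    # Number of items returned / Number of items purchased
    return items_returned / items_purchased


def average_order_value(order_values):
    # The table defines the "average" as the median of all tracked orders.
    return median(order_values)


def return_on_investment(vpv_uplift, all_visits, investment):
    # [(VPV uplift * All Visits) - Investment] / Investment
    return (vpv_uplift * all_visits - investment) / investment


# Illustrative numbers only, not Metail results.
roi = return_on_investment(vpv_uplift=0.5, all_visits=10_000, investment=2_000)
aov = average_order_value([10.0, 20.0, 90.0])
print(roi, aov)
```

Using the median for order values keeps a single very large order from dominating the figure, which is presumably why the table defines the average that way.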
8
KPI Analysis –
Can we prove Metail impact?
Data Collection
We need to know visitor counts, order values, which test group the
user was in, whether they actually used Metail or not, time on site,
what garments they wore, etc.
9
Enter Snowplow
10
What Metail looks like (for now…)
11
Data Collection! Now what?
Read the Big Data book
(Still MEAP after 3 years!)
12
Lambda Architecture
13
Cascalog to produce Batch Views
Turn the Snowplow event stream into a normalised schema
Snowplow Events ->
• Body Shape
• Orders
• Items Ordered
• Returns
• Browsers (visitors)
• Sessions
• Garment Details
• AB Events
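As one example of a batch view, Sessions can be derived from raw event timestamps using the rule given in the KPI table: a visit is a set of activities with no more than 30 minutes between consecutive events. The sketch below is plain Python rather than Cascalog, and it is an assumption about how such a view could be built, not the deck's implementation; the timestamps are made up.

```python
from datetime import datetime, timedelta

# Maximum gap between consecutive events within one session.
SESSION_GAP = timedelta(minutes=30)


def sessionise(timestamps):
    """Group one visitor's event timestamps into sessions."""
    sessions = []
    for ts in sorted(timestamps):
        # Extend the current session if the gap to its last event is small enough.
        if sessions and ts - sessions[-1][-1] <= SESSION_GAP:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Made-up timestamps for a single visitor.
events = [
    datetime(2014, 1, 1, 9, 0),
    datetime(2014, 1, 1, 9, 20),
    datetime(2014, 1, 1, 9, 50),  # exactly 30 minutes after the previous event: same session
    datetime(2014, 1, 1, 11, 0),  # more than 30 minutes later: new session
]
sessions = sessionise(events)
print(len(sessions))
```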
14
Cascalog:
Snowplow ETL Runner Output -> Batch Views
Cascalog is designed to process Big Data on top of Hadoop. It is a
replacement for tools like Pig, Hive, and Cascading, and operates at a
significantly higher level of abstraction than those tools [1]
Write Clojure code to create our data processing jobs
• The code you write has to be MapReduce-aware, but the low-level
implementation details are taken care of
• What we’re really doing is adding another ETL step to the Snowplow flow
Cascalog is written in Clojure (JCascalog provides a Java API; Scalding is a comparable Cascading-based library for Scala)
It’s easy to run on Amazon EMR – fits in with the Snowplow flow nicely
[1] http://cascalog.org/
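To make "MapReduce-aware" concrete, here is a plain-Python sketch of the map, shuffle and reduce phases that a higher-level tool takes care of when it runs a job on Hadoop. It is not Cascalog code and not Metail's job; the `event_type` field and the sample events are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(events):
    # Map: emit a (key, value) pair per input record.
    for event in events:
        yield event["event_type"], 1


def shuffle(pairs):
    # Shuffle: group all values that share a key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Reduce: collapse each key's values into a single result.
    return {key: sum(values) for key, values in grouped.items()}


# Made-up events standing in for the Snowplow event stream.
events = [
    {"event_type": "page_view"},
    {"event_type": "order"},
    {"event_type": "page_view"},
]
counts = reduce_phase(shuffle(map_phase(events)))
print(counts)
```

Knowing that a job decomposes this way helps when reasoning about which fields are grouped on and where data moves between machines, even though the query author does not write these phases by hand.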
15
Cascalog – Worth the effort?
Couldn’t you achieve the same output working with the
events table alone?
…kind of
But there are two key benefits:
1. Breaking the data into a manageable schema means you can
directly access the data you care about
2. Complex logic and aggregation are easier to achieve
Real example:
• KPI Data Aggregation
16
Cascalog – KPI Data Aggregation
Metric              Definition
Value per visitor   Net Sales Revenue / Visitors
User Conversion     Orders / Visitors
Adoption Rate       Number of users who use Metail / Number of users shown Metail
How do we calculate KPIs from our Snowplow data?
In both the Active and Control groups, we need:
• Visitor Count
• Engaged Visitor Count
• Order Count
• Order Value
17
Cascalog – KPI Data Aggregation
Visitors
Count
• Snowplow tracks visitors – our code just has to look up visitors who
are in the test we’re measuring
Engaged Count
• Fire a structured event to Snowplow each time an ‘engagement’ event
occurs. For each visitor in the test, our code has to find whether or
not they engaged with Metail
Orders
We encode all of the relevant order information on the page in JSON and
fire an unstructured event with the details
Order Count
• Our code needs to find all of the order events in the time period
Order Value
• Our code needs to read the order value and sum it together
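The Order Count and Order Value steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the event field names (`event_type`, `timestamp`, `payload`) and the JSON keys (`order_id`, `order_value`) are hypothetical and are not the actual Snowplow or Metail field names; the sample events are made up.

```python
import json
from datetime import datetime


def orders_in_period(events, start, end):
    """Return parsed order payloads for order events with start <= timestamp < end."""
    orders = []
    for event in events:
        if event["event_type"] != "order":
            continue  # not an order event
        if not (start <= event["timestamp"] < end):
            continue  # outside the reporting period
        # The order details arrive as a JSON string on the event.
        orders.append(json.loads(event["payload"]))
    return orders


# Made-up events; field names are assumptions.
events = [
    {"event_type": "order", "timestamp": datetime(2014, 3, 1, 10, 0),
     "payload": '{"order_id": "A1", "order_value": 20.0}'},
    {"event_type": "engagement", "timestamp": datetime(2014, 3, 2, 12, 0),
     "payload": None},
    {"event_type": "order", "timestamp": datetime(2014, 3, 5, 9, 30),
     "payload": '{"order_id": "A2", "order_value": 30.5}'},
    {"event_type": "order", "timestamp": datetime(2014, 4, 2, 8, 0),
     "payload": '{"order_id": "A3", "order_value": 99.0}'},
]

orders = orders_in_period(events, datetime(2014, 3, 1), datetime(2014, 4, 1))
order_count = len(orders)
order_value = sum(order["order_value"] for order in orders)
print(order_count, order_value)
```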
18
Cascalog – KPI Data Aggregation
We can do better!
What we really want is a user-level summary of the data
domain_id engaged order_value order_id ab_group
0014822757d9a81f null 175.89 89281949 out
0015ca5144f0fae7 null null null out
0015dd8901887010 null 310.22 25394849 out
0015e633aa2c158d null null null in
00204e1bcc87b734 null null null out
0042472794f2b57a null 191.98 89392136 in
004389f95e620dd0 null null null out
0044867c3d7b1cf5 null null null out
00456d1e9300296e null null null out
0045dc05b4262ed2 null null null in
0045f74358a842c1 TRUE null null in
00462b685f4188ad null null null out
0048fccbe230dc57 null null null out
0049a5d24498051d TRUE 101.96 27529849 in
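Once the data is in this user-level shape, the four counts needed per test group fall out of a single pass. The sketch below is plain Python, not the deck's Cascalog code; the rows are copied from the table above, with `null` represented as `None`, and the function name is an assumption.

```python
# Rows copied from the table: (domain_id, engaged, order_value, order_id, ab_group)
rows = [
    ("0014822757d9a81f", None, 175.89, "89281949", "out"),
    ("0015ca5144f0fae7", None, None, None, "out"),
    ("0015dd8901887010", None, 310.22, "25394849", "out"),
    ("0015e633aa2c158d", None, None, None, "in"),
    ("00204e1bcc87b734", None, None, None, "out"),
    ("0042472794f2b57a", None, 191.98, "89392136", "in"),
    ("004389f95e620dd0", None, None, None, "out"),
    ("0044867c3d7b1cf5", None, None, None, "out"),
    ("00456d1e9300296e", None, None, None, "out"),
    ("0045dc05b4262ed2", None, None, None, "in"),
    ("0045f74358a842c1", True, None, None, "in"),
    ("00462b685f4188ad", None, None, None, "out"),
    ("0048fccbe230dc57", None, None, None, "out"),
    ("0049a5d24498051d", True, 101.96, "27529849", "in"),
]


def summarise(rows):
    """Aggregate user-level rows into per-group counts."""
    summary = {}
    for _domain_id, engaged, order_value, order_id, ab_group in rows:
        group = summary.setdefault(
            ab_group,
            {"visitors": 0, "engaged": 0, "orders": 0, "order_value": 0.0},
        )
        group["visitors"] += 1
        if engaged:
            group["engaged"] += 1
        if order_id is not None:
            group["orders"] += 1
            group["order_value"] += order_value
    return summary


summary = summarise(rows)
for name, group in sorted(summary.items()):
    conversion = group["orders"] / group["visitors"]  # Orders / Visitors
    print(name, group, round(conversion, 3))
```

The sample is only fourteen rows, so the resulting ratios illustrate the calculation and say nothing about the real test outcome.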
19
Cascalog – Implementation
1) Read in the Snowplow events data from HDFS
2) Remove events we don’t care about
20
Cascalog – Implementation
3) Take those events, pull out the bits we care about and join them together
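The three steps (read, filter, join) can be sketched in plain Python as below. This is not the Cascalog code from the slides: an in-memory list stands in for the events read from HDFS, the event type names and fields (`ab_assignment`, `engagement`, `order`, `domain_id`, `ab_group`) are hypothetical, and the sketch assumes at most one order per visitor, so a later order would overwrite an earlier one.

```python
# Event types we keep; everything else is dropped in step 2.
WANTED = {"ab_assignment", "engagement", "order"}


def build_user_summary(events):
    """Filter events and join them per visitor into user-level rows."""
    # Step 2: remove events we don't care about.
    relevant = [e for e in events if e["event_type"] in WANTED]

    # Step 3a: one row per visitor who was assigned to a test group.
    summary = {}
    for e in relevant:
        if e["event_type"] == "ab_assignment":
            summary[e["domain_id"]] = {
                "engaged": None,
                "order_value": None,
                "order_id": None,
                "ab_group": e["ab_group"],
            }

    # Step 3b: join engagement and order details onto those rows.
    for e in relevant:
        row = summary.get(e["domain_id"])
        if row is None:
            continue  # visitor is not in the test being measured
        if e["event_type"] == "engagement":
            row["engaged"] = True
        elif e["event_type"] == "order":
            row["order_value"] = e["order_value"]
            row["order_id"] = e["order_id"]
    return summary


# Step 1 stand-in: made-up events instead of data read from HDFS.
events = [
    {"event_type": "ab_assignment", "domain_id": "u1", "ab_group": "in"},
    {"event_type": "page_view", "domain_id": "u1"},
    {"event_type": "engagement", "domain_id": "u1"},
    {"event_type": "order", "domain_id": "u1", "order_id": "o1", "order_value": 101.96},
    {"event_type": "ab_assignment", "domain_id": "u2", "ab_group": "out"},
    {"event_type": "order", "domain_id": "u3", "order_id": "o2", "order_value": 50.0},
]
user_summary = build_user_summary(events)
print(user_summary)
```

Starting from the test-assignment events and joining outward keeps visitors with no engagement or order in the output, with empty values for those columns.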
21
What do we do with the Batch Views?
Take the output and crunch it in R (or Incanter)
A lot of the subsequent analysis we run on our batch views requires
statistical packages, so we run our advanced analysis in R.
Thankfully, having the batch views ready has led to far fewer of these:
22
A Looker Ahead
Not everyone can write Cascalog and R.
Looker will open our batch views and Snowplow events to
our Business Analysts
23
www.metail.com
Contact information
ROB BOLAND
LEAD DATA ARCHITECT
rob@metail.com
Skype: rpboland

Snowplow, Metail and Cascalog

  • 1. Snowplow and Cascalog. METAIL - YOUR ONLINE FITTING ROOM. Presentation by Rob Boland, Lead Data Architect
  • 2. Introduction
      • Introduction to Metail – who we are, why we use Snowplow
      • How the Lambda Architecture has influenced our Data Architecture
      • Where Cascalog fits in at Metail and why it works well with Snowplow
      • Example of where we’ve used Cascalog and how it works
      • Looker forward to the future
  • 3. Every body is unique and should be celebrated
  • 5. http://trymetail.com/collections/metail
      • Sign up with just a few clicks
      • See how the clothes look on you
      • Build layered outfits
      • Get size recommendation
  • 6. Product portfolio: Data services
      1. Customer shape & size data can now aid brand’s buying & selling decisions
      2. Body shape & outfitting data -> crowd sourced outfit recommendations
      UNDERSTANDING SHAPE PROFILE OF CUSTOMERS: Do we need to create new collections to cater for clusters of different shapes?
      HOW SHAPE VARIES BY SIZE: Do we need to change the fit profile by size to accommodate different shapes?
  • 7. KPI Analysis – Can we prove it actually works?
      Metric                         Definition
      Return on Investment           [(VPVuplift * All Visits) - Investment] / Investment
      Net sales revenue              Value of retained items in bin
      Value per visitor              Net Sales Revenue / Visitors
      Visits (sessions)              Set of activities with <= 30 minutes between consecutive events
      User Conversion                Orders / Visitors
      Adoption Rate                  Number of users who use Metail / Number of users shown Metail
      Average Order Value            Median value of all orders tracked in the time period
      Return Rate                    Number of items returned / Number of items purchased
      Average Retained Order Value   Median value of all orders tracked in the time period after removing returned items
      AB Set up: 50/50 split test
      Managed by: Metail through their AB test platform
  • 8. KPI Analysis – Can we prove Metail impact?
      Data Collection: We need to know visitor counts, order values, which test group the user was in, whether they actually used Metail or not, time on site, what garments they wore, etc.
  • 10. What Metail looks like (for now…)
  • 11. Data Collection! Now what? Read the Big Data book (Still MEAP after 3 years!)
  • 13. Cascalog to produce Batch Views
      Turn the Snowplow event stream into a normalised schema
      Body Shape, Orders, Items Ordered, Returns, Browsers (visitors), Sessions, Garment Details, AB Events, Snowplow Events
  • 14. Cascalog: Snowplow ETL Runner Output -> Batch Views
      Cascalog is designed to process Big Data on top of Hadoop. It is a replacement for tools like Pig, Hive, and Cascading and operates at a significantly higher level of abstraction than those tools [1]
      Write Clojure code to create our data processing jobs
      • The code you write has to be MapReduce aware, but the low level implementation details are taken care of
      • What we’re really doing is adding another ETL step to the Snowplow flow
      Cascalog is written in Clojure (JCascalog in Java, or Scalding in Scala)
      It’s easy to run on Amazon EMR – fits in with the Snowplow flow nicely
      [1] http://cascalog.org/
  • 15. Cascalog – Worth the effort?
      Couldn’t you achieve the same output working with the events table alone? …kind of
      But there are two key benefits:
      1. Breaking the data into a manageable schema means you can directly access the data you care about
      2. Complex logic and aggregation are easier to achieve
      Real example:
      • KPI Data Aggregation
  • 16. Cascalog – KPI Data Aggregation
      Value per visitor   Net Sales Revenue / Visitors
      User Conversion     Orders / Visitors
      Adoption Rate       Number of users who use Metail / Number of users shown Metail
      How do we calculate KPIs from our Snowplow data? In both the Active and Control groups, we need:
      • Visitor Count
      • Engaged Visitor Count
      • Order Count
      • Order Value
  • 17. Cascalog – KPI Data Aggregation
      Visitors Count
      • Snowplow tracks visitors – our code just has to look up visitors who are in the test we’re measuring
      Engaged Count
      • Fire a structured event to Snowplow each time an ‘engagement’ event occurs. For each visitor in the test, our code has to find whether or not they engaged with Metail
      Orders
      • We encode all of the relevant order information on the page in JSON and fire an unstructured event with the details
      Order Count
      • Our code needs to find all of the order events in the time period
      Order Value
      • Our code needs to read the order values and sum them
  • 18. Cascalog – KPI Data Aggregation
      We can do better! What we really want is a user level summary of the data
      domain_id          engaged   order_value   order_id   ab_group
      0014822757d9a81f   null      175.89        89281949   out
      0015ca5144f0fae7   null      null          null       out
      0015dd8901887010   null      310.22        25394849   out
      0015e633aa2c158d   null      null          null       in
      00204e1bcc87b734   null      null          null       out
      0042472794f2b57a   null      191.98        89392136   in
      004389f95e620dd0   null      null          null       out
      0044867c3d7b1cf5   null      null          null       out
      00456d1e9300296e   null      null          null       out
      0045dc05b4262ed2   null      null          null       in
      0045f74358a842c1   TRUE      null          null       in
      00462b685f4188ad   null      null          null       out
      0048fccbe230dc57   null      null          null       out
      0049a5d24498051d   TRUE      101.96        27529849   in
  • 19. Cascalog – Implementation
      1) Read in the Snowplow events data in HDFS
      2) Remove events we don’t care about
  • 20. Cascalog – Implementation
      3) Take those events, pull out the bits we care about and join them together
  • 21. What do we do with the Batch Views?
      Take the output and crunch it in R (or Incanter)
      A lot of the subsequent analysis we run on our batch views requires statistical packages, so we run our advanced analysis in R. Thankfully, having the batch views ready has led to far fewer of these:
  • 22. A Looker Ahead
      Not everyone can write Cascalog and R. Looker will open our batch views and Snowplow events to our Business Analysts
  • 23. www.metail.com
      Contact information: ROB BOLAND, LEAD DATA ARCHITECT, rob@metail.com, Skype: rpboland
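
The three implementation steps on slides 19 and 20 and the KPI definitions on slide 16 can be sketched in a few lines. The sketch below is plain Python rather than Cascalog, and it is not the code from the talk. The event field names (`event`, `kind`, `group`, `value`) are hypothetical, chosen only for illustration; `domain_id`, `engaged`, `order_value`, `order_id` and `ab_group` follow the summary table on slide 18. It assumes at most one order per user, and it uses raw order value where the slides define net sales revenue after returns.

```python
def build_user_summary(events):
    """Collapse an event stream into one summary row per domain_id."""
    users = {}
    for ev in events:
        # Step 2: remove events we don't care about
        # (here: anything that is not a structured or unstructured event).
        if ev.get("event") not in ("struct", "unstruct"):
            continue
        # Step 3: pull out the bits we care about and join them per user.
        row = users.setdefault(
            ev["domain_id"],
            {"engaged": False, "order_value": None,
             "order_id": None, "ab_group": None},
        )
        kind = ev.get("kind")  # hypothetical field naming the event type
        if kind == "ab":
            row["ab_group"] = ev["group"]  # "in" or "out"
        elif kind == "engagement":
            row["engaged"] = True
        elif kind == "order":
            row["order_value"] = (row["order_value"] or 0.0) + ev["value"]
            row["order_id"] = ev["order_id"]
    # Only visitors who were assigned to the test belong in the summary.
    return {d: r for d, r in users.items() if r["ab_group"] is not None}


def kpis(summary, group):
    """Compute the slide 16 KPIs for one AB group from the user summary."""
    rows = [r for r in summary.values() if r["ab_group"] == group]
    visitors = len(rows)
    orders = sum(1 for r in rows if r["order_id"] is not None)
    revenue = sum(r["order_value"] or 0.0 for r in rows)
    engaged = sum(1 for r in rows if r["engaged"])
    return {
        "visitors": visitors,
        "value_per_visitor": revenue / visitors if visitors else 0.0,
        "user_conversion": orders / visitors if visitors else 0.0,
        "adoption_rate": engaged / visitors if visitors else 0.0,
    }
```

Calling `kpis(summary, "in")` and `kpis(summary, "out")` gives the Active and Control figures side by side. Because the summary keeps one row per user, follow-up questions (which orders, which users) can be answered from the same structure.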

Editor's Notes

  1. Fashion technology start-up company, focused on delivering the best UX for browsing and buying clothes online. How? By recognising every body is unique and should be celebrated! When looking at clothes online, why are we restricted to only seeing how they look on models or mannequins? Why not on our own bodies? That is the question we are solving through 2 core technologies:
     • Body visualisation – having a quick and easy way to create your body model online: your MeModel
     • Garment fit – low cost and quick method for digitising clothes
     The results? Well, you can see for yourself from this slide, which shows a collection of MeModels we have created, wearing different clothes
  2. I’m not going to spend too much time on this slide, but I wanted to give an overview of the kind of data services we provide for our retailers, which we put together from the data we collect
  3. GA just doesn’t give us the level of detail we require. It has its uses, and provides great overviews and visualisations, but drilling into the detail of what a user actually did gets a bit clunky. Funnel analysis never quite cut it for us, especially when it comes to measuring KPIs and billing, where it’s really important that the numbers are accurate and correct
  4. Key points to note:
     • We are adding two trackers here, one that sits on the retailer’s site and one that sits on our widget. Because we have the tracker on the retailer’s pages, we get a lot more data than a startup of our size might expect
     • We track everything, send a _lot_ of structured events (fell out of GA), and also use unstructured events where we’ve needed to pass more data
     • We actually started our Snowplow collection before we really knew what to do with it. No harm getting the tracker on early
  5. MEAP for a mere three years – hopefully Unified Log Processing comes more quickly…
  6. Computing arbitrary functions on arbitrary data
     • Batch layer – Stores the master dataset and computes arbitrary views
     • Serving layer – The serving layer indexes the batch view and loads it up so it can be efficiently queried to get particular values out of the view. The serving layer is a specialized distributed database that loads in batch views, makes them queryable, and continuously swaps in new versions of a batch view as they're computed by the batch layer.
     • Speed layer – Takes the data and updates it based on what it knows, discards data as it’s no longer needed
     Robust and fault tolerant. Scalable. General. Extensible. Allows ad hoc queries. Minimal maintenance. Debuggable
  7. Entities we care about
  8. Batch computations are written like single-threaded programs, yet automatically parallelize across a cluster of machines. This implicit parallelization makes batch layer computations scale to datasets of any size. It's easy to write robust, highly scalable computations on the batch layer. Scale
  9. Remember our KPI slide – I’ve picked out a couple of these and I’m going to talk about how we use Snowplow to capture this data
  10. All of these things would be fairly easy to pull out of the processed Snowplow data – even if it’s large. Redshift is good at running these kinds of queries. Combining the numbers returned is not difficult. The problem comes if you present this back to the retailer or your users – there are always follow up questions and it’s difficult to drill down on this kind of summary data:
     • What kind of items do the users who engaged try on vs what they purchased?
     • Can you tell me which users
     • What days were there the most orders?
     • Can you provide the order_ids so we could check the values our end?
  11. This is better because we now have the Snowplow domain_id. It’s a summary view showing us, for any specific user in the test, which group they were in, did they click on the Metail button, did they make an order and if so how much? Tying everything back to the user is a great advantage, because any subsequent analysis is much easier to carry out. We join back to the Snowplow events on domain_id. For users who engaged: what did they try on? This data has just run in a batch so is ready and waiting for us to start analysis on – it doesn’t need to be recomputed over again. It’s also easy to calculate the KPIs I mentioned, and because we have everything on a per user level, we can perform statistical bootstrapping to look at the distributions and work out error bars on the results
  12. I know many of you will never have seen Clojure before and I don’t intend to spend time going through every line, but I wanted to show you that what we’re doing is conceptually very simple. A few lines of code and we’ve cleared a huge amount of data we don’t need:
     • Chuck invalid IP addresses
     • Anything that’s not a Struct or an Unstruct event
     And we’ve started to transform it. Page URLs become retailers
  13. Cascalog takes care of all of the nitty gritty – and running it on Amazon EMR means we can power it up as we’d like because we’re leveraging MapReduce. MapReduce – doesn’t matter how big your Snowplow logs are, you can split the data arbitrarily and run Cascalog over it. Every row can
  14. At the moment
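
Note 11 mentions statistical bootstrapping over the per-user summary to get error bars. The talk runs this kind of analysis in R; the Python sketch below only illustrates the idea, using a percentile bootstrap on the mean of a per-user value (for example order value per visitor, with 0 for visitors who did not order). The function name, the number of resamples and the 95% level are assumptions made for illustration, not details from the talk.

```python
import random


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    if not values:
        raise ValueError("need at least one value to resample")
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample n users with replacement and record the resample mean.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

Running this once per AB group on the per-user order values gives an interval around each group's value per visitor; overlapping intervals are a hint that the observed uplift may not be distinguishable from noise.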