2. 2
Introduction
• Introduction to Metail – who we are, why we use Snowplow
• How the Lambda Architecture has influenced our Data Architecture
• Where Cascalog fits in at Metail and why it works well with Snowplow
• Example of where we’ve used Cascalog and how it works
• Looker forward to the future
5. 5
• Sign up with just a few clicks
• See how the clothes look on you
• Build layered outfits
• Get a size recommendation
http://trymetail.com/collections/metail
6. 6
Product portfolio: Data services
1. Customer shape & size data can now aid brands' buying & selling decisions
2. Body shape & outfitting data -> crowd-sourced outfit recommendations
UNDERSTANDING SHAPE PROFILE OF CUSTOMERS: Do we need to create new collections to cater for clusters of different shapes?
HOW SHAPE VARIES BY SIZE: Do we need to change the fit profile by size to accommodate different shapes?
7. 7
KPI Analysis – Can we prove it actually works?
Metric: Definition
Return on Investment: [(VPV uplift * All Visits) - Investment] / Investment
Net Sales Revenue: Value of retained items in bin
Value per Visitor: Net Sales Revenue / Visitors
Visits (sessions): Set of activities with <= 30 minutes between consecutive events
User Conversion: Orders / Visitors
Adoption Rate: Number of users who use Metail / Number of users shown Metail
Average Order Value: Median value of all orders tracked in the time period
Return Rate: Number of items returned / Number of items purchased
Average Retained Order Value: Median value of all orders tracked in the time period after removing returned items
AB set-up: 50/50 split test
Managed by: Metail through their AB test platform
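To make the arithmetic concrete, here is a minimal sketch of those ratio metrics as plain Clojure functions (the language we use with Cascalog later on). The function and argument names are ours for illustration, and VPV uplift is taken to be the active group's value per visitor minus the control group's.

;; Illustrative only: the KPI ratios above as plain Clojure functions.
(defn value-per-visitor [net-sales-revenue visitors]
  (/ net-sales-revenue visitors))

(defn user-conversion [orders visitors]
  (/ orders visitors))

(defn adoption-rate [users-using-metail users-shown-metail]
  (/ users-using-metail users-shown-metail))

(defn return-on-investment
  "[(VPV uplift * All Visits) - Investment] / Investment"
  [vpv-uplift all-visits investment]
  (/ (- (* vpv-uplift all-visits) investment) investment))

;; e.g. (return-on-investment 0.05 100000 2000) => 1.5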
8. 8
KPI Analysis – Can we prove Metail's impact?
Data Collection
We need to know visitor counts, order values, which test group the user was in, whether they actually used Metail or not, time on site, what garments they wore, etc.
13. 13
Cascalog to produce Batch Views
Turn the Snowplow event stream into a normalised schema:
Snowplow Events -> Body Shape, Orders, Items Ordered, Returns, Browsers (visitors), Sessions, Garment Details, AB Events
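As a hedged illustration of what producing one of these views can look like, here is a minimal Cascalog sketch that carves an AB Events view out of the enriched event stream. The generator's field layout and the "ab_test" category are assumptions for the example, not Metail's actual schema.

(ns metail.batch-views.ab-events
  (:require [cascalog.api :refer :all]))

;; Keep only structured events whose category marks them as AB-test events.
;; The generator is assumed to emit these six fields, in this order.
(defn ab-test-event? [se-category]
  (= se-category "ab_test"))

(defn ab-events-view [snowplow-events]
  (<- [?domain-userid ?se-action ?se-label ?collector-tstamp]
      (snowplow-events ?domain-userid ?event ?se-category ?se-action ?se-label ?collector-tstamp)
      (= ?event "struct")
      (ab-test-event? ?se-category)))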
14. 14
Cascalog:
Snowplow ETL Runner Output -> Batch Views
Cascalog is designed to process Big Data on top of Hadoop. It is a replacement for tools like Pig, Hive, and Cascading, and operates at a significantly higher level of abstraction than those tools [1].
Write Clojure code to create our data processing jobs
• The code you write has to be MapReduce-aware, but the low-level implementation details are taken care of
• What we’re really doing is adding another ETL Step to the Snowplow flow
[1] http://cascalog.org/
Cascalog is written in Clojure (JCascalog offers a Java syntax; Scalding is a comparable option in Scala)
It’s easy to run on Amazon EMR – fits in with the Snowplow flow nicely
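To show the level of abstraction, here is a minimal, hypothetical Cascalog 2.x query that counts events per visitor; the events generator and its two-field layout are placeholders rather than part of our real pipeline.

(ns metail.example
  (:require [cascalog.api :refer :all]
            [cascalog.logic.ops :as ops]))

;; Count events per visitor. `events` can be any Cascalog generator
;; (for example an hfs-seqfile tap) emitting [domain-userid event-name].
(defn events-per-visitor [events]
  (<- [?domain-userid ?event-count]
      (events ?domain-userid _)
      (ops/count ?event-count)))

;; Execute the query and print the results:
;; (?- (stdout) (events-per-visitor my-events-tap))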
15. 15
Cascalog – Worth the effort?
Couldn't you achieve the same output working with the events table alone?
…kind of
But there are two key benefits:
1. Breaking the data into a manageable schema means you can directly access the data you care about
2. Complex logic and aggregation are easier to achieve
Real example:
• KPI Data Aggregation
16. 16
Cascalog – KPI Data Aggregation
Value per Visitor: Net Sales Revenue / Visitors
User Conversion: Orders / Visitors
Adoption Rate: Number of users who use Metail / Number of users shown Metail
How do we calculate KPIs from our Snowplow data?
In both the Active and Control groups, we need:
• Visitor Count
• Engaged Visitor Count
• Order Count
• Order Value
17. 17
Cascalog – KPI Data Aggregation
Visitor Count
• Snowplow tracks visitors – our code just has to look up the visitors who are in the test we're measuring
Engaged Count
• We fire a structured event to Snowplow each time an 'engagement' event occurs. For each visitor in the test, our code has to find whether or not they engaged with Metail
Orders
• We encode all of the relevant order information on the page in JSON and fire an unstructured event with the details
Order Count
• Our code needs to find all of the order events in the time period
Order Value
• Our code needs to read each order's value and sum them (see the sketch below)
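A minimal sketch of that order aggregation in Cascalog. The JSON field name (:value) and the two-field layout of the order-events generator are assumptions for illustration.

(ns metail.kpi.orders
  (:require [cascalog.api :refer :all]
            [cascalog.logic.ops :as ops]
            [cheshire.core :as json]))

;; Pull the order value out of the unstructured event's JSON payload.
(defn order-value [ue-json]
  (:value (json/parse-string ue-json true)))

;; Order count and total order value per visitor.
;; `order-events` is assumed to emit [domain-userid unstruct-event-json].
(defn order-totals [order-events]
  (<- [?domain-userid ?order-count ?order-value-sum]
      (order-events ?domain-userid ?ue-json)
      (order-value ?ue-json :> ?value)
      (ops/count ?order-count)
      (ops/sum ?value :> ?order-value-sum)))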
18. 18
Cascalog – KPI Data Aggregation
We can do better!
What we really want is a user-level summary of the data
domain_id engaged order_value order_id ab_group
0014822757d9a81f null 175.89 89281949 out
0015ca5144f0fae7 null null null out
0015dd8901887010 null 310.22 25394849 out
0015e633aa2c158d null null null in
00204e1bcc87b734 null null null out
0042472794f2b57a null 191.98 89392136 in
004389f95e620dd0 null null null out
0044867c3d7b1cf5 null null null out
00456d1e9300296e null null null out
0045dc05b4262ed2 null null null in
0045f74358a842c1 TRUE null null in
00462b685f4188ad null null null out
0048fccbe230dc57 null null null out
0049a5d24498051d TRUE 101.96 27529849 in
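That summary is essentially an outer join of the AB-group, engagement, and order views keyed on the visitor id. Here is a hedged sketch of how it might be written in Cascalog, where the !! (nullable) variables ask for an outer join so visitors without a matching row come through as null; the three input views and their field layouts are assumptions.

(ns metail.kpi.user-summary
  (:require [cascalog.api :refer :all]))

;; Assumed inputs:
;;   ab-group-view emits [domain-userid ab-group]       (one row per visitor in the test)
;;   engaged-view  emits [domain-userid engaged]        (only visitors who engaged)
;;   orders-view   emits [domain-userid order-id value] (only visitors who ordered)
(defn user-summary [ab-group-view engaged-view orders-view]
  (<- [?domain-userid !!engaged !!order-value !!order-id ?ab-group]
      (ab-group-view ?domain-userid ?ab-group)
      (engaged-view ?domain-userid !!engaged)
      (orders-view ?domain-userid !!order-id !!order-value)))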
21. 21
What do we do with the Batch Views?
Take the output and crunch it in R (or Incanter)
A lot of the subsequent analysis we run on our batch views requires statistical packages, so we run our advanced analysis in R.
Thankfully, having the batch views ready has led to far fewer of these:
22. 22
A Looker Ahead
Not everyone can write Cascalog and R.
Looker will open our batch views and Snowplow events to our Business Analysts.
Fashion technology start-up company
Focused on delivering best UX for browsing and buying clothes online
How? – by recognising every body is unique and should be celebrated!
When looking at clothes online, why are we restricted to only seeing how they look on models or mannequins?
Why not on our own bodies?
That is the question we are solving through 2 core technologies:
Body visualisation – a quick and easy way to create your body model online, your MeModel
Garment fit – a low-cost and quick method for digitising clothes
The results? Well you can see for yourself from this slide, which shows a collection of MeModels we have created, wearing different clothes
I'm not going to spend too much time on this slide, but I wanted to give an overview of the kind of data services we provide for our retailers, put together from the data we collect
GA just doesn't give us the level of detail we require. It has its uses, and provides great overviews and visualisations, but drilling into the detail of what a user actually did gets a bit clunky. Funnel analysis never quite cut it for us, especially when it comes to measuring KPIs and billing, where it's really important that the numbers are accurate and correct
Key points to note: we are adding two trackers here, one that sits on the retailer's site and one that sits on our widget.
Because we have the tracker on the retailer's pages, we get a lot more data than a startup of our size might expect
We track everything, send a _lot_ of structured events (fell out of GA), and also use unstructured events where we’ve needed to pass more data
We actually started our Snowplow collection before we really knew what to do with it. No harm getting the tracker on early
MEAP for a mere three years – hopefully Unified Log Processing comes more quickly…
Computing arbitrary functions on arbitrary data
Batch layer – Stores the master dataset and computes arbitrary views
Serving layer - Indexes the batch views and loads them up so they can be efficiently queried to get particular values out of a view. The serving layer is a specialized distributed database that loads in batch views, makes them queryable, and continuously swaps in new versions of a batch view as they're computed by the batch layer.
Speed layer - Takes new data and updates the views based on what it knows, discarding data once it's no longer needed
Robust and fault tolerant
Scalable
General
Extensible
Allows ad hoc queries
Minimal maintenance
Debuggable
Entities we care about
Batch computations are written like single-threaded programs, yet automatically parallelize across a cluster of machines. This implicit parallelization makes batch layer computations scale to datasets of any size. It's easy to write robust, highly scalable computations on the batch layer.
Scale
Remember our KPI slide – I’ve picked out a couple of these and I’m going to talk about how we use Snowplow to capture this data
All of these things would be fairly easy to pull out of the processed Snowplow data – even if it's large. Redshift is good at running these kinds of queries. Combining the numbers returned is not difficult
The problem comes if you present this back to the retailer or your users – there are always follow-up questions, and it's difficult to drill down on this kind of summary data
What kind of items do the users who engaged try on vs what they purchased? Can you tell me which users?
On what days were there the most orders? Can you provide the order_ids so we could check the values at our end?
This is better because we now have the snowplow domain_id. It’s a summary view showing us, for any specific user in the test, which group they were in, did they click on the Metail button, did they make an order and if so how much?
Tying everything back to the user is a great advantage, because any subsequent analysis is much easier to carry out. We join back to the Snowplow events on domain_id.
For users who engaged: what did they try on?
This data has just been run in a batch, so it's ready and waiting for us to start analysis on – it doesn't need to be recomputed all over again
It's also easy to calculate the KPIs I mentioned, and because we have everything at a per-user level, we can perform statistical bootstrapping to look at the distributions and work out error bars on the results (a rough sketch of that follows)
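We do this work in R, but to keep one language in these notes here is a rough percentile-bootstrap sketch in plain Clojure; the 95% interval and the function names are illustrative, not our production code.

;; Illustrative percentile bootstrap in plain Clojure.
(defn bootstrap-ci
  "Resample `values` with replacement `n-resamples` times, apply `statistic`
  to each resample, and return the [2.5th, 97.5th] percentiles of the results."
  [statistic values n-resamples]
  (let [v (vec values)
        resample (fn [] (repeatedly (count v) #(rand-nth v)))
        stats (vec (sort (repeatedly n-resamples #(statistic (resample)))))
        pct (fn [p] (nth stats (int (* p (dec (count stats))))))]
    [(pct 0.025) (pct 0.975)]))

;; e.g. a 95% interval on mean order value per visitor in the active group:
;; (bootstrap-ci #(/ (reduce + %) (count %)) active-order-values 10000)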
I know many of you will never have seen Clojure before and I don’t intend to spend time going through every line, but I wanted to show you that what we’re doing is conceptually very simple
A few lines of code and we've cleared out a huge amount of data we don't need:
Chuck invalid IP addresses
Anything that's not a Struct or an Unstruct event
And we've started to transform it: page URLs become retailers (a sketch of this kind of step is below)
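A hedged sketch of that kind of cleaning step in Cascalog; the field layout, the IP check, and the URL-to-retailer mapping here are simplified assumptions, not the code on the slide.

(ns metail.etl.clean
  (:require [cascalog.api :refer :all]))

;; Crude, illustrative helpers – the real validation and URL -> retailer
;; mapping will be more involved.
(defn valid-ip? [ip]
  (boolean (and ip (re-matches #"\d{1,3}(\.\d{1,3}){3}" ip))))

(defn tracked-event? [event]
  (contains? #{"struct" "unstruct"} event))

(defn page-url->retailer [page-url]
  ;; take the registered domain as the retailer name
  (second (re-find #"^https?://(?:www\.)?([^/:]+)" (str page-url))))

;; Assumed field layout: [domain-userid user-ipaddress event page-url tstamp]
(defn cleaned-events [snowplow-events]
  (<- [?domain-userid ?event ?retailer ?tstamp]
      (snowplow-events ?domain-userid ?ip ?event ?page-url ?tstamp)
      (valid-ip? ?ip)
      (tracked-event? ?event)
      (page-url->retailer ?page-url :> ?retailer)))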
Cascalog takes care of all of the nitty gritty – and running it on Amazon EMR means we can scale it up as we'd like, because we're leveraging MapReduce.
MapReduce – it doesn't matter how big your Snowplow logs are, you can split the data arbitrarily and run Cascalog over it. Every row can be processed independently.