Spotify engineers Kinshuk Mishra and Noel Cody share their experience building personalized ad experiences for users through iterative engineering and product development. The slides explain their process of continuous problem discovery, hypothesis generation, product development, and experimentation. They dive deep into the specific ad personalization problems Spotify is solving and explain their data infrastructure technology stack in detail. They also explain how they have experimented with various product hypotheses and iteratively evolved their infrastructure to keep up with product requirements.
13. Why Personalization?
“...it works well the advertisements are annoying though I am not a fan of
mainstream music so hearing about pop bands is also driving me crazy”
“Great way to listen to whatever music you want. The ads can be really
annoying though since they don't seem to be targeted. I HATE rap music, yet I
seem to get a lot of ads for it.”
30. Kafka
● Kafka is a distributed, partitioned, replicated commit log service.
● Guarantees
● Kafka provides a total order over messages within a partition (not across partitions)
● Fault tolerance: a topic with replication factor N tolerates up to N-1 server failures
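The two guarantees above can be illustrated with a toy in-memory model. This is a minimal sketch, not Kafka itself and not its client API: the class `ToyTopic`, its methods, and the CRC32 key hashing are all assumptions made for illustration. It shows that messages sharing a key land in one partition and keep their order, and that a partition with replication factor 3 stays readable after 2 replica failures.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a partitioned, replicated commit log (hypothetical)."""

    def __init__(self, num_partitions, replication_factor):
        self.num_partitions = num_partitions
        self.replication_factor = replication_factor
        # replicas[p][r] is replica r's copy of partition p's append-only log.
        self.replicas = [
            [[] for _ in range(replication_factor)] for _ in range(num_partitions)
        ]
        self.failed = set()  # (partition, replica) pairs that are down for good

    def partition_for(self, key):
        # Deterministic hash: the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def _live(self, partition):
        live = [
            r for r in range(self.replication_factor)
            if (partition, r) not in self.failed
        ]
        if not live:
            raise RuntimeError(f"partition {partition} has no live replica")
        return live

    def append(self, key, value):
        p = self.partition_for(key)
        live = self._live(p)
        offset = len(self.replicas[p][live[0]])  # next offset in this partition
        for r in live:
            self.replicas[p][r].append((offset, key, value))
        return p, offset

    def fail(self, partition, replica):
        self.failed.add((partition, replica))

    def read(self, partition):
        # Any live replica holds the full, ordered log for the partition.
        return list(self.replicas[partition][self._live(partition)[0]])


topic = ToyTopic(num_partitions=3, replication_factor=3)
for i in range(3):
    topic.append("user-1", f"play-{i}")
```

Offsets are only meaningful inside one partition, which is why ordering is promised per partition rather than per topic.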
31. Ad Targeting Architecture V1.0
[Architecture diagram: COTS Data Infrastructure, Real-time Targeting, Spotify Backend Infrastructure]
32. Storm
● Real time stream processing
● Like Hadoop without HDFS
● Like MapReduce with many reducer steps
● Fault-tolerant, with guaranteed message processing
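The bullets above can be sketched as a chain of processing steps where a failed tuple is replayed rather than dropped. This is a toy single-process model under stated assumptions, not Storm's API: `run_topology`, the bolt functions, and the `max_attempts` parameter are hypothetical names. Replay gives at-least-once behaviour, so the steps here are kept free of side effects.

```python
from collections import deque


def run_topology(source, bolts, max_attempts=3):
    """Push each tuple through every bolt; replay a tuple whose processing failed."""
    pending = deque((item, 1) for item in source)
    results = []
    while pending:
        item, attempt = pending.popleft()
        try:
            out = [item]
            for bolt in bolts:
                # Each bolt maps one input tuple to zero or more output tuples.
                out = [emitted for tup in out for emitted in bolt(tup)]
        except Exception:
            if attempt >= max_attempts:
                raise
            pending.append((item, attempt + 1))  # not acked: replay later
            continue
        results.extend(out)  # acked: every bolt finished for this tuple
    return results


def parse(line):
    user, genre = line.split(",")
    yield {"user": user, "genre": genre}


def make_flaky_normalizer(fail_once_for):
    """Build a bolt that fails the first time it sees the given user."""
    seen = set()

    def normalize(event):
        if event["user"] == fail_once_for and event["user"] not in seen:
            seen.add(event["user"])
            raise RuntimeError("transient failure")
        yield {**event, "genre": event["genre"].lower()}

    return normalize


def drop_empty_genre(event):
    if event["genre"]:
        yield event


events = ["u1,Rock", "u2,Pop", "u1,"]
processed = run_topology(
    events, [parse, make_flaky_normalizer("u1"), drop_empty_genre]
)
```

The first event fails once and is processed on replay, so it still appears in `processed`, though later than it was emitted.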
36. Apache Crunch
● Framework for writing, testing, and running MapReduce pipelines
● Pipelines are composed of user-defined functions and higher-level
abstractions of common MapReduce tasks (filter, join, etc.)
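The idea of composing a pipeline from user-defined functions plus built-in operations such as filter and join can be sketched as follows. This is an illustration in Python only and is not Crunch's Java API: the `Collection` class and its method names are hypothetical, and everything is evaluated eagerly in memory for simplicity.

```python
from collections import defaultdict


class Collection:
    """Hypothetical stand-in for one stage of a pipeline."""

    def __init__(self, items):
        self._items = list(items)

    def parallel_do(self, fn):
        # Apply a user-defined function mapping one record to zero or more records.
        return Collection(out for item in self._items for out in fn(item))

    def filter(self, predicate):
        return Collection(item for item in self._items if predicate(item))

    def join(self, other):
        # Inner join of (key, value) pairs on key.
        right = defaultdict(list)
        for key, value in other._items:
            right[key].append(value)
        return Collection(
            (key, (left, r))
            for key, left in self._items
            for r in right.get(key, [])
        )

    def materialize(self):
        return list(self._items)


plays = Collection([("u1", "rock"), ("u2", "pop"), ("u3", "rock")])
ads = Collection([("rock", "guitar-ad"), ("jazz", "sax-ad")])

# Re-key plays by genre, then match each listener with ads for that genre.
by_genre = plays.parallel_do(lambda play: [(play[1], play[0])])
matched = by_genre.join(ads).materialize()
```

Because filter and join are named operations rather than hand-written map and reduce code, each step can be unit-tested on a small in-memory collection.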
38. Apache Crunch
What’s wrong with plain Python Streaming MapReduce?
● Testability
● Optimization
● Performance
● IDE support
● Type Safety
● Lack of higher-level operations (filter/join/aggregate)
From Spotify Presentation: Scalding the Crunchy Pig for Cascading into the Hive
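For context, a plain Python streaming job has roughly the shape below. This is a minimal sketch with assumed input (tab-separated `user` and `genre` fields) and hypothetical function names; a real Hadoop Streaming script would read `sys.stdin` and print to standard output, and `run_locally` only imitates the sort that happens between the map and reduce phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'genre<TAB>1' for every 'user<TAB>genre' input line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # malformed rows are only noticed at run time
        _user, genre = fields
        yield f"{genre}\t1"


def reducer(sorted_lines):
    """Sum counts per key; relies on the input arriving sorted by key."""
    parsed = (line.split("\t") for line in sorted_lines)
    for genre, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{genre}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    # Stand-in for the shuffle/sort between the map and reduce phases.
    return list(reducer(sorted(mapper(lines))))


counts = run_locally(["u1\trock\n", "u2\tpop\n", "u3\trock\n"])
```

Every record crosses the process boundary as an untyped text line, and the grouping logic is written by hand, which relates to the type-safety and higher-level-operation points in the list above.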
39. Apache Crunch
● About a 5x performance improvement over Python streaming MapReduce
● Readable functional-style API in plain Java
● Great local testing support
● First-class support for Avro records
From Spotify Presentation: Scalding the Crunchy Pig for Cascading into the Hive