2. https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations
H&M Personalized Fashion Recommendations
Personalized product recommendations based on previous purchases
ā articles.csv - detailed metadata for each article_id available for purchase
ā customers.csv - metadata for each customer_id
ā transactions_train.csv - purchases for each customer with date
Predict the article_ids each customer will purchase.
3. Recommender systems are information ļ¬ltering systems that personalize
the information coming to a user based on her historical interests, the
relevance of the information, and the current context (e.g., what is trending).
Some paradigms for recommender systems:
ā Collaborative ļ¬ltering
ā Content-based ļ¬ltering
ā Social and demographic recommenders
ā Contextual recommendation algorithms
Recommender Systems
4. Facebook posts
Batch
Collaborative ļ¬ltering
Content-based
Netļ¬ix movies
Batch
Collaborative ļ¬ltering
Content-based
Alibaba products
Real-time
Item-to-Item
TikTok
Real-time
Retrieval and Ranking
Freshest features
Spotify Weekly
Batch
Collaborative ļ¬ltering
Content-based
MovieLens
Batch
Collaborative ļ¬ltering
Content-based
Youtube clips
Real-time
Retrieval and Ranking
U2I and I2I
āIt's digital crackā, Andrej Karpathy
Analytical/Batch Recommendation Systems
5. Database or
KV-Store
Batch Program
(with model) Model Registry
download model
Operational
Services
Reports
consume predictions
store predictions
read batch data for scoring
Feature
Store
Analytical (Batch) Recommendation Service
6. Facebook posts
Batch
Collaborative ļ¬ltering
Content-based
Netļ¬ix movies
Batch
Collaborative ļ¬ltering
Content-based
Alibaba products
Real-time
Item-to-Item
TikTok
Real-time
Retrieval and Ranking
Freshest features
Spotify Weekly
Batch
Collaborative ļ¬ltering
Content-based
MovieLens
Batch
Collaborative ļ¬ltering
Content-based
Youtube clips
Real-time
Retrieval and Ranking
U2I and I2I
āIt's digital crackā, Andrej Karpathy
Operational/Online Recommendation Systems
7. Business
Value
Analytical ML
Operational ML
with historical data
Operational ML
with real-time data
BI:
DESCRIPTIVE & DIAGNOSTIC
ANALYTICS
AI:
PREDICTIVE & PRESCRIPTIVE
ANALYTICS
Artisanal ML
(Laptop ML)
Structured
Data
āBigā Data
Increase Business Value with more Real-Time Recommendations
8. User History & Context
Item 2
millions
or
billions
Hundreds Dozens
Candidate
Generation
Ranking
Item1
Item 3
Item 4
Item 5
Item
Corpus
Online Recommendations: Retrieval and Ranking Architecture
9. 3-Stage Retrieval, Filtering, Ranking
Personalized Music
Discovery Playlist:
ā Retrieval: Nearest neighbor search in
item embedding space
ā Filtering: Remove tracks user has
heard before (e.g. w/ Bloom Filters)
ā Ranking: Re-use embedding space
distance from and tade off between
score, similarity and BPm to reduce
jarring track transitions
Social Media Feed:
ā Retrieval: Random walks on social
graph to find new items in userās
network
ā Filtering: Remove posts from muted or
blocked users
ā Ranking: Predict userās likelihood of
interacting with posts, but ātwiddleā the
list so adjacent posts are from different
authors
Examples by Nvidia - https://www.youtube.com/watch?v=5qjiY-kLwFY
10. 3-Stage Retrieval, Filtering, Ranking
eCommerce āAdd to Cartā:
ā Retrieval: Look up items commonly
co-purchased with cart contents
ā Filtering: Remove candidate items that are
currently out of stock (or already in your
cart)
ā Ranking: Predict how likely to buy each
candidate item the user is, but reorder to
maximize expected revenue
Social Media Feed:
ā Retrieval: Many candidate sources for the
various rows/shelves/banners
ā Filtering: Remove items that arenāt
licensed for streaming in userās country
ā Ranking: Predict userās stream time for
each item, but arrange a set of shelves
that trade off between predicted relevance
and matching the genre distribution of the
userās previous consumption
Examples by Nvidia - https://www.youtube.com/watch?v=5qjiY-kLwFY
12. (Two-Tower) Operational Recommendation Service
Feature Store
history+context
Model Serving
Model
Deployments
user-embedding/rank
Recommend me stuff!
Data Sources
Model Registry
feature
pipelines
deploy
Vector Database
Embeddings
similarity search
13. User/query embedding
model - user history and
context.
Item embedding model
Built Approx Nearest
Neighbor (ANN) Index with
items and item embedding
model.
User/Query and Item
Embeddings
With the ranking model,
score all the candidate
items with both user and
item features, ensuring,
candidate diversity.
Ranking
Remove candidate items
for various reasons:
ā¢ underage user
ā¢ item sold out
ā¢ item bought
before
ā¢ item not available
in userās region
Filtering
Retrieve candidate items
based on the user
embedding from the ANN
Index
Retrieval
Create Embeddings - then Retrieval, Filtering, Ranking
14. Retrieval and Ranking for Fashion Recommendations
User/query
Embedding
Retrieve closest
candidates with
similarity search
Filter & Enrich
with features for
candidates
Add customer
prior purchases,
age, location, etc
Ranking
Model
Ranked
candidates
Candidate 1
Candidate 2
Candidate N
less than 100ms
15. User/query
Embedding
Retrieve closest
candidates with
similarity search
Filter & Enrich
with features for
candidates
Add customer
prior purchases,
age, location, etc
Ranking
Model
Hopsworks
Feature
Store
KServe
OpenSearch
(FAISS)
KServe
Ranked
candidates
Candidate 1
Candidate 2
Candidate N
Online Infrastructure
Online Infrastructure for Retrieval and Ranking
Hopsworks
Feature
Store
18. ā An embedding is a mapping of a discrete ā categorical ā variable to a
vector of continuous numbers: [1.119, 4.341, ā¦.,1.334]
ā Create a denser representation of the categories and maintain some of
the implicit relationship information between items
Word2Vec Embeddings
Image Embeddings enable Similarity Search
Embeddings
19. Can a āuser queryā ļ¬nd āitemsā
with similarity search?
Yes, by mapping the āuser queryā embedding
into the āitemā embedding space.
Representation learning for retrieval usually involves supervised learning with labeled
or pseudo-labeled data from user-item interactions.
Two-Tower Embedding Model
20. [Image Credit: Roman Grebennikov, Findify - https://www.youtube.com/watch?v=BskjQPkrYec ]
Feature Logging - Generate Training Data for Embedding Models
Log user-item interactions as training data for our two-tower model.
H&M Website
Search
Item 1
Item 2
Item 3
Item 4
Purchase 3
Click 2
Click 3
Score: 0
Item 1
Score: 1
Item 2
Score: 5
Item 3
Score: 0
Item 4
Features
Features
Features
Features
21. How do we create Training Data for our Embedding Models?
Implicit Feedback vs Explicit Feedback
Imagine the user sees a feed and selects the ļ¬fth item in
the list. This shows us that the ļ¬rst four items should
have been ranked lower in the list. We can train our ranker
model with this sort of data.
You can also train a model with explicit feedback by
introducing a user feedback/rating system in your
application.
Negative examples should also be used when training
- items labeled "irrelevant" to a given query.
23. User/Query Embedding
Trainable (neural network) Encoder
User proļ¬le features:
languages, location
preferences, age, etc
(From Database/DW)
User Search History
User Click History
(From Event Logs or Data Lake)
The User/Query Embedding Model
24. Sigmoid, Dot Product, ā¦
User and Item Embeddings
are the same length Item Embedding
Trainable (neural network) Encoder
Static item
features
User/Query Embedding
Trainable (neural network) Encoder
Static user
features
User Session
History
User Click
History
TensorFlow has the tensorļ¬ow-recommenders library to train two-tower embedding models.
Our training data, transactions.csv, consists of customer and article pairs. You need to provide only positive pairs, where the
customer purchased an article. Training produces 2 models: an item encoder model and a user encoder model.
The dot product of a user embedding and item
embedding pair that interacted is high and the
dot product of a non interacting pair is close to 0
Training the Query and Item Embedding Models with Two Towers
27. A vector database (or embedding store): a
database for durably storing embeddings and
supporting similarity search (nearest neighbor
search)
Approximate nearest neighbor (ANN) algorithms
provide fast nearest neighbor search, O(log(n))
ā Facebookās FAISS uses Hierarchical
Navigable Small World Graphs (HNSW)
ā Google ScaNN
HNSW Graph Search
Vector Database
28. PUT my-knn-index-1
{
"settings": {
"index": {
"knn": true,
"knn.space_type": "cosinesimil"
}
},
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 2
},
"my_vector2": {
"type": "knn_vector",
"dimension": 4 }
]}}
https://opensearch.org/docs/latest/search-plugins/knn
Create a k-NN nearest neighbour Index in OpenSearch. In the following
slides, we will add entries to the index. Then query it with approximate
nearest neighbor search. Then query it with additional ļ¬lters.
OpenSearch k-NN: Create Index
30. GET my-knn-index-1/_search
{
"size": 2,
"query": {
"knn": {
"my_vector2": {
"vector": [2, 3, 5, 6],
"k": 2
}
}
}
}
k is the number of neighbors the search of each graph will return.
The size option indicates how many results the query actually
returns.
OpenSearch k-NN: Similarity search using Index
31. GET my-knn-index-1/_search
{
"size": 2,
"query": {
"knn": {
"my_vector2": {
"vector": [2, 3, 5, 6],
"k": 2
}
}
},
"post_filter": {
"range": {
"price": {
"gte": 5,
"lte": 10
}
}
}
}
We are filtering results so that we only get candidates with a price
between 5-10.
OpenSearch k-NN: Similarity search using Index with ļ¬ltering
33. Each instance (user-item pair)
is represented with a list of
features, retrieved from the
feature store.
Training data is the user-item
features and the label is the
relevance ratings.
Ranking models should be fast
- low latency to rank 100s of
candidates, so decision trees
are popular.
What does a Ranking Model do?
34. The ranking stage should consider additional criteria or constraints.
If the system always recommend items that are "closest" to the user/query embedding, the
candidates tend to be very similar to each other.
Re-rank items based on genre or other metadata to ensure diversity.
When ranking, we can include features that were not feasible during candidate generation.
Use the feature store to retrieve features such as user persona (e.g., demographics, price propensity),
item metadata (e.g., attributes, engagement statistics), cross features (e.g., interaction between
each feature pair), and media embeddings.
Building a ranking model with TF Recommenders: https://www.youtube.com/watch?v=ZkwJo5HRjiQ
https://www.tensorļ¬ow.org/recommenders/examples/basic_ranking
Ranking Models - reordering and the feature store
36. Feature helps generate candidates in ANN search
Static/Dynamic Filter Candidates.
Book 1
Book 2
Book N
Sci-Fi
Drama
Non-Fiction
Vector DB
Ranking Stage
OR
Static attributes. Filter ANN results in OpenSearch
Book 1 Feature1 Feature2 ā¦ TAG
Tags, Metadata - Reļ¬ning your Search by Filtering
38. Oļ¬ine Infrastructure for Ranking and Retrieval
Hopsworks
Feature
Store
KServe
OpenSearch
(FAISS)
KServe
Kafka
transactions
Feature Engineering
(Spark/Python/SQL)
Data Warehouse user,
item profiles
Historical transactions
User/query and Item
Embedding Models
Model Training
Compute embeddings
for items/users and
add to ANN index
Spark App
Model Training
Ranking Model
deploy
deploy
build index
40. What are the System Properties of Recommendation Systems?
Availability
(uptime)
Latency
(response time)
Feature
Freshness
(age)
Throughput
(speed)
Apart from the accuracy of your ML models, you have to worry aboutā¦
41. Why should I care about Performance and Availability ?
Availability: Embedded in-memory DB crashes on Black
Friday at e-retailer, produces random recommendations
Freshness: Video streaming service switches from
Spark Streaming to Flink, improving feature
freshness from 10s down to 2s, improving the
engagement rate.
Latency: A real-time recommendation service for a
music streaming company returns 250 candidates
during retrieval, then it needs 250 lookups from the
feature store in < 30ms for ranking.
Throughput: Online retail store produces increasing
number of sales, needs to switch feature pipeline
from Pandas to PySpark.
42. Throughput: Online retail store produces increasing
number of sales, needs to switch feature pipeline
from Pandas to PySpark.
Why should I care about Performance and Availability ?
Availability: Embedded in-memory DB crashes on Black
Friday at e-retailer, produces random recommendations.
Freshness: Video streaming service switches from
Spark Streaming to Flink, improving feature
freshness from 10s down to 2s, improving the
engagement rate.
Latency: A real-time recommendation service for a
music streaming company returns 250 candidates
during retrieval, then it needs 250 lookups from the
feature store in < 30ms for ranking.
43. Throughput: Online retail store produces increasing
number of sales, needs to switch feature pipeline
from Pandas to PySpark.
Why should I care about Performance and Availability ?
Freshness: Video streaming service switches from
Spark Streaming to Flink, improving feature
freshness from 10s down to 2s, improving the
engagement rate.
Latency: A real-time recommendation service for a
music streaming company returns 250 candidates
during retrieval, then it needs 250 lookups from the
feature store in < 30ms for ranking.
Availability: Embedded in-memory DB crashes on Black
Friday at e-retailer, produces random recommendations.
44. Availability: Embedded in-memory DB crashes on Black
Friday at e-retailer, produces random recommendations.
Why should I care about Performance and Availability ?
Freshness: Video streaming service switches from
Spark Streaming to Flink, improving feature
freshness from 10s down to 2s, improving the
engagement rate.
Latency: A real-time recommendation service for a
music streaming company returns 250 candidates
during retrieval, then it needs 250 lookups from the
feature store in < 30ms for ranking.
Throughput: Online retail store produces increasing
number of sales, needs to switch feature pipeline
from Pandas to PySpark.
53. Feature Store
HOPSWORKS
Feature Pipeline
Precomputed Features
(Context, Historical)
Training Data
Retrieval and
Ranking
Application
(Click or Search)
Online API
Ofļ¬ine API
Hopsworks for Retrieval and Ranking
Model
Registry
OpenSearch
ANN
KServe
(Model Serving)
54. a. User Query
1. Enrich Query with
User History, Proļ¬le, and
Context
Hopsworks
Feature Store
OpenSearch
2. Compute
User/Query
Embedding
3. Retrieve Candidates
4. Enrich Candidates
using Feature Store
5. Score Candidates
with Ranking Model
KServe
Query,
Req-ID
Ranked
Results
<ReqID,
outcome>
Hopsworks Ranking and Retrieval Service
Feature Logging
b. Item Selected
56. Goal:
Support Spotify Personalized Search in a Retrieval
and Ranking Architecture.
Benchmark the highest throughput, lowest latency
key-value stores to identify one that could scale to
handle millions of concurrent lookups per second
on Spotifyās workloads.
Systems:
Aerospike and RonDB were identiļ¬ed as the only
systems capable of meeting the triple goals of
High Throughput, Low Latency, and High
Availability. Other databases such as Redis,
Cassandra, BigTable were not considered for
availability or latency or throughput reasons.
Feature Store
Ranking
and
Retrieval
Lookup Features
Feature Values
Feature Pipelines
Benchmarking Online Feature Stores
57. Hardware Benchmark Setup on GCP: RonDB (NDB) vs Aerospike. The Java Client
nodes are the clients performing the reads/writes on the Data Nodes. When the
cluster is provisioned with 8 RonDB (NDB) data nodes, it has 832GB of usable
in-memory storage, when a replication factor of 2 is used.
Benchmark Setup
58. Throughput: higher is better. Each feature store operation was a batch of 250 key-value lookups, meaning with 8
clients for a 8-node RonDB cluster, there are >2m lookups/sec.
Average throughput of the clients in both 6 and 8 node setup of RonDB
and Aerospike
N. of clients
RonDB 8 Nodes
RonDB 6 Nodes
Aerospike 6 nodes
Aerospike 8 nodes
1250
1000
750
500
250
1 2 4 8
Ops/Sec
Benchmark Results - Throughput
59. Latency: lower is better. Each feature store operation was a batch of 250 key-value lookups. So, for RonDB, the P99 when
performing 250 primary operations in a single transaction is under 30ms.
Average latency of the clients in both 6 node setup of RonDB and Aerospike
N. of clients
RonDB P75
RonDB P99 Aerospike P99
Aerospike P75
50
40
30
20
10
0
1 2 4 8
Latency
(ms)
Benchmark Results - Latency
60. RonDB 35% Higher Throughput
RonDB 30% Better Latency
Based on Public Report from Spotify
comparing Aerospike and RonDB (NDB Cluster) as Feature Stores
http://kth.diva-portal.org/smash/get/diva2:1556387/FULLTEXT01.pdf
Benchmark Results
62. Todayās Workshop: Retrieval and Ranking Platform in Python
create feature groups for articles, customers, transactions
ā articles.csv
ā customers.csv
ā transactions_train.csv
create feature view for retrieval model (training data)
build opensearch KNN index with embeddings for all articles
train two-tower model - user and article embedding models
create feature view for retrieval model (training data)
train ranking model
deploy models to KServe + glue code for Hopsworks, OpenSearch
64. OpenSearch kNN
ANN Index
Python Program
with Item
Embedding Model
Item Corpus
Encode all items
Insert all pairs
(ID, embedding)
3_build_index.ipynb
Model Registry
Customer Embedding Model
Article Embedding Model
TensorFlow
Two-Tower
Training
Training Data
Register User & Item
Embedding Models
2b_train_retrieval_model.ipynb
68. more data -> better models -> more users -> more data->
ā¦
Feature
Store
Recommende
r
(Website, etc)
Models &
Embedding
s
Model
Training
Feature
Pipeline &
Monitoring
Feature
Store
Enterprise
Data
logs
Online API Ofļ¬ine API
Build out your Data-for-AI Flywheel
70. Hopsworks is the platform for ML collaboration,
powered by the leading
Python-centric Enterprise Feature Store.
London
IDEALondon,
69 Wilson St.
London
UK
Silicon Valley
470 Ramona St
Palo Alto
California
USA
ā Stockholm
Hopsworks AB
Medborgarplatsen 25
118 72 Stockholm
Sweden