Product recommendation and
the Dutch movie world
Let’s link on LinkedIn
https://www.linkedin.com/in/longhowlam
Longhow Lam
Freelance data scientist: Just contact me if you need me :-)
INTRODUCTION
• RESTAURANT ANALYTICS (RECSYS: ASSOCIATION RULES MINING)
• SELECT APP USERS (RECSYS: WORD EMBEDDINGS)
• DUTCH FILM WORLD (GRAPH ANALYSIS)
INTRODUCTION
You need to learn your whole life!
Data science environments and titles, they evolve, come and go
I once was an “applied statistician”....
Applied statistician
Data miner
Data scientist
ML engineer
AI specialist ??
Tool ??
Advanced Restaurant Analytics
RESTAURANT ANALYTICS
Business pain
I have eaten Chinese, OK nice! But where to eat the next time?
Approach
Look at restaurant reviews and look where the other reviewers went
ASSOCIATION
RULES MINING
ALSO CALLED MARKET BASKET ANALYSIS
Identify frequent item sets (rules) in transactional data:
✔ IF items A and B THEN item C {A, B} → {C}
✔ IF items X THEN item Y and Z {X} → {Y, Z}
When is a rule frequent? If its ‘support’ exceeds a threshold:
$\mathrm{Support}\{X \to Y\} = \dfrac{\#\ \text{trxs. containing both } X \text{ and } Y}{\text{Total } \#\ \text{trxs.}}$
Support examples:
Chips → Beer 0.823%
Chips → Milk 0.002%
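The support definition above can be sketched in a few lines of plain Python. The baskets below are made up for illustration and are not the data behind the percentages on the slide.

```python
# Hypothetical toy baskets; each transaction is a set of items.
transactions = [
    {"chips", "beer"},
    {"chips", "beer", "milk"},
    {"milk", "bread"},
    {"chips", "bread"},
    {"beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

print(support({"chips"}, transactions))          # chips is in 3 of 5 baskets
print(support({"chips", "beer"}, transactions))  # chips and beer together: 2 of 5
```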
Lift & Confidence
Other statistics used to assess the usefulness of a rule:
$\mathrm{Lift}\{X \to Y\} = \dfrac{\mathrm{Support}\{X \to Y\}}{\mathrm{Support}(X) \cdot \mathrm{Support}(Y)}$
$\mathrm{Conf}\{X \to Y\} = \dfrac{\mathrm{Support}\{X \to Y\}}{\mathrm{Support}(X)}$
Example:
a lift of 8.3 for {Chips} → {Beer} means:
if I know someone has already bought chips, then
it is 8.3 times more likely that they will also buy beer
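As a rough illustration of the two formulas, here is a small sketch that computes confidence and lift from support. The baskets are hypothetical, so the resulting numbers have nothing to do with the 8.3 in the example.

```python
# Hypothetical toy baskets; each transaction is a set of items.
transactions = [
    {"chips", "beer"},
    {"chips", "beer", "milk"},
    {"milk", "bread"},
    {"chips", "bread"},
    {"beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    """Conf{X -> Y} = Support{X -> Y} / Support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Lift{X -> Y} = Support{X -> Y} / (Support(X) * Support(Y))."""
    joint = support(set(x) | set(y), transactions)
    return joint / (support(x, transactions) * support(y, transactions))

conf_chips_beer = confidence({"chips"}, {"beer"}, transactions)
lift_chips_beer = lift({"chips"}, {"beer"}, transactions)
print(round(conf_chips_beer, 3), round(lift_chips_beer, 3))
```

A lift above 1 suggests the two sides occur together more often than they would if they were independent; a lift near 1 suggests the rule adds little.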
Question: When was this paper published?
1993
Question: How large was the database they used?
46,873 transactions
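ASSOCIATION RULES MINING
Transaction data is needed, either as one row per transaction:
---------------------------------
| Transaction ID | items        |
---------------------------------
| 0001           | [A, B, C]    |
| 0002           | [A, X, Z, L] |
| 0003           | [X, A]       |
| 0004           | [K, Q, L]    |
| ….             | ….           |
| N              | [A, K, M]    |
---------------------------------
or as one row per customer and item:
----------------------
| customer ID | item |
----------------------
| 0001        | A    |
| 0001        | B    |
| 0001        | C    |
| 0002        | A    |
| 0002        | X    |
| 0002        | Z    |
| 0002        | L    |
| …           | …    |
| N           | A    |
| N           | K    |
| N           | M    |
----------------------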
For classical rules mining the order of the items is not relevant
Often a time window is chosen
• For example, only transactions of last year (of a customer)
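Going from the long (customer ID, item) layout to one basket per customer is a simple grouping step. The sketch below uses plain Python and a handful of illustrative rows; with real data a time-window filter would be applied first.

```python
from collections import defaultdict

# Illustrative long-format rows: (customer_id, item).
rows = [
    ("0001", "A"), ("0001", "B"), ("0001", "C"),
    ("0002", "A"), ("0002", "X"), ("0002", "Z"), ("0002", "L"),
    ("0003", "X"), ("0003", "A"),
]

# One set per customer: item order and repeats are ignored,
# which matches the remark that order is not relevant.
baskets = defaultdict(set)
for customer_id, item in rows:
    baskets[customer_id].add(item)

for customer_id in sorted(baskets):
    print(customer_id, sorted(baskets[customer_id]))
```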
Two major algorithms
Apriori, one of the classic algorithms in data mining
See https://athena.ecs.csus.edu/~mei/associationcw/Apriori.html
Choose a threshold for support,
• First scan on single items with support > threshold,
• Then construct two-item sets with support > threshold,
• Then construct three-item sets with support (of every subset) > threshold,
• Etc. until you run out,
• Finally, construct rules from the item sets with support > threshold
----------------------
| Item set | Support |
----------------------
| Butter   | 0.3     |
| Milk     | 0.3     |
| Cheese   | 0.2     |
| Apple    | 0.15    |
| Pear     | 0.15    |
| Water    | 0.001   |
----------------------
---------------------------
| Item set      | Support |
---------------------------
| Butter, Milk  | 0.25    |
| Milk, Cheese  | 0.22    |
| Cheese, Apple | 0.21    |
| Apple, Pear   | 0.1     |
| Pear, Butter  | 0.09    |
---------------------------
----------------------------------
| Item set             | Support |
----------------------------------
| Butter, Milk, Cheese | 0.2     |
| Milk, Cheese, Pear   | 0.22    |
| Cheese, Apple, Water | 0.21    |
| Apple, Pear, Milk    | 0.03    |
----------------------------------
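The level-wise idea can be sketched compactly in plain Python. This is a minimal, unoptimized illustration rather than a reference implementation. It keeps item sets with support >= the threshold, and the five toy transactions are chosen for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all item sets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Join: unions of two frequent (k-1)-sets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

toy = [
    ["Banana", "Jelly", "Pork"],
    ["Banana", "Pork"],
    ["Banana", "Milk", "Pork"],
    ["Eggs", "Banana"],
    ["Eggs", "Milk"],
]
result = apriori(toy, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The prune step is what keeps the candidate list manageable: a set can only be frequent if all of its subsets are frequent.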
Major drawbacks of Apriori
❌ Generation of item sets can be expensive (in both space and time)
❌ Support counting can be expensive
FP Growth, more efficient and scalable
See http://athena.ecs.csus.edu/~mei/associationcw/FpGrowth.html
• Sort single items first on descending support,
• Drop items with support < threshold,
• Make a sorted F-list of remaining items
Original transactions of customers
------------------------------------
| Trx ID | Items bought            |
------------------------------------
| 1      | [ Banana, Jelly, Pork ] |
| 2      | [ Banana, Pork ]        |
| 3      | [ Banana, Milk, Pork ]  |
| 4      | [ Eggs, Banana ]        |
| 5      | [ Eggs, Milk ]          |
------------------------------------
Support per item (Jelly is dropped)
--------------------
| item   | support |
--------------------
| Banana | 4 (80%) |
| Pork   | 3 (60%) |
| Milk   | 2 (40%) |
| Eggs   | 2 (40%) |
| Jelly  | 1 (20%) |
--------------------
Sorted frequent item list
[ B, P, M, E ]
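Building the F-list from the five example transactions takes only a counter and a filter. The threshold of 0.4 is an assumption chosen so that Jelly is dropped while Milk and Eggs survive, as on the slide.

```python
from collections import Counter

transactions = [
    ["Banana", "Jelly", "Pork"],
    ["Banana", "Pork"],
    ["Banana", "Milk", "Pork"],
    ["Eggs", "Banana"],
    ["Eggs", "Milk"],
]
min_support = 0.4  # assumed threshold, not stated on the slide

# Count in how many transactions each item occurs
# (items are unique within a transaction here).
counts = Counter(item for t in transactions for item in t)
n = len(transactions)

# most_common() sorts by descending count; ties keep first-seen order.
f_list = [item for item, c in counts.most_common() if c / n >= min_support]
print(f_list)
```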
FP Growth, more efficient and scalable
See http://athena.ecs.csus.edu/~mei/associationcw/FpGrowth.html
• Sort items in the transactions based on the previously created F-list
• Scan through your transactions to form a Frequent Pattern Tree
• Create the rules by looking at subtrees of the FP-Tree
Sorted transactions of customers
---------------------------------
| Trx ID | Sorted items         |
---------------------------------
| 1      | [ Banana, Pork ]     |
| 2      | [ Banana, Pork ]     |
| 3      | [ Banana, Pork, Milk ] |
| 4      | [ Banana, Eggs ]     |
| 5      | [ Milk, Eggs ]       |
---------------------------------
[Figure: the FP-tree after the first transaction, after the second transaction, and after all transactions]
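The tree-building scan can be sketched as a prefix tree with counters. This is a simplified illustration: it omits the header table and node links that a full FP-growth implementation uses for mining, and the `Node` class is a hypothetical helper.

```python
class Node:
    """One FP-tree node: an item, a count, and children keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(sorted_transactions):
    root = Node(None)
    for transaction in sorted_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = Node(item, parent=node)
                node.children[item] = child
            child.count += 1  # shared prefixes only increase a counter
            node = child
    return root

# Transactions already sorted by the F-list, infrequent items removed.
sorted_transactions = [
    ["Banana", "Pork"],
    ["Banana", "Pork"],
    ["Banana", "Pork", "Milk"],
    ["Banana", "Eggs"],
    ["Milk", "Eggs"],
]
tree = build_fp_tree(sorted_transactions)

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

show(tree)
```

Because the items are sorted by frequency, common prefixes such as Banana → Pork collapse into one path, which is where the compression comes from.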
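IENS RESTAURANT ASSOCIATION RULES MINING / MARKET BASKET ANALYSIS
In Python, use the mlxtend package:
from mlxtend.frequent_patterns import fpgrowth
# df: one-hot encoded DataFrame, one row per transaction and one boolean column per item
frequent_itemsets = fpgrowth(df, min_support=0.0020)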
IENS RESTAURANT LENGTH TWO RULES A → B
[Interactive network]
Very generic rules
Lift is not really high
IENS RESTAURANT LENGTH THREE RULES A, B → C
[Interactive plot]
Much more specific, higher lift
Support and lift are often a trade-off
IENS RESTAURANT VIRTUAL ITEMS: MAKE IT EVEN MORE PERSONAL
Transaction data with customers and items
customer item
1 A
1 X
2 A
2 B
2 C
3 E
3 T
4 S
possible rules
{ A, B } → { C }
{ X } → { Z }
Add customer features as virtual items
possible rules
{ Male, (18, 25], A, B } → { C }
{ Female, (40,45], X } → { Z }
customer item
1 A
1 X
1 Male
1 (18, 25]
2 A
2 B
2 C
2 Male
2 (45, 65]
3 E
3 T
3 Male
4 (30, 35]
4 S
4 Male
4 (30, 35]
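Adding virtual items amounts to appending each customer's features to their basket before mining. A minimal sketch, assuming the features are available as a dictionary per customer (the `features` structure and its field names are hypothetical):

```python
from collections import defaultdict

# (customer, item) rows, as in the first table.
purchases = [(1, "A"), (1, "X"), (2, "A"), (2, "B"), (2, "C")]

# Hypothetical customer features, already binned into categories.
features = {
    1: {"gender": "Male", "age_band": "(18, 25]"},
    2: {"gender": "Male", "age_band": "(45, 65]"},
}

baskets = defaultdict(set)
for customer, item in purchases:
    baskets[customer].add(item)

# Each feature value becomes an extra "virtual" item in the basket.
for customer, customer_features in features.items():
    baskets[customer].update(customer_features.values())

for customer in sorted(baskets):
    print(customer, sorted(baskets[customer]))
```

The rule miner itself does not change; it simply sees baskets that also contain items such as Male or (18, 25].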
A FEW FACTS… IENS DATA (TRADITIONAL BI)
Most frequently occurring restaurant name (39 times)
Among Dutch restaurants (6 times)
% Sustainable kitchens
Organic (67%)
French (58%)
Fish (44%)
Vegetarian (39%)
…
…
…
Chinese (3%)
700 reviews on a “normal” Saturday
Valentine 2015 had 1200 reviews (1.7 times)
23 times
12 times
SELECT RELEVANT
APP USERS
SELECT CERTAIN APP USERS
BUSINESS ISSUE
Which of my app users should I select that are ‘interested’ in SLIPPERS?
APPROACH
Use word-embeddings to map each user-id and article number to a (high-dimensional) vector
AVAILABLE DATA
App session and event data
-----------------------------------
|user_id |time |product_viewed |
-----------------------------------
| A | 2 | AX1234 |
| A | 3 | AW3456 |
| A | 4 | XY1234 |
| B | 1 | PO2345 |
| B | 2 | ZX3214 |
| C | 3 | KL1234 |
| .. | .. | ... |
| .. | .. | ... |
-----------------------------------
Word2vec Methodology
DATA PREP on SPARK because of the size:
• Filter out “non-interesting” events
• Aggregate the data on user level
• Put articles viewed on the app in a list
• Put the user id in ‘the middle’
• So each user has their own ‘document’ or text, with article numbers and their id as the tokens (words) in the text.
• Now the data is small enough to handle in ‘normal’ Python.
------------------------------------------------------------
| id | text |
------------------------------------------------------------
| A | [ EE5499, FX8912, A, FW4567, AB3499 ] |
| B | [ HP9823, B] |
| C | [ AB9812, PO1299, UK6712, AW9912, SE8932, C.....] |
| D | [ OK3423, SZ8676, D, LK9712] |
------------------------------------------------------------
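The preparation steps can be sketched in plain Python on a tiny event list (the talk does this step on Spark because of the data size). Placing the user id at index `len(products) // 2` is an assumption about what ‘the middle’ means.

```python
from collections import defaultdict

# (user_id, time, product_viewed) rows, as in the event table.
events = [
    ("A", 2, "AX1234"), ("A", 3, "AW3456"), ("A", 4, "XY1234"),
    ("B", 1, "PO2345"), ("B", 2, "ZX3214"),
    ("C", 3, "KL1234"),
]

def build_documents(events):
    """Return {user_id: token list} with the user id inserted in the middle."""
    per_user = defaultdict(list)
    # Sort by user and time so each list follows the viewing order.
    for user_id, _time, product in sorted(events, key=lambda e: (e[0], e[1])):
        per_user[user_id].append(product)

    documents = {}
    for user_id, products in per_user.items():
        middle = len(products) // 2  # assumed definition of 'the middle'
        documents[user_id] = products[:middle] + [user_id] + products[middle:]
    return documents

documents = build_documents(events)
for user_id, tokens in documents.items():
    print(user_id, tokens)
```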
Predict the target word w(t) with surrounding words w(t-1), w(t-2),… and w(t+1), w(t+2),….
The so-called Continuous Bag of Words (CBOW) model
• we are not interested in the prediction
• the weights we get per word in the vocabulary are what we want
Example text: [ Steffy from Germany is laughing very loud and is happy ]
[Figure: the embedding vector of one word, e.g. (0.123, 0.672, 0.123, …, 0.452, 0.512)]
WORD EMBEDDINGS
Normal text / document:
[ w(t-4) w(t-3) w(t-2) w(t-1) w(t) w(t+1) w(t+2) w(t+3) w(t+4) w(t+5) ]
User app sessions:
[ AB54321, CY3461, AW97541, USER_ID, PX91234, KL70123 ]
[Figure: embedding vectors of the tokens, e.g. (0.123, 0.672, 0.123, …, 0.452, 0.512), (0.253, 0.727, 0.513, …, 0.952, 0.318) and (0.253, 0.527, 0.714, …, 0.612, 0.219)]
Now the ‘texts’ or ‘documents’ are just collections of article IDs and user IDs.
Predict the target word from its surrounding words with the so-called
Continuous Bag of Words (CBOW) model
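To make the CBOW setup concrete, the sketch below lists the (context, target) pairs that such a model would be trained on for one user ‘document’. It only builds the training pairs; the window size of 2 is an assumption, and no embedding is actually trained here.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_tokens, target_token) pairs for a CBOW-style model."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs

document = ["AB54321", "CY3461", "AW97541", "USER_ID", "PX91234", "KL70123"]
pairs = cbow_pairs(document, window=2)

for context, target in pairs:
    print(context, "->", target)
```

Because the user id sits among the article numbers, it appears both as a target and as context, so it ends up in the same vector space as the articles.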
Every product and app user_id is now a high-dimensional embedding.
We can use UMAP to project onto 2D or 3D space for visualization.
So, every dot in the scatterplot is either a product or a user.
PRODUCTS & APP USERS
Articles are also high dimensional embeddings in
the same space.
So we can calculate distances (or similarities)
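Here is a minimal sketch of such a similarity lookup, using cosine similarity on made-up three-dimensional vectors. Real embeddings would have many more dimensions. DY2562 is used only as an example key, and the other keys and all vector values are hypothetical.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: articles and users share one vector space.
embeddings = {
    "DY2562": [1.0, 0.0, 0.0],
    "AB3499": [0.9, 0.1, 0.0],
    "user_A": [0.8, 0.2, 0.1],
    "user_B": [0.0, 1.0, 0.0],
}

def closest(key, embeddings, top_n=2):
    """Return the top_n most similar keys to `key`, most similar first."""
    query = embeddings[key]
    scored = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != key
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

nearest = closest("DY2562", embeddings)
print(nearest)
```

Splitting the result into articles and users afterwards only requires knowing which keys are user ids.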
[Scatterplot: ADIDAS BAG ARTICLES]
[Scatterplot: SPORTS BRA ARTICLES]
[Scatterplot: FOOTBALL BOOTS]
Streamlit app
An easy Python package for creating simple interactive dashboards.
For the marketer:
• Enter an article number: the Adidas slipper!
• The closest vectors are displayed
• Those vectors are split into:
  - Articles
  - Users
Another example:
• Enter an article number, say DY2562
• The closest vectors are displayed, again split into articles and users
The Dutch movie world
in a graph
You know nothing about Dutch actors and actresses, but you want to know:
“Who is playing with whom in a movie?”
GRAPH BASICS
A FEW BASIC TERMS
Node or Vertex: a point in the network
✔ can have different attributes
✔ e.g., different color or size of nodes
Edge or Link: a relation between two nodes
✔ can be directional and have attributes
✔ e.g., arrowed, colored and sized
[Figure: example graph with nodes A to G]
GRAPH BASICS
A FEW TERMS
Node Centrality: How central is a node?
* Degree (number of connections)
* Betweenness (number of shortest paths through a node)
* Eigencentrality (Google’s PageRank is a version of this)
Community detection: Are there nodes that belong together?
[Figure: example graph with numbered nodes, annotated with their degrees (5, 6, 2 and 2)]
Node 6 and node 3 have the same degree,
but node 6 has a higher betweenness than node 3
WWW.IMDB.COM INTERNET MOVIE DATABASE
Download movie data:
✔ Dutch movies in the last 25 years
✔ Per movie we know the cast and crew
✔ A node is a person
✔ Node X links with node Y if X and Y were in the same movie
In R use the library igraph and in Python the package networkx
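The linking rule can be sketched with the standard library only; in practice a package such as networkx or igraph would take over from here. The movie titles and person names below are placeholders, not data from IMDb.

```python
from collections import Counter
from itertools import combinations

# Placeholder data: movie -> people in the cast and crew.
movies = {
    "Movie 1": ["Person A", "Person B", "Person C"],
    "Movie 2": ["Person A", "Person D"],
    "Movie 3": ["Person B", "Person C"],
}

# An undirected edge for every pair of people in the same movie;
# the weight counts how many movies the pair shares.
edge_weights = Counter()
for cast in movies.values():
    for x, y in combinations(sorted(set(cast)), 2):
        edge_weights[(x, y)] += 1

# Degree: the number of distinct people a person is linked to.
neighbours = {}
for x, y in edge_weights:
    neighbours.setdefault(x, set()).add(y)
    neighbours.setdefault(y, set()).add(x)
degree = {person: len(linked) for person, linked in neighbours.items()}

print(dict(edge_weights))
print(degree)
```

The `edge_weights` pairs map directly onto an edges data frame (two node columns plus a weight attribute), and the keys of `degree` onto a nodes data frame.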
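DUTCH MOVIE WORLD IN A NETWORK GRAPH
Interactive graph
## create graph (R, visNetwork package)
library(visNetwork)
visNetwork(nodes, edges)
Data frame edges
---------------------------------------------------------
| Node_1          | Node_2           | Attr_1 | Attr_2 |
---------------------------------------------------------
| Chantal Jantzen | Stef Tijding     | 12     | A      |
| Hans de Wolf    | Jeroen Krabee    | 3      | A      |
| Johan Nijenhuis | Frans van Gestel | 5      | B      |
| …               | …                |        |        |
---------------------------------------------------------
Data frame nodes
-------------------------------------
| node_id         | Attr_1  | Attr_2 |
-------------------------------------
| Chantal Jantzen | Actress | 45     |
| Hans de Wolf    | Writer  | 65     |
| Rutger Hauer    | Actor   | 73     |
| …               | …       | …      |
-------------------------------------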
CENTRALITY
COMMUNITIES
There are 1257 persons, divided into 191 communities.
Take community 6:
54 persons in a word cloud (centrality-based)
Thanks for your time! Questions?
Need me as Freelancer? Let’s have a cup of coffee
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com/
@longhowlam

Xomia_20220602.pptx

  • 1. Product recommendation and the Dutch movie world Let’s link on LinkedIn https://www.linkedin.com/in/longhowlam Longhow Lam Freelance data scientist: Just contact me if you need me :-)
  • 2.  RESTAURANTS ANALYTICS (RECSYS: ASSOCIATION RULES MINING)  SELECT APP USERS (RECSYS: WORD EMBEDDINGS)  DUTCH FILM WORLD (GRAPH ANALYSIS) INTRODUCTION
  • 3. INTRODUCTION You need to learn your whole life! Data science environments and titles, they evolve, come and go I once was an “applied statistician”.... Applied statistician Data miner Data scientist ML engineer AI specialist ?? Tool ??
  • 5. RESTAURANT ANALYTICS Business pain I have eaten Chinese, OK nice! But where to eat the next time? Approach Look at restaurant reviews and look where the other reviewers went
  • 6. ASSOCIATION RULES MINING ALSO CALLED MARKET BASKET ANALYSIS Identify frequent item sets (rules) in transactional data: ✔ IF items A and B THEN item C {A, B} → {C} ✔ IF items X THEN item Y and Z {X} → {Y, Z} When is a rule frequent? If the ‘support’ > a threshold # trxs. {X → Y} Total # trxs. Support {X → Y} = Support Chips –> Beer 0.823% Chips –> Milk 0.002%
  • 7. Lift {X → Y} = Support {X → Y} Support (X) * Support(Y) Lift & Confidence Example: a lift van 8.3 for {Chips} → {Beer} means If I know someone has already bought Chips then it is 8.3 more likely that he will also buy beer Other statistics used to assess the usefulness of a rule Conf {X → Y} = Support {X→ Y} Support (X) ASSOCIATION RULES MINING ALSO CALLED MARKET BASKET ANALYSIS
  • 8. Question: When was this paper published? 1993. Question: How large was the database they used? 46,873 transactions
  • 9. ASSOCIATION RULES MINING Transaction data is needed
      Transaction ID | items
      0001 | [A, B, C]
      0002 | [A, X, Z, L]
      0003 | [X, A]
      0004 | [K, Q, L]
      … | …
      N | [A, K, M]
      customer ID | item
      0001 | A
      0001 | B
      0001 | C
      0002 | A
      0002 | X
      0002 | Z
      0002 | L
      … | …
      N | A
      N | K
      N | M
      For classical rules mining the order of the items is not relevant. Often a time window is chosen • For example, only transactions of last year (of a customer)
  • 10. Two major algorithms: Apriori, one of the classic algorithms in data mining. See https://athena.ecs.csus.edu/~mei/associationcw/Apriori.html
      - Choose a threshold for support,
      - First scan on single items with support > threshold,
      - Then construct two-item sets with support > threshold,
      - Then construct three-item sets with support (of every subset) > threshold,
      - Etc. until you run out,
      - Finally construct rules from the item sets with support > threshold
      Item set | Support
      Butter | 0.3
      Milk | 0.3
      Cheese | 0.2
      Apple | 0.15
      Pear | 0.15
      Water | 0.001
      Item set | Support
      Butter, Milk | 0.25
      Milk, Cheese | 0.22
      Cheese, Apple | 0.21
      Apple, Pear | 0.1
      Pear, Butter | 0.09
      Item set | Support
      Butter, Milk, Cheese | 0.2
      Milk, Cheese, Pear | 0.22
      Cheese, Apple, Water | 0.21
      Apple, Pear, Milk | 0.03
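The level-wise search described on this slide can be sketched in a few lines of plain Python. This is an illustrative toy, not the implementation used in the talk: the baskets are made up, the function name `apriori` is my own, and the sketch keeps item sets with support >= the threshold (a common convention) rather than a strict >.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent item sets (level-wise search)."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: single items that reach the threshold.
    items = {i for t in transactions for i in t}
    level = {frozenset([i]): supp(frozenset([i])) for i in items}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        level = {c: supp(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical baskets, only for illustration.
transactions = [
    {"butter", "milk", "cheese"},
    {"butter", "milk"},
    {"milk", "cheese"},
    {"butter", "milk", "cheese"},
    {"apple"},
]
frequent = apriori(transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the candidate list manageable: an item set can only be frequent if all of its subsets are frequent.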
  • 11. Two major algorithms: Apriori, one of the classic algorithms in data mining. See https://athena.ecs.csus.edu/~mei/associationcw/Apriori.html Major drawbacks: ❌ Generation of item sets can be expensive (in both space and time) ❌ Support counting can be expensive
  • 12. Two major algorithms: FP Growth, more efficient and scalable. See http://athena.ecs.csus.edu/~mei/associationcw/FpGrowth.html
      - Sort single items first on descending support,
      - Drop items with support < threshold,
      - Make a sorted F-list of remaining items
      Original transactions of customers
      Trx ID | Items bought
      1 | [ Banana, Jelly, Pork ]
      2 | [ Banana, Pork ]
      3 | [ Banana, Milk, Pork ]
      4 | [ Eggs, Banana ]
      5 | [ Eggs, Milk ]
      item | support
      Banana | 4 (80%)
      Pork | 3 (60%)
      Milk | 2 (40%)
      Eggs | 2 (40%)
      Jelly | 1 (20%)
      Jelly is dropped. Sorted frequent item list: [ B, P, M, E ]
  • 13. Two major algorithms: FP Growth, more efficient and scalable. See http://athena.ecs.csus.edu/~mei/associationcw/FpGrowth.html
      - Sort items in the transactions based on the previously created F-list,
      - Scan through your transactions to form a Frequent Pattern Tree,
      - Create the rules by looking at sub trees of the FP-Tree
      Sorted transactions of customers
      Trx ID | Sorted items
      1 | [ Banana, Pork ]
      2 | [ Banana, Pork ]
      3 | [ Banana, Pork, Milk ]
      4 | [ Banana, Eggs ]
      5 | [ Milk, Eggs ]
      (Figures: the FP-tree after the first transaction, after the second transaction, and after all transactions.)
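The first two FP Growth steps (F-list, then sorted transactions inserted into a prefix tree) can be reproduced with the five transactions from the slides. This is only a sketch of the tree construction, not of the rule mining itself; the count threshold of 2 (40%) and the nested-dict tree layout are assumptions of mine.

```python
from collections import Counter

# The five transactions shown on the slide.
transactions = [
    ["Banana", "Jelly", "Pork"],
    ["Banana", "Pork"],
    ["Banana", "Milk", "Pork"],
    ["Eggs", "Banana"],
    ["Eggs", "Milk"],
]
min_count = 2  # assumed threshold: an item must occur in at least 2 of 5 baskets

# Step 1: count single items and keep the frequent ones, sorted on descending support.
counts = Counter(item for t in transactions for item in t)
# sorted() is stable, so tied items keep their first-seen order (Milk before Eggs).
f_list = [
    item
    for item, c in sorted(counts.items(), key=lambda kv: -kv[1])
    if c >= min_count
]

# Step 2: drop infrequent items and sort every transaction by F-list order.
rank = {item: i for i, item in enumerate(f_list)}
sorted_trx = [sorted((i for i in t if i in rank), key=rank.get) for t in transactions]

# Step 3: insert the sorted transactions into a prefix tree.
# Each node is stored as item -> [count, children].
def build_fp_tree(sorted_trx):
    root = {}
    for t in sorted_trx:
        node = root
        for item in t:
            entry = node.setdefault(item, [0, {}])
            entry[0] += 1
            node = entry[1]
    return root

tree = build_fp_tree(sorted_trx)
print(f_list)
print(sorted_trx)
print(tree)
```

Because frequent items come first in every transaction, many transactions share the same prefix, which is what keeps the tree compact.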
  • 14. IENS RESTAURANT ASSOCIATION RULES MINING / MARKET BASKET ANALYSIS In Python, use the mlxtend package:
      from mlxtend.frequent_patterns import fpgrowth
      fpgrowth(df, min_support=0.0020)
  • 15. IENS RESTAURANT LENGTH TWO RULES A → B Interactive network. Very generic rules; lift is not really high
  • 16.
  • 17. IENS RESTAURANT LENGTH THREE RULES A, B → C Interactive picture. Much more specific, higher lift
  • 18. Often support and lift are trade-offs
  • 19. IENS RESTAURANT VIRTUAL ITEMS: MAKE IT EVEN MORE PERSONAL
      Transaction data with customers and items
      customer | ITEM
      1 | A
      1 | X
      2 | A
      2 | B
      2 | C
      3 | E
      3 | T
      4 | S
      possible rules: { A, B } → { C }, { X } → { Z }
      Add customer features as virtual items
      customer | ITEM
      1 | A
      1 | X
      1 | Male
      1 | (18, 25]
      2 | A
      2 | B
      2 | C
      2 | Male
      2 | (45, 65]
      3 | E
      3 | T
      3 | Male
      4 | (30, 35]
      4 | S
      4 | Male
      4 | (30, 35]
      possible rules: { Male, (18, 25], A, B } → { C }, { Female, (40, 45], X } → { Z }
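Adding virtual items amounts to appending customer attributes to each basket before mining. A minimal sketch, with hypothetical customers, a made-up `age_bucket` helper, and assumed age bins:

```python
# Long-format transaction rows: (customer, item). Hypothetical data.
rows = [(1, "A"), (1, "X"), (2, "A"), (2, "B"), (2, "C")]

# Hypothetical customer features.
customers = {
    1: {"gender": "Male", "age": 22},
    2: {"gender": "Female", "age": 43},
}

# Assumed age bins, written as half-open intervals (lo, hi].
AGE_BINS = [(18, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 65)]

def age_bucket(age):
    """Map an age to an interval label such as '(18, 25]'."""
    for lo, hi in AGE_BINS:
        if lo < age <= hi:
            return f"({lo}, {hi}]"
    return "age_unknown"

# Build one basket per customer from the real items.
baskets = {}
for customer, item in rows:
    baskets.setdefault(customer, set()).add(item)

# Append the customer features as extra 'virtual' items.
for customer, feats in customers.items():
    baskets.setdefault(customer, set()).update(
        {feats["gender"], age_bucket(feats["age"])}
    )

print(baskets)
```

Any rules-mining routine can then be run on `baskets` unchanged; rules that contain a virtual item on the left-hand side apply only to that customer segment.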
  • 20. A FEW FACTS… IENS DATA (TRADITIONAL BI) Most occurring restaurant name (39 times) Among Dutch restaurants (6 times) % Sustainable kitchens Biological (67%) French (58%) Fish (44%) Vegetarian (39%) … … … Chinese (3%) 700 reviews on a “normal” Saturday Valentine 2015 had 1200 reviews (1.7 times) 23 times 12 times
  • 22. SELECT CERTAIN APP USERS
      BUSINESS ISSUE: Which of my app users should I select that are ‘interested’ in SLIPPERS?
      APPROACH: Use word-embeddings to map each user-id and article number to a (high-dimensional) vector
      AVAILABLE DATA: App session and event data
      -----------------------------------
      |user_id |time |product_viewed |
      -----------------------------------
      | A | 2 | AX1234 |
      | A | 3 | AW3456 |
      | A | 4 | XY1234 |
      | B | 1 | PO2345 |
      | B | 2 | ZX3214 |
      | C | 3 | KL1234 |
      | .. | .. | ... |
      -----------------------------------
  • 23. Word2vec Methodology DATA PREP on SPARK because of the size:
      - Filter out “non-interesting” events
      - Aggregate the data on user level
      - Put articles viewed on the app in a list
      - Put the user id in ‘the middle’
      - So each user has its own ‘document’ or text with article numbers and his id as the tokens (words) in the text.
      - Now the data is small enough to handle in ‘normal’ Python.
      ------------------------------------------------------------
      | id | text |
      ------------------------------------------------------------
      | A | [ EE5499, FX8912, A, FW4567, AB3499 ] |
      | B | [ HP9823, B] |
      | C | [ AB9812, PO1299, UK6712, AW9912, SE8932, C.....] |
      | D | [ OK3423, SZ8676, D, LK9712] |
      ------------------------------------------------------------
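The aggregation step above can be illustrated in plain Python on the small event table from the previous slide (the talk does this on Spark because of the data size). The function name `build_documents` and the exact "middle" position are assumptions; the slide only says the user id goes in "the middle".

```python
from collections import defaultdict

# Event rows: (user_id, time, product_viewed), as in the example table.
events = [
    ("A", 2, "AX1234"),
    ("A", 3, "AW3456"),
    ("A", 4, "XY1234"),
    ("B", 1, "PO2345"),
    ("B", 2, "ZX3214"),
    ("C", 3, "KL1234"),
]

def build_documents(events):
    """One token list per user: viewed articles in time order, user id in the middle."""
    per_user = defaultdict(list)
    for user, _time, product in sorted(events, key=lambda e: (e[0], e[1])):
        per_user[user].append(product)

    docs = {}
    for user, products in per_user.items():
        mid = (len(products) + 1) // 2  # assumed position for 'the middle'
        docs[user] = products[:mid] + [user] + products[mid:]
    return docs

docs = build_documents(events)
for user, tokens in docs.items():
    print(user, tokens)
```

Placing the id in the middle means that, with a symmetric context window, the user token shares a context with articles on both sides of it.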
  • 24. Word2vec Methodology: WORD EMBEDDINGS Predict the target word w(t) with surrounding words w(t-1), w(t-2), … and w(t+1), w(t+2), …. The so-called Continuous Bag of Words (CBOW) model
      - we are not interested in the prediction
      - the weights we get per word in the vocabulary are what we want
      Normal text / document: [ w(t-4) w(t-3) w(t-2) w(t-1) w(t) w(t+1) w(t+2) w(t+3) w(t+4) w(t+5) ], for example [ Steffy from Germany is laughing very loud and is happy ]
      (Figure: the embedding vector of one word, with entries 0.123, 0.672, 0.123, …, 0.452, 0.512.)
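To show what CBOW trains on, the sketch below only generates the (context, target) pairs for the example sentence; it does not train a model. The window size of 2 and the function name `cbow_pairs` are assumptions for illustration.

```python
def cbow_pairs(tokens, window=2):
    """For every position t, pair the surrounding words with the target word w(t)."""
    pairs = []
    for t, target in enumerate(tokens):
        left = tokens[max(0, t - window):t]
        right = tokens[t + 1:t + 1 + window]
        pairs.append((left + right, target))
    return pairs

sentence = "Steffy from Germany is laughing very loud and is happy".split()
pairs = cbow_pairs(sentence, window=2)
for context, target in pairs:
    print(context, "->", target)
```

A CBOW model is fitted to predict each target from its context; the learned weights per vocabulary word are the embeddings.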
  • 25. Word2vec Methodology: User app sessions Now the ‘texts’ or ‘documents’ are just collections of article IDs and user IDs, for example [ AB54321, CY3461, AW97541, USER_ID, PX91234, KL70123 ] (Figure: example embedding vectors for three tokens in the session.)
  • 26. Predict the target word with surrounding words with so-called Continuous Bag of Words (CBOW) Word2vec Methodology
  • 27. PRODUCTS & APP USERS Every product and app user_id is now a high-dimensional embedding. We can use UMAP to project onto 2D or 3D space for visualization. So, every dot in the scatterplot is either a product or a user
  • 28. Articles are also high dimensional embeddings in the same space. So we can calculate distances (or similarities) Adidas BAG ARTICLES PRODUCTS & APP USERS
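Since articles and users live in the same space, finding the "closest" ones is a nearest-neighbour lookup. A minimal sketch with cosine similarity; the 3-dimensional vectors, the keys, and the `article:` / `user:` prefixes are all made up for illustration (real embeddings have many more dimensions).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings; keys are prefixed so articles and users can be told apart.
embeddings = {
    "article:SLIPPER1": [0.9, 0.1, 0.0],
    "article:SLIPPER2": [0.8, 0.2, 0.1],
    "article:BOOT1": [0.0, 0.9, 0.4],
    "user:A": [0.85, 0.15, 0.05],
    "user:B": [0.1, 0.8, 0.5],
}

def nearest(key, embeddings, k=3):
    """Return the k most similar other tokens, most similar first."""
    query = embeddings[key]
    scored = [(other, cosine(query, vec)) for other, vec in embeddings.items() if other != key]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]

neighbours = nearest("article:SLIPPER1", embeddings, k=4)
articles = [name for name, _ in neighbours if name.startswith("article:")]
users = [name for name, _ in neighbours if name.startswith("user:")]
print(articles)
print(users)
```

Splitting the neighbours into articles and users gives two lists from one query: similar products, and users whose sessions put them close to that product.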
  • 29. PRODUCTS & APP USERS SPORTS BRA ARTICLES
  • 30. PRODUCTS & APP USERS FOOTBALL BOOTS
  • 31. Streamlit app For the marketeer.  Enter an article number: The Adidas slipper!  The closest vectors are displayed  Those vectors are split:  Articles  Users Easy python package to create simple interactive dashboard
  • 32. Streamlit app Another example:  Enter an article number, say DY2562  The closest vectors are displayed  Those vectors are split:  Articles  Users Easy python package to create simple interactive dashboard
  • 33. The Dutch movie world in a graph You know nothing about Dutch Actors and Actresses. But you want to know: “Who is playing with who in a movie?”
  • 34. GRAPH BASICS: A FEW BASIC TERMS Node or Vertex: a point in the network ✔ can have different attributes ✔ e.g., different color or size of nodes. Edge or Link: a relation between two nodes ✔ can be directional and have attributes ✔ e.g., arrowed, colored and sized (Figure: example graph with nodes A to G.)
  • 35. GRAPH BASICS: A FEW TERMS Node Centrality: How central is a node? * Degree (number of connections) * Betweenness (number of shortest paths through a node) * Eigencentrality (Google’s page rank is a version of this) Community detection: Are there nodes that belong together? (Figure: example graph with twelve numbered nodes, some labelled with their degree.) Nodes 6 and 3 have the same Degree, but node 6 has a higher Betweenness than node 3
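The difference between degree and betweenness can be checked on a tiny graph. The sketch below uses a made-up five-node graph (not the one from the slide) and a plain-Python version of Brandes' algorithm for unweighted, undirected graphs; in practice a library such as networkx would be used.

```python
from collections import deque

# Hypothetical graph: a triangle 1-2-3 with a tail 3-4-5, as adjacency lists.
adj = {
    1: [2, 3],
    2: [1, 3],
    3: [1, 2, 4],
    4: [3, 5],
    5: [4],
}

def degree(adj):
    """Number of connections per node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalised betweenness (Brandes' algorithm, unweighted undirected graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {v: -1 for v in adj}
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes and accumulate dependencies.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

deg = degree(adj)
btw = betweenness(adj)
print(deg)
print(btw)
```

Nodes 1 and 4 both have degree 2, but node 4 lies on every shortest path to node 5 while node 1 lies on none, so their betweenness differs.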
  • 36. WWW.IMDB.COM INTERNET MOVIE DATABASE Download movie data: ✔ Dutch movies in the last 25 years ✔ Per movie we know the cast and crew ✔ A node is a person ✔ Node X links with node Y if X and Y were in the same movie. In R use the library igraph and in Python the package networkx
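Turning per-movie cast lists into nodes and edges is a small counting exercise. A minimal sketch with placeholder movies and persons (hypothetical data, not the IMDb download); using the number of shared movies as the edge weight is an assumption of mine.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cast and crew per movie.
movies = {
    "Movie 1": ["Person A", "Person B", "Person C"],
    "Movie 2": ["Person A", "Person B"],
    "Movie 3": ["Person C", "Person D"],
}

# An edge links two persons who worked on the same movie;
# its weight counts how many movies they share.
edge_weights = Counter()
for cast in movies.values():
    for x, y in combinations(sorted(set(cast)), 2):
        edge_weights[(x, y)] += 1

nodes = sorted({person for cast in movies.values() for person in cast})

print(nodes)
for (x, y), weight in edge_weights.items():
    print(x, "-", y, weight)
```

The resulting node list and weighted edge list are the two tables a graph library needs as input.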
  • 37. DUTCH MOVIE WORLD IN A NETWORK GRAPH Interactive graph
      ## create graph
      visNetwork(nodes, edges)
      Data frame edges
      Node_1 | Node_2 | Attr_1 | Attr_2
      Chantal Jantzen | Stef Tijding | 12 | A
      Hans de wolf | Jeroen Krabee | 3 | A
      Johan Nijenhuis | Frans van Gestel | 5 | B
      … | … | … | …
      Data frame nodes
      node_id | Attr_1 | Attr_2
      Chantal Jantzen | Actress | 45
      Hans de Wolf | Writer | 65
      Rutger Hauer | Actor | 73
      … | … | …
  • 39. COMMUNITIES There are 1257 persons. They are divided into 191 communities. Take community 6: 54 persons in a wordcloud (centrality based)
  • 40. Thanks for your time! Questions? Need me as Freelancer? Let’s have a cup of coffee https://www.linkedin.com/in/longhowlam https://longhowlam.wordpress.com/ @longhowlam