Social Book Search:
A Combination of Personalized Recommendations and Retrieval

Master Thesis
Information Science
Human Centered Multimedia
August 23, 2012

Author: Justin van Wees
Supervisor: Marijn Koolen
Second assessor: Frank Nack
Outline

1. Background
2. Research questions
3. Data collection
4. Experiments and results
5. Conclusions
6. Discussion and future work
7. Questions
Current situation

•   Traditional information retrieval (IR) models:

    •   developed for use on small collections

    •   contain only officially published documents, annotated
        by professionals

•   Many modern web (2.0) applications still use traditional
    models for search

•   Millions of documents

•   Combination of user-generated content (UGC) and
    professional metadata
Current situation

•   User uses IR system to find those documents that are
    topically relevant to her information need

•   Queries can lead to thousands of relevant documents

•   Evaluating large number of results expensive for user

•   Other notions of relevance, e.g. how well-written,
    popular, recent or fun the document is

•   Combination of professional and user-generated
    metadata
Social Book Search Track


•   Evaluate relative value of controlled book metadata
    versus social metadata (Koolen et al., 2012)

•   Amazon.com and LibraryThing (LT) corpus

•   ~2.8 million book records, both social and professional
    metadata

•   book search requests from LT discussion forums as
    topics, suggestions by other users as relevance
    judgements
Recommender Systems

•   Recommender Systems (RSs) suggest items of interest
    to individuals or groups of users (Resnick and Varian, 1997)

•   Assumes that individual’s taste or interest in a particular
    item can be explained by features recorded by the RS
    (demographics, previous interactions, etcetera)

•   Different strategies: collaborative filtering (CF),
    content-, community-, knowledge-based, hybrid (Burke,
    2007)


•   Differs from traditional retrieval in terms of query
    formulation, source of relevance feedback and
    personalization (Furner, 2002)
Research Questions
Does a combination of techniques from the field of IR with
those from RSs improve retrieval performance when searching
for works in a large scale on-line collaborative media
catalogue?

•   What data are we able to collect?

•   Can we automatically make accurate predictions of a
    user’s preference for an unknown book?

•   How do we combine results from IR system with RSs?

•   Social Book Search scenario and data
Crawling LibraryThing

   •    Perform four different crawls of user profiles and
        personal catalogues

   •    For each crawl, also crawl links to other profiles

   •    Compare crawls to determine representativeness for
        entire LT user-base

   •    All crawls combined cover approximately 6% of the LT
        user base
Crawl                   seed list   profiles   unique works   profile overlap
Forum users             1,104       60,131    4,354,387
Random – 211 works      1,306       8,040     2,537,065      7,048
Random – 1,000 works    5,577       18,381    3,580,296      14,262
Random – 10,000 works   35,671      64,379    5,122,848      37,300
Total                   -           89,693    5,299,399      -
Crawling LibraryThing
   Crawl                    min.   max.    median   mean    std. dev.
   Forum users
    Friends                 0      172     3.0      8.47    16.31
    Groups                  0      10      9.0      6.79    3.74
    Interesting Libraries   0      510     2.0      11.19   26.46
   Random – 211 works
    Friends                 0      79      0.0      2.61    7.46
    Groups                  0      10      0.0      1.70    3.05
    Interesting Libraries   0      394     0.0      3.30    17.80
   Random – 1,000 works
    Friends                 0      84      0.0      2.18    6.07
    Groups                  0      10      0.0      1.64    3.02
    Interesting Libraries   0      574     0.0      2.74    14.41
   Random – 10,000 works
    Friends                 0      2,858   0.0      1.73    17.49
    Groups                  0      10      0.0      1.24    2.61
    Interesting Libraries   0      855     0.0      1.69    10.40
   Total
    Friends                 0      2,858   1.0      2.14    12.77
    Groups                  0      10      0.0      1.18    2.44
    Interesting Libraries   0      855     0.0      1.27    8.00
Crawling LibraryThing
 Crawl                min.   max.     median   mean       std. dev.   sum
 Forum users
  Unrated             0      28,402   84.0     397.22     929.70      23,885,231
  Rated               0      12,190   3.0      78.80      238.53      4,738,018
  Total               0      28,402   148.00   476.02     980.88      28,623,249
 Random – 211 works
  Unrated             0      28,402   458.00   1,112.81   1,835.81    8,946,997
  Rated               0      12,190   10.00    182.77     472.08      1,469,531
  Total               0      28,402   657.00   1,295.58   1,908.65    10,416,528
 Random – 1,000 works
  Unrated             0      28,402   331.00   864.32     1,480.98    15,887,025
  Rated               0      12,190   3.00     130.20     369.06      2,393,233
  Total               0      28,402   475.00   994.52     1,539.15    18,280,258
 Random – 10,000 works
  Unrated              0     28,402   163.00   486.63     955.86      31,328,971
  Rated               0      12,190   1.00     74.04      237.01      4,766,750
  Total                0     28,402   201.00   560.68     1,000.50    36,095,721
 Total
  Unrated             0      28,402   102.00   378.18     834.94      33,920,353
  Rated               0      12,190   1.00     62.85      206.40      5,637,097
  Total               0      28,402   156.00   441.03     876.76      39,557,450
Generating Recommendations



•   Collaborative filtering approach

•   Unary and rated transactions

•   Memory- and model-based recommenders

•   Randomly split transactions (80% train/20% test) for
    performance evaluation
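
A minimal sketch of the random 80/20 split described above, assuming transactions are held in memory as (user, work, rating) tuples. The function name, the tuple layout and the fixed seed are illustrative assumptions, not details taken from the thesis.

```python
import random


def split_transactions(transactions, train_fraction=0.8, seed=42):
    """Shuffle a copy of the transactions and cut it into train and test parts."""
    rng = random.Random(seed)  # fixed seed (assumption) so the split is repeatable
    shuffled = list(transactions)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical (user, work, rating) transactions, for illustration only.
transactions = [("u%d" % (i % 5), "w%d" % i, 1 + i % 5) for i in range(10)]
train, test = split_transactions(transactions)
```

Every transaction ends up in exactly one of the two parts, so predictions made from the training part can be checked against the held-out test part.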
Generating Recommendations


•   Neighbourhood (Desrosiers and Karypis, 2011):

    •   Directly use user-item ratings to predict ratings for
        ‘unseen’ items

    •   Find n most similar neighbours (Pearson correlation)

    •   Use the weighted average rating given by the user’s
        neighbours

    •   Let neighbours ‘vote’ on unary transactions
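
The neighbour selection step can be sketched as follows. This is an illustration under assumptions: ratings are stored as a nested dict {user: {work: rating}}, the correlation is computed over co-rated works only, and undefined correlations are treated as 0.0. The slides name Pearson correlation but do not give these details.

```python
from math import sqrt


def pearson(ratings_u, ratings_v):
    """Pearson correlation over works both users rated; 0.0 if undefined."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0
    xs = [ratings_u[i] for i in common]
    ys = [ratings_v[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def nearest_neighbours(user, ratings, n):
    """Return the n users most similar to `user` as (other, similarity) pairs."""
    sims = [(other, pearson(ratings[user], ratings[other]))
            for other in ratings if other != user]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:n]


# Hypothetical toy ratings.
ratings = {
    "a": {"x": 1, "y": 2, "z": 3},
    "b": {"x": 2, "y": 4, "z": 6},
    "c": {"x": 3, "y": 2, "z": 1},
}
top = nearest_neighbours("a", ratings, 1)
```

In the toy data, user b rates in the same direction as user a and user c in the opposite direction, so b is selected as the single nearest neighbour.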
Generating Recommendations


•   Singular Value Decomposition (SVD) (Schafer et al., 2007):

    •   Reduce domain complexity by mapping item space to
        k dimensions

    •   Remaining dimensions represent the latent topics:
        preference classes of users, categorical classes of
        items

    •   Currently considered ‘state of the art’
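
As a hedged illustration of the dimensionality reduction described above: in its generic textbook form, a rank-k truncated SVD approximates the user-item rating matrix R (assumed here to be an m × n matrix of m users and n works; the slides do not name the exact variant used) as

```latex
R \approx R_k = U_k \Sigma_k V_k^{\top},
\qquad
\hat{r}_{ui} = \left( U_k \Sigma_k V_k^{\top} \right)_{ui}
```

Here U_k (m × k) and V_k (n × k) hold the user and item coordinates in the k latent dimensions, and Σ_k is the diagonal matrix of the k largest singular values. How missing ratings are filled in before factorising is not stated in the slides and is left open here.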
Recommender Performance
Method                   MAE       RMSE        P@5         P@10        P@50
Neighbourhood (N=25)     0.7813    1.0286      0.0712      0.0661      0.0614
Neighbourhood (N=50)     0.7721    1.0105      0.0376      0.0371      0.0339
Neighbourhood (N=100)    0.7633    0.9927      0.0246      0.0239      0.0232
SVD (K=50)               0.6210    0.8139      0.0021      0.0019      0.0026
SVD (K=100)              0.6203    0.8131      0.0025      0.0022      0.0028
SVD (K=150)              0.6192    0.8122      0.0281      0.0107      0.0030




 Method                   Accuracy    P@5         P@10        P@50
 Neighbourhood (N=25)     0.2430      0.3711      0.2425      0.1829
 Neighbourhood (N=50)     0.3014      0.3824      0.2561      0.1861
 Neighbourhood (N=100)    0.3621      0.3640      0.2422      0.1812
 SVD (K=50)               0.2240      0.0214      0.0198      0.0216
 SVD (K=100)              0.2601      0.0219      0.0203      0.0229
 SVD (K=150)              0.2676      0.0424      0.0212      0.0234
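
The metrics in the tables above can be sketched in their standard form as follows. This assumes the usual definitions of MAE, RMSE and precision at k; the exact evaluation protocol (for example, which works count as relevant for P@k) is not given in the slides.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalises large errors more than MAE."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that are in the relevant set."""
    top_k = ranked_items[:k]
    return sum(1 for item in top_k if item in relevant_items) / k
```

Lower MAE and RMSE mean better rating estimates, while higher P@k means more relevant works near the top of the recommendation list. The two kinds of metric can disagree for the same method.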
Retrieving Works

•   Setup used for INEX 2012; top performing run

•   Index consists of user-generated content

•   Removed stopwords

•   Stemming with Krovetz

•   Topic titles as queries

•   Language model

•   Pseudo relevance feedback, 50 terms from the top 10 results
Combining IR and RS

•   Retrieval system: ranked list, probability score between
    0 and 1 per work

•   Recommendations: estimated preference of the user for a
    work, between 0.5 and 5.0 (rated) or 0 or 1 (unary)

•   Normalise ratings

•   ‘Boost’ works with estimated preference, CombSUM
    (Fox and Shaw, 1994)


•   Use average rating when no prediction can be made

•   Introduce weight (λ) between systems
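
A minimal sketch of the normalisation and fallback steps listed above. The slides say only that ratings are normalised, so the min-max rescaling of the 0.5 to 5.0 scale onto 0 to 1 is an assumption, as is reading "average rating" as the per-work average. All names are hypothetical.

```python
MIN_RATING = 0.5  # lowest rating on the scale quoted in the slides
MAX_RATING = 5.0  # highest rating on that scale


def normalise_rating(rating, lo=MIN_RATING, hi=MAX_RATING):
    """Min-max rescale a rating onto [0, 1] (assumed normalisation)."""
    return (rating - lo) / (hi - lo)


def preference_score(work, predictions, average_ratings):
    """Normalised predicted preference; falls back to the work's average rating."""
    rating = predictions.get(work)
    if rating is None:
        # No prediction could be made for this work: use its average rating.
        rating = average_ratings[work]
    return normalise_rating(rating)
```

After this step the recommender output lies in the same 0 to 1 range as the retrieval score, which makes a weighted sum of the two meaningful.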
Results

Method           λ           nDCG@10           P@10              R@10
Baseline         -           0.1437            0.1219            0.1494
Neighbourhood
 Rated (n=25)    0.0001700   0.1709 (18.93%)   0.1490 (22.23%)   0.1899 (27.11%)
 Rated (n=50)    0.0001855   0.1778 (23.73%)   0.1500 (23.05%)   0.1913 (28.05%)
 Rated (n=100)   0.0001800   0.1669 (16.14%)   0.1490 (22.23%)   0.1878 (25.70%)
 Unary (n=25)    0.0001500   0.1446 (0.63%)    0.1229 (0.82%)    0.1520 (1.74%)
 Unary (n=50)    0.0001500   0.1441 (0.28%)    0.1229 (0.82%)    0.152 (1.74%)
 Unary (n=100)   0.0001500   0.1441 (0.28%)    0.1229 (0.82%)    0.152 (1.74%)
SVD
 Rated (K=50)    0.0001800   0.1718 (19.55%)   0.149 (22.23%)    0.1866   (24.9%)
 Rated (K=100)   0.0001850   0.1721 (19.76%)   0.149 (22.23%)    0.1866   (24.9%)
 Rated (K=150)   0.0001850   0.172 (19.69%)    0.149 (22.23%)    0.1866   (24.90%)
 Unary (K=50)    0.0001500   0.1449 (0.84%)    0.124 (1.72%)     0.1541   (3.15%)
 Unary (K=100)   0.0001550   0.1441 (0.28%)    0.1229 (0.82%)    0.1520   (1.74%)
 Unary (K=150)   0.0001550   0.1424 (-0.9%)    0.1250 (2.54%)    0.1561   (4.48%)
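
The percentages in parentheses appear to be relative improvements over the baseline. The sketch below recomputes one of them from the table values; the function name is hypothetical and the interpretation is inferred from the numbers rather than stated in the slides.

```python
def relative_improvement(score, baseline):
    """Relative change of `score` with respect to `baseline`, in percent."""
    return (score - baseline) / baseline * 100.0


baseline_ndcg = 0.1437   # Baseline nDCG@10 from the table
rated_n25_ndcg = 0.1709  # Neighbourhood, Rated (n=25), nDCG@10 from the table

print(round(relative_improvement(rated_n25_ndcg, baseline_ndcg), 2))  # prints 18.93
```

The same calculation reproduces the P@10 and R@10 figures for that row (22.23% and 27.11%).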
Conclusions

•   Collected representative sample of user profiles

•   Collaborative filtering obvious choice

•   SVD best at estimating rated preference

•   Poor performance on unary transactions

•   Successfully combined retrieval with personalized
    recommendations

•   Rated transactions most useful

•   Personal preference is relevance evidence that can
    highly improve retrieval performance in SBS
Discussion and Future Work


•   Popularity as relevance evidence

•   Value of λ depending on IR score distribution

•   Other (mixtures of) RS setups

•   Scaling, cold-start problems

•   Trust and transparency of the system
Questions?
References
• R. Burke. Hybrid web recommender systems. In The adaptive web, pages 377–408. Springer-Verlag, 2007.
• C. Desrosiers and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. Recommender Systems Handbook, pages 107–144, 2011.
• E. Fox and J. Shaw. Combination of multiple searches. NIST Special Publication SP, pages 243–243, 1994.
• J. Furner. On recommending. Journal of the American Society for Information Science and Technology, 53(9):747–763, 2002.
• M. Koolen, G. Kazai, J. Kamps, A. Doucet, and M. Landoni. Overview of the INEX 2011 books and social search track. In S. Geva, J. Kamps, and R. Schenkel, editors, Focused Retrieval of Content and Structure: 10th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2011), volume 7424 of LNCS. Springer, 2012.
• P. Resnick and H. Varian. Recommender systems. Communications of the ACM, 40(3):56–58, 1997.
• J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. Collaborative Filtering Recommender Systems. International Journal of Electronic Business, 2(1):77, 2007. ISSN 14706067. doi: 10.1504/IJEB.2004.004560. URL http://www.springerlink.com/index/t87386742n752843.pdf.
Number of Books in Catalogue

[Figure: (a) Unrated works, (b) Rated works]
Document scoring

     S(d) = (1 - \lambda) P_{Ret}(d \mid q) + \lambda P_{CF}(d)

• P_{Ret}(d \mid q): work’s score obtained through IR system
• P_{CF}(d): estimated rating of current user for work obtained through RS
• \lambda: weight between systems
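
A minimal sketch of the scoring step, assuming the linear interpolation S(d) = (1 - λ)·P_Ret(d|q) + λ·P_CF(d) and assuming both scores are already on a 0 to 1 scale. The dict-based inputs and function names are hypothetical.

```python
def combined_score(p_ret, p_cf, lam):
    """Weighted sum of retrieval score and estimated preference."""
    return (1.0 - lam) * p_ret + lam * p_cf


def rerank(retrieval_scores, preference_scores, lam):
    """Order works by combined score, highest first."""
    scored = {
        work: combined_score(p_ret, preference_scores[work], lam)
        for work, p_ret in retrieval_scores.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical scores for two works.
retrieval = {"a": 0.6, "b": 0.5}
preference = {"a": 0.1, "b": 0.9}
```

With λ = 0 the ranking is the pure retrieval ranking. Raising λ lets the estimated preference move work b above work a.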
Estimating preference (rated)

     \hat{r}_{ui} = \frac{\sum_{v \in N_i(u)} w_{uv} r_{vi}}{\sum_{v \in N_i(u)} |w_{uv}|}

• \hat{r}_{ui}: estimated preference of user u for item i
• w_{uv}: preference similarity between users u and v
• N_i(u): k-NN of u that rated item i

(Desrosiers and Karypis, 2011)
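
The similarity-weighted average can be sketched as below. It assumes ratings are stored as {user: {item: rating}} and precomputed similarities as {(u, v): w_uv}; both layouts and the function name are illustrative choices, and returning None when no neighbour rated the item is an assumption.

```python
def predict_rating(user, item, ratings, similarities, k):
    """Similarity-weighted average rating of the k nearest neighbours that rated item."""
    candidates = [
        (similarities.get((user, v), 0.0), r[item])
        for v, r in ratings.items()
        if v != user and item in r
    ]
    # N_i(u): keep the k most similar users among those that rated the item.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    neighbours = candidates[:k]
    denominator = sum(abs(w) for w, _ in neighbours)
    if denominator == 0:
        return None  # no usable neighbour, so no prediction can be made
    return sum(w * r for w, r in neighbours) / denominator


# Hypothetical toy data.
ratings = {"u": {}, "v1": {"i": 4.0}, "v2": {"i": 2.0}, "v3": {"j": 5.0}}
similarities = {("u", "v1"): 0.75, ("u", "v2"): 0.25, ("u", "v3"): 0.9}
```

User v3 is the most similar user overall but did not rate item i, so it is excluded from the neighbourhood for that item.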
Estimating preference (unary)

     v_{ir} = \sum_{v \in N_i(u)} \delta(r_{vi} = r) w_{uv}

(Desrosiers and Karypis, 2011)
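
A sketch of the neighbour vote, reading the formula as: each neighbour adds its similarity weight to the rating value it gave, and the value with the largest total wins. It assumes explicit 0/1 values per neighbour and the same hypothetical data layout of {user: {item: value}} and {(u, v): w_uv}; how absent transactions were encoded is not stated in the slides.

```python
from collections import defaultdict


def vote(user, item, ratings, similarities, k):
    """Return the rating value with the largest similarity-weighted vote, or None."""
    neighbours = [
        (similarities.get((user, v), 0.0), r[item])
        for v, r in ratings.items()
        if v != user and item in r
    ]
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for w_uv, r_vi in neighbours[:k]:
        votes[r_vi] += w_uv  # delta(r_vi = r) picks out the bucket for this value
    if not votes:
        return None
    return max(votes, key=votes.get)


# Hypothetical toy data: 1 = work in catalogue, 0 = not in catalogue.
ratings = {"v1": {"i": 1}, "v2": {"i": 0}, "v3": {"i": 1}}
similarities = {("u", "v1"): 0.5, ("u", "v2"): 0.75, ("u", "v3"): 0.5}
```

With all three neighbours, value 1 collects a total weight of 1.0 against 0.75 for value 0, so 1 is predicted even though the single most similar neighbour voted 0.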

Social Book Search: A Combination of Personalized Recommendations and Retrieval

  • 1. Social Book Search: A Combination of Personalized Recommendations and Retrieval Author: Justin van Wees Supervisor: Master Thesis Marijn Koolen Information Science Second assessor: Human Centered Multimedia Frank Nack August 23, 2012
  • 2. Outline 1. Background 2. Research questions 3. Data collection 4. Experiments and results 5. Conclusions 6. Discussion and future work 7. Questions
  • 3. Current situation • Traditional information retrieval (IR) models: • developed for use on small collections • contain only officially published documents, annotated by professionals • Many modern web (2.0) applications still use traditional models for search • Millions of documents • Combination of user-generated content (UDG) and professional metadata
  • 4. Current situation • User uses IR system to find those documents that are topically relevant to her information need • Queries can lead to thousands of relevant documents • Evaluating large number of results expensive for user • Other notions of relevance, i.e. how well-written, popular, recent, fun is the document • Combination of professional and user-generated metadata
  • 5. Social Book Search Track • Evaluate relative value of controlled book metadata versus social metadata (Koolen et al., 2012) • Amazon.com and LibraryThing (LT) corpus • ~2.8 million book records, both social and professional metadata • book search requests from LT discussion forums as topics, suggestions by other users as relevance judgements
  • 6.
  • 7. Recommender Systems • Recommender Systems (RSs) suggest items of interest to individuals or groups of users (Resnick and Varian, 1997) • Assumes that individual’s taste or interest in a particular item can be explained by features recorded by the RS (demographics, previous interactions, etcetera) • Different strategies: collaborative filtering (CF), content-, community-, knowledge-based, hybrid (Burke, 2007) • Differs from traditional retrieval in terms of query formulation, source of relevance feedback and personalization (Furner, 2002)
  • 8. Research Questions Does a combination of techniques from the field of IR with those from RSs improve retrieval performance when searching for works in a large scale on-line collaborative media catalogue? • What data are we able to collect? • Can we automatically make accurate predictions of a user’s preference for an unknown book? • How do we combine results from IR system with RSs? • Social Book Search scenario and data
  • 9. Crawling LibraryThing • Perform four different crawls of user profiles and personal catalogues • For each crawl, also crawl links to other profiles • Compare crawls to determine representativeness for entire LT user-base • All crawl combined approximately 6% of LT userbase Crawl seed list profiles unique works profile overlap Forum users 1,104 60,131 4,354,387 Random – 211 works 1,306 8,040 2,537,065 7,048 Random – 1,000 works 5,577 18,381 3,580,296 14,262 Random – 10,000 works 35,671 64,379 5,122,848 37,300 Total - 89,693 5,299,399 -
  • 10. Crawling LibraryThing Crawl min. max. median mean std. dev. Forum users Friends 0 172 3.0 8.47 16.31 Groups 0 10 9.0 6.79 3.74 Interesting Libraries 0 510 2.0 11.19 26.46 Random – 211 works Friends 0 79 0.0 2.61 7.46 Groups 0 10 0.0 1.70 3.05 Interesting Libraries 0 394 0.0 3.30 17.80 Random – 1,000 works Friends 0 84 0.0 2.18 6.07 Groups 0 10 0.0 1.64 3.02 Interesting Libraries 0 574 0.0 2.74 14.41 Random – 10,000 works Friends 0 2,858 0.0 1.73 17.49 Groups 0 10 0.0 1.24 2.61 Interesting Libraries 0 855 0.0 1.69 10,40 Total Friends 0 2,858 1.0 2.14 12.77 Groups 0 10 0.0 1.18 2.44 Interesting Libraries 0 855 0.0 1.27 8.00
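  • 11. Crawling LibraryThing

| Crawl | Works | min. | max. | median | mean | std. dev. | sum |
|---|---|---|---|---|---|---|---|
| Forum users | Unrated | 0 | 28,402 | 84.0 | 397.22 | 929.70 | 23,885,231 |
| Forum users | Rated | 0 | 12,190 | 3.0 | 78.80 | 238.53 | 4,738,018 |
| Forum users | Total | 0 | 28,402 | 148.00 | 476.02 | 980.88 | 28,623,249 |
| Random – 211 works | Unrated | 0 | 28,402 | 458.00 | 1,112.81 | 1,835.81 | 8,946,997 |
| Random – 211 works | Rated | 0 | 12,190 | 10.00 | 182.77 | 472.08 | 1,469,531 |
| Random – 211 works | Total | 0 | 28,402 | 657.00 | 1,295.58 | 1,908.65 | 10,416,528 |
| Random – 1,000 works | Unrated | 0 | 28,402 | 331.00 | 864.32 | 1,480.98 | 15,887,025 |
| Random – 1,000 works | Rated | 0 | 12,190 | 3.00 | 130.20 | 369.06 | 2,393,233 |
| Random – 1,000 works | Total | 0 | 28,402 | 475.00 | 994.52 | 1,539.15 | 18,280,258 |
| Random – 10,000 works | Unrated | 0 | 28,402 | 163.00 | 486.63 | 955.86 | 31,328,971 |
| Random – 10,000 works | Rated | 0 | 12,190 | 1.00 | 74.04 | 237.01 | 4,766,750 |
| Random – 10,000 works | Total | 0 | 28,402 | 201.00 | 560.68 | 1,000.50 | 36,095,721 |
| Total | Unrated | 0 | 28,402 | 102.00 | 378.18 | 834.94 | 33,920,353 |
| Total | Rated | 0 | 12,190 | 1.00 | 62.85 | 206.40 | 5,637,097 |
| Total | Total | 0 | 28,402 | 156.00 | 441.03 | 876.76 | 39,557,450 |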
  • 12. Generating Recommendations • Collaborative filtering approach • Unary and rated transactions • Memory- and model-based recommenders • Randomly split transactions (80% train/20% test) for performance evaluation
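The random 80%/20% split of transactions can be sketched as follows. This is a minimal illustration, not the thesis code: the function name `split_transactions`, the fixed seed and the toy `(user, work, rating)` tuples are all assumptions made for the example.

```python
import random

def split_transactions(transactions, train_fraction=0.8, seed=42):
    """Shuffle a copy of the transactions and cut it into train and test parts."""
    shuffled = list(transactions)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical transactions: (user, work, rating); None marks a unary transaction.
transactions = [(f"user{u}", f"work{w}", None) for u in range(5) for w in range(4)]
train, test = split_transactions(transactions)
print(len(train), len(test))
```

Splitting at the level of individual transactions means a user's catalogue can end up partly in the training set and partly in the test set, which is what allows predictions for that user to be checked against held-out works.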
  • 13. Generating Recommendations • Neighbourhood (Desrosiers and Karypis, 2011): • Directly use user-item ratings to predict ratings for ‘unseen’ items • Find n most similar neighbours (Pearson correlation) • Use the weighted average rating given by the user’s neighbours • Let neighbours ‘vote’ on unary transactions
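The neighbour-selection step can be illustrated with a small sketch. It assumes ratings are stored as nested dictionaries and that Pearson correlation is computed over co-rated items only (one common variant; the slides do not say which was used). The names `pearson` and `nearest_neighbours` and the toy ratings are hypothetical.

```python
from math import sqrt

def pearson(ratings_u, ratings_v):
    """Pearson correlation over the items both users rated; 0.0 when undefined."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    cov = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((ratings_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / sqrt(var_u * var_v)

def nearest_neighbours(user, ratings, n):
    """Return the n users most similar to `user` as (name, similarity) pairs."""
    sims = [(other, pearson(ratings[user], ratings[other]))
            for other in ratings if other != user]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:n]

# Toy user-item ratings (assumed data, not taken from the LibraryThing crawl).
ratings = {
    "ann": {"a": 5.0, "b": 3.0, "c": 1.0},
    "bob": {"a": 4.0, "b": 3.0, "c": 2.0},
    "cat": {"a": 1.0, "b": 3.0, "c": 5.0},
}
print(nearest_neighbours("ann", ratings, n=1))
```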
  • 14. Generating Recommendations • Singular Value Decomposition (SVD) (Schafer et al., 2007): • Reduce domain complexity by mapping item space to k dimensions • Remaining dimensions represent the latent topics: preference classes of users, categorical classes of items • Currently considered ‘state of the art’
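As a rough illustration of mapping the rating matrix to k latent dimensions, the sketch below applies a plain truncated SVD with NumPy. It is an assumption-laden toy: missing ratings are filled with item means and user means are subtracted before the decomposition, which is one simple preprocessing choice and not necessarily what was used in the thesis. The function name `low_rank_predictions` and the matrix are hypothetical.

```python
import numpy as np

def low_rank_predictions(ratings, k):
    """Approximate a user-by-item rating matrix using its k largest singular values.

    Missing ratings are given as NaN and filled with the item mean first.
    """
    item_means = np.nanmean(ratings, axis=0)
    filled = np.where(np.isnan(ratings), item_means, ratings)
    user_means = filled.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(filled - user_means, full_matrices=False)
    # Keep only the first k latent dimensions.
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    return approx + user_means

# Toy matrix: rows are users, columns are works, NaN means "not rated".
ratings = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 1.0],
    [1.0, 1.0, 5.0],
    [np.nan, 2.0, 4.0],
])
predictions = low_rank_predictions(ratings, k=2)
print(predictions.round(2))
```

The entries of `predictions` at the NaN positions serve as the estimated ratings; smaller k gives a coarser, more strongly smoothed approximation.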
  • 15. Recommender Performance

| Method | MAE | RMSE | P@5 | P@10 | P@50 |
|---|---|---|---|---|---|
| Neighbourhood (N=25) | 0.7813 | 1.0286 | 0.0712 | 0.0661 | 0.0614 |
| Neighbourhood (N=50) | 0.7721 | 1.0105 | 0.0376 | 0.0371 | 0.0339 |
| Neighbourhood (N=100) | 0.7633 | 0.9927 | 0.0246 | 0.0239 | 0.0232 |
| SVD (K=50) | 0.6210 | 0.8139 | 0.0021 | 0.0019 | 0.0026 |
| SVD (K=100) | 0.6203 | 0.8131 | 0.0025 | 0.0022 | 0.0028 |
| SVD (K=150) | 0.6192 | 0.8122 | 0.0281 | 0.0107 | 0.0030 |

| Method | Accuracy | P@5 | P@10 | P@50 |
|---|---|---|---|---|
| Neighbourhood (N=25) | 0.2430 | 0.3711 | 0.2425 | 0.1829 |
| Neighbourhood (N=50) | 0.3014 | 0.3824 | 0.2561 | 0.1861 |
| Neighbourhood (N=100) | 0.3621 | 0.3640 | 0.2422 | 0.1812 |
| SVD (K=50) | 0.2240 | 0.0214 | 0.0198 | 0.0216 |
| SVD (K=100) | 0.2601 | 0.0219 | 0.0203 | 0.0229 |
| SVD (K=150) | 0.2676 | 0.0424 | 0.0212 | 0.0234 |
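For readers unfamiliar with the measures reported on this slide, the sketch below gives their standard definitions: mean absolute error (MAE), root mean squared error (RMSE) and precision at rank k (P@k). The function names and the toy values are assumptions for illustration; they are not the evaluation code behind the reported figures.

```python
from math import sqrt

def mae(predicted, actual):
    """Mean absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root of the mean squared difference; penalises large errors more than MAE."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that are relevant."""
    top_k = ranked_items[:k]
    return sum(1 for item in top_k if item in relevant_items) / k

# Toy values (assumed, not taken from the experiments).
predicted = [4.0, 3.0, 5.0]
actual = [5.0, 3.0, 3.0]
print(mae(predicted, actual), rmse(predicted, actual))
print(precision_at_k(["a", "b", "c", "d", "e"], {"b", "e", "z"}, k=5))
```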
  • 16. Retrieving Works • Setup used for INEX 2012; top performing run • Index consists of user-generated content • Removed stopwords • Stemming with Krovetz • Topic titles as queries • Language model • Pseudo relevance feedback, 50 terms of top 10 results
  • 17. Combining IR and RS • Retrieval system: ranked list, probability score between 0 and 1 per work • Recommendations: estimated preference of user for work, between 0.5 and 5.0 (rated) or 0 or 1 (unary) • Normalise ratings • ‘Boost’ works with estimated preference, CombSUM (Fox and Shaw, 1994) • Use average rating when no prediction can be made • Introduce weight (λ) between systems
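The normalisation and fallback steps might look like the sketch below. The slide does not state how ratings were normalised, so the min-max mapping of the 0.5 to 5.0 scale onto 0 to 1 is an assumption, as are the function names and the choice to pass the fallback average in as a parameter.

```python
def normalise_rating(rating, low=0.5, high=5.0):
    """Map a rating on the [low, high] scale onto [0, 1] (assumed min-max scaling)."""
    return (rating - low) / (high - low)

def preference_or_average(predictions, work, average_rating):
    """Return the normalised predicted preference for a work.

    Falls back to `average_rating` when the recommender made no prediction.
    """
    rating = predictions.get(work, average_rating)
    return normalise_rating(rating)

# Hypothetical predictions for one user.
predictions = {"work1": 5.0, "work2": 0.5}
print(preference_or_average(predictions, "work1", average_rating=2.75))
print(preference_or_average(predictions, "work3", average_rating=2.75))
```

Putting both systems on the same 0 to 1 range is what makes a summed score meaningful; without it, the rating scale would dominate the retrieval probabilities.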
  • 18. Results

| Method | Setup | λ | nDCG@10 | P@10 | R@10 |
|---|---|---|---|---|---|
| Baseline | | - | 0.1437 | 0.1219 | 0.1494 |
| Neighbourhood | Rated (n=25) | 0.0001700 | 0.1709 (18.93%) | 0.1490 (22.23%) | 0.1899 (27.11%) |
| Neighbourhood | Rated (n=50) | 0.0001855 | 0.1778 (23.73%) | 0.1500 (23.05%) | 0.1913 (28.05%) |
| Neighbourhood | Rated (n=100) | 0.0001800 | 0.1669 (16.14%) | 0.1490 (22.23%) | 0.1878 (25.70%) |
| Neighbourhood | Unary (n=25) | 0.0001500 | 0.1446 (0.63%) | 0.1229 (0.82%) | 0.1520 (1.74%) |
| Neighbourhood | Unary (n=50) | 0.0001500 | 0.1441 (0.28%) | 0.1229 (0.82%) | 0.152 (1.74%) |
| Neighbourhood | Unary (n=100) | 0.0001500 | 0.1441 (0.28%) | 0.1229 (0.82%) | 0.152 (1.74%) |
| SVD | Rated (K=50) | 0.0001800 | 0.1718 (19.55%) | 0.149 (22.23%) | 0.1866 (24.9%) |
| SVD | Rated (K=100) | 0.0001850 | 0.1721 (19.76%) | 0.149 (22.23%) | 0.1866 (24.9%) |
| SVD | Rated (K=150) | 0.0001850 | 0.172 (19.69%) | 0.149 (22.23%) | 0.1866 (24.90%) |
| SVD | Unary (K=50) | 0.0001500 | 0.1449 (0.84%) | 0.124 (1.72%) | 0.1541 (3.15%) |
| SVD | Unary (K=100) | 0.0001550 | 0.1441 (0.28%) | 0.1229 (0.82%) | 0.1520 (1.74%) |
| SVD | Unary (K=150) | 0.0001550 | 0.1424 (-0.9%) | 0.1250 (2.54%) | 0.1561 (4.48%) |
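The percentages in parentheses appear to be relative improvements over the baseline. The short check below reproduces the figures for the first neighbourhood run (rated, n=25) from the scores on the slide; the helper name `relative_improvement` is an assumption.

```python
def relative_improvement(score, baseline):
    """Relative change of `score` with respect to `baseline`, in percent."""
    return (score - baseline) / baseline * 100.0

# Baseline and neighbourhood (rated, n=25) scores as reported on the slide.
baseline = {"nDCG@10": 0.1437, "P@10": 0.1219, "R@10": 0.1494}
run = {"nDCG@10": 0.1709, "P@10": 0.1490, "R@10": 0.1899}

for measure in baseline:
    change = relative_improvement(run[measure], baseline[measure])
    print(f"{measure}: {change:.2f}%")
```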
  • 19. Conclusions • Collected representative sample of user profiles • Collaborative filtering obvious choice • SVD best at estimating rated preference • Poor performance on unary transactions • Successfully combined retrieval with personalized recommendations • Rated transactions most useful • Personal preference is relevance evidence that can highly improve retrieval performance in SBS
  • 20. Discussion and Future Work • Popularity as relevance evidence • Value of λ depending on IR score distribution • Other (mixtures of) RS setups • Scaling, cold-start problems • Trust and transparency of the system
  • 22. References • R. Burke. Hybrid web recommender systems. In The adaptive web, pages 377–408. Springer-Verlag, 2007. • C. Desrosiers and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. Recommender Systems Handbook, pages 107–144, 2011. • E. Fox and J. Shaw. Combination of multiple searches. NIST Special Publication SP, pages 243–243, 1994. • J. Furner. On recommending. Journal of the American Society for Information Science and Technology, 53(9):747–763, 2002. • M. Koolen, G. Kazai, J. Kamps, A. Doucet, and M. Landoni. Overview of the INEX 2011 books and social search track. In S. Geva, J. Kamps, and R. Schenkel, editors, Focused Retrieval of Content and Structure: 10th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2011), volume 7424 of LNCS. Springer, 2012. • P. Resnick and H. Varian. Recommender systems. Communications of the ACM, 40(3):56–58, 1997. • J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. Collaborative Filtering Recommender Systems. International Journal of Electronic Business, 2(1):77, 2007. ISSN 14706067. doi: 10.1504/IJEB.2004.004560. URL http://www.springerlink.com/index/t87386742n752843.pdf.
  • 23. Number of Books in Catalogue (a) Unrated works (b) Rated works
  • 24. Document scoring $S(d) = (1 - \lambda)\,P_{Ret}(d \mid q) + \lambda\,P_{CF}(d)$ • $P_{Ret}(d \mid q)$: work’s score obtained through IR system • $P_{CF}(d)$: estimated rating of current user for work obtained through RS • $\lambda$: weight between systems
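Read as a linear interpolation between the two systems, S(d) = (1 - λ) P_Ret(d|q) + λ P_CF(d), the scoring can be sketched as a small re-ranking routine. This reading of the formula, the function names, the default preference for works without a prediction and the toy scores are assumptions for illustration.

```python
def combined_score(p_ret, p_cf, lam):
    """Weighted sum of retrieval score and estimated preference (both in [0, 1])."""
    return (1.0 - lam) * p_ret + lam * p_cf

def rerank(retrieval_scores, preferences, lam, default_preference):
    """Re-rank retrieved works by combined score, highest first."""
    scored = {
        work: combined_score(p_ret, preferences.get(work, default_preference), lam)
        for work, p_ret in retrieval_scores.items()
    }
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical scores for three works.
retrieval_scores = {"w1": 0.9, "w2": 0.8, "w3": 0.1}
preferences = {"w1": 0.2, "w2": 1.0}
print(rerank(retrieval_scores, preferences, lam=0.5, default_preference=0.5))
```

With λ = 0 the ranking reduces to the retrieval baseline; raising λ lets the estimated preference move works up or down the list.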
  • 25. Estimating preference (rated) $\hat{r}_{ui} = \dfrac{\sum_{v \in N_i(u)} w_{uv}\, r_{vi}}{\sum_{v \in N_i(u)} |w_{uv}|}$ • $\hat{r}_{ui}$: estimated preference of user u for item i • $w_{uv}$: preference similarity between users v and u • $N_i(u)$: k-NN of u that rated item i (Desrosiers and Karypis, 2011)
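The weighted-average estimate from Desrosiers and Karypis (2011), the sum of similarity-weighted neighbour ratings divided by the sum of absolute similarities, can be written out as a short sketch. The dictionary-based data layout, the function name `predict_rating` and the toy values are assumptions.

```python
def predict_rating(similarities, neighbour_ratings):
    """Similarity-weighted average of the neighbours' ratings for one item.

    similarities: {neighbour: w_uv} for the nearest neighbours of user u.
    neighbour_ratings: {neighbour: r_vi} for the users that rated item i.
    Returns None when no neighbour rated the item.
    """
    rated = [v for v in similarities if v in neighbour_ratings]
    denominator = sum(abs(similarities[v]) for v in rated)
    if denominator == 0:
        return None
    numerator = sum(similarities[v] * neighbour_ratings[v] for v in rated)
    return numerator / denominator

# Toy example: "dan" is a neighbour but has not rated the item.
similarities = {"bob": 0.5, "cat": 1.0, "dan": 0.9}
neighbour_ratings = {"bob": 4.0, "cat": 1.0}
print(predict_rating(similarities, neighbour_ratings))
```

Returning `None` marks the case where no prediction can be made, so a caller can substitute a fallback such as an average rating.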
  • 26. Estimating preference (unary) $v_{ir} = \sum_{v \in N_i(u)} \delta(r_{vi} = r)\, w_{uv}$ (Desrosiers and Karypis, 2011)
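In the voting scheme of Desrosiers and Karypis (2011), each neighbour adds its similarity weight to the value it assigned to the item, and the value with the largest total is predicted. A minimal sketch, with the function name `vote`, the data layout and the toy values as assumptions:

```python
from collections import defaultdict

def vote(similarities, neighbour_values):
    """Let neighbours vote on an item's value, weighted by their similarity to u.

    similarities: {neighbour: w_uv}.
    neighbour_values: {neighbour: r_vi}, e.g. 1 (in catalogue) or 0 (not).
    Returns (winning value, vote totals), or (None, {}) without votes.
    """
    votes = defaultdict(float)
    for neighbour, value in neighbour_values.items():
        if neighbour in similarities:
            votes[value] += similarities[neighbour]
    if not votes:
        return None, {}
    best = max(votes, key=votes.get)
    return best, dict(votes)

# Toy example with three neighbours.
similarities = {"bob": 0.6, "cat": 0.3, "dan": 0.2}
neighbour_values = {"bob": 1, "cat": 0, "dan": 0}
print(vote(similarities, neighbour_values))
```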