7. User-Generated Content
• Encyclopedia to Wikipedia
– Launched in 2001
– Largest, fastest growing, and most popular reference work
• News Services to Blogosphere
• Books to FanFiction
11. Collective Intelligence
• Wisdom of Crowds: communities create value!
• A community of authors produces valuable content
• A critical mass of participation acts as a filter for what is valuable
• The web of connections grows organically as an output of the collective activity of all web users
12. The Weak Link: Participation
• Community-based systems share many issues, which should be addressed to produce successful systems
– Participation vs. lurking
– Social capital
– Social networking
– Trust and reputation
– Privacy and presence
• The participation pyramid: roughly 1 creator to 10 synthesizers to 100 consumers
13. One of 100? One of 500?
University of Pittsburgh - PAWS Lab 13
9/26/2010
15. Diminishing Returns
• 307,006,550: US Population
• 10,000,000: Watched the movie (1:30)
• 20,000: Rated the movie in IMDB (1:15,000)
• 238: Wrote a review (1:1,000,000)
• 54: Rated the movie in MovieLens (1:5,000,000)
16. Social Systems for Small Communities?
• Sharing cultural events in Pittsburgh?
– Post event, rate event, write a review
– One of many systems presenting events
– 334,563 people, 143,739 households, and 74,169 families
– Expected ratings (1:5,000,000)?
• Sharing research talks at CMU and Pitt?
– The one and only system of its kind…
– Expected posts (1:1,000,000)?
– Expected bookmarks (1:15,000)?
22. The Plan
• Personalization
– Recommender service
– Social navigation
– Adaptive engagement
• Mobile and Ubiquitous
– Android application
– Facebook connection (a sidewalk sale)
– Twitter feed
– Public displays
23. Where Are We?
• Personalization
– Simple content-based recommender in CoMeT and CN3
– Offered in navigation support mode
• Mobile and Ubiquitous
– First Eventur app (search for Eventur in the Android market)
– Eventur Facebook export
– Eventur Twitter feed
25. Personalization Challenge
• Events: short-lived artifacts
• Need everything that can work
• Content-based recommendation
• Collaborative recommendation
• Social recommendation
• Demographic and group-based recommendation
• Case-Based (Metadata-based) recommendation
26. Personalization for Engagement
• Adaptive engagement efforts
– Based on user knowledge/goals/interests
– Based on user past experience with the system
• Special efforts to deal with cold start: Using
information from other social systems
– Social bookmarking systems (CiteULike, Delicious)
– Social linking systems (Facebook, LinkedIn)
– Public data (e.g., Google Scholar)
• HetRec 2011 workshop!
27. Recommendation Approaches
• Various sources of information:
– Standard information: keywords of bookmarked talks in CoMeT
– Keywords of bookmarked papers from CiteULike
– Tags of talks in CoMeT
– Tags of papers in CiteULike (CUL)
• Different models for fusion of tags and keywords
28. Document Representation Models
• Keywords Only (KO)
– Keywords extracted from documents’ titles and abstracts
• Keywords+n*Tags (KnT)
– Keywords extracted from documents’ titles and abstracts + tags assigned to documents, each tag repeated n times
• Keywords Concatenated by Tags (KCT)
– Keywords extracted from documents’ titles and abstracts, concatenated with tags kept as a separate vector space
29. Keywords Only (KO) Model
• Each document:
– a bag of words
– represented as a vector in the keywords vector space
– TF.IDF weighting scheme

              Keywords
        W1    W2    W3    W4    W5    W6
  D1     0     1     0     0     0     0
  D2    .5     0     0    .5     0     0
  D3   .12   .13     0   .25    .5     0
  D4   .25     0   .25     0   .25   .25
  (rows: Talks/Papers)
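The KO representation can be sketched in a few lines. This is a minimal illustration, assuming documents arrive as title+abstract strings; the whitespace tokenizer and the function name `tfidf_vectors` are ours, not the system's actual keyword extractor.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Represent each document (a title+abstract string) as a TF.IDF-weighted
    vector over the shared keyword vocabulary (the KO bag-of-words model)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    # document frequency: in how many documents each keyword appears
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # term frequency * inverse document frequency
        vec = [(counts[w] / len(toks)) * math.log(n / df[w]) for w in vocab]
        vectors.append(vec)
    return vocab, vectors
```

Note that a keyword appearing in every document gets zero IDF weight, so only discriminative keywords contribute to document similarity.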
30. Merging CUL and CoMeT Data in KO Model

  Dc: CUL Papers’ Matrix (k × l)        Dt: CoMeT Talks’ Matrix (e × m)
        w1    w2    w3    w4                  w3    w4    w5
  P1     1     0     0     0            T1     0     1     0
  P2   .25     0    .5   .25            T2     0     0    .5
  P3     0    .5   .25   .25

  D: Merged Documents’ Matrix ((k + e) × (l + m − o))
        w1    w2    w3    w4    w5
  T1     0     0     0     1     0
  T2     0     0     0     0    .5
  P1     1     0     0     0     0
  P2   .25     0    .5   .25     0
  P3     0    .5   .25   .25     0

k – the number of CiteULike papers
l – the number of keywords used in CiteULike papers
e – the total number of talks in CoMeT
m – the total number of keywords in CoMeT
o – the number of common keywords between the CoMeT and CiteULike systems
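The merge step amounts to taking the union of the two vocabularies and padding each row with zeros for keywords it lacks. A minimal sketch, assuming each matrix is a dict mapping document name to a sparse {keyword: weight} row (the function name `merge_matrices` is ours):

```python
def merge_matrices(rows_a, rows_b, vocab_a, vocab_b):
    """Merge two document-keyword matrices into one matrix over the union
    vocabulary. Keywords shared by both systems collapse into a single
    column, so the merged matrix has l + m - o columns."""
    merged_vocab = sorted(set(vocab_a) | set(vocab_b))
    merged = {name: [row.get(w, 0.0) for w in merged_vocab]
              for name, row in {**rows_a, **rows_b}.items()}
    return merged_vocab, merged

# The slide's toy data: 3 CUL papers over w1-w4, 2 CoMeT talks over w3-w5
papers = {"P1": {"w1": 1.0},
          "P2": {"w1": .25, "w3": .5, "w4": .25},
          "P3": {"w2": .5, "w3": .25, "w4": .25}}
talks = {"T1": {"w4": 1.0}, "T2": {"w5": .5}}
vocab, D = merge_matrices(papers, talks, ["w1", "w2", "w3", "w4"], ["w3", "w4", "w5"])
```

With l = 4, m = 3, and o = 2 shared keywords (w3, w4), the merged matrix has 4 + 3 − 2 = 5 columns, matching the slide.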
31. Keywords+n*Tags (KnT) Model
• Each document: a bag of words containing:
– the document’s abstract, title, and tags
• Tags treated as regular keywords
– Each tag appears n times
• Merging CUL and CoMeT data in this model: same as in KO

Example (n = 2): document D3 has keywords w1, w2, w3, w2 and tags T1, T3;
W3 and T1 are the same term, as are W4 and T2.

         W1    W2   W3/T1  W4/T2   T3    T4
  D1      0     1     1      0      0     0
  D2      1     0     3      5      0     0
  D3      1     2     3      0      2     0
  D4      2     0     5      0      2     1
  (rows: Talks/Papers)
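The KnT fusion reduces to counting: each tag is injected into the bag of words n times, so shared terms (like W3/T1) accumulate both keyword and tag counts. A sketch with raw counts (TF.IDF weighting would be applied afterwards; the name `knt_bag` is ours):

```python
from collections import Counter

def knt_bag(keywords, tags, n=2):
    """KnT fusion: tags are treated as regular keywords, with each tag
    repeated n times so tag evidence carries n-fold weight."""
    return Counter(keywords) + Counter({tag: n for tag in tags})

# The slide's D3 example: keywords w1, w2, w3, w2 and tags w3, t3
# (the tag T1 is the same term as keyword w3), with n = 2
bag = knt_bag(["w1", "w2", "w3", "w2"], ["w3", "t3"], n=2)
```

For D3 this yields w3 counted 3 times (once as a keyword, twice as a tag), matching the W3/T1 column of the example matrix.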
32. Keywords Concatenated by Tags (KCT) Model
• Tags: a separate source of information
• Each document: a bag of keywords and a bag of tags
– Concatenating the keywords and tags vectors
– TF.IDF weighting scheme

Example: document D3 has keywords w1, w2, w3, w2 and tags T1, T3;
W3 and T1 are the same term, as are W4 and T2, but each keeps its own keyword and tag column.

         W1    W2    W3    W4    T1    T2    T3    T4
  D1      0     1     1     0     0     0     0     0
  D2      1     0     3     1     0     2     0     0
  D3      1     2     1     0     1     0     1     0
  D4      2     3     3     0     1     0     2     1
  (rows: Talks/Papers)
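In contrast to KnT, KCT never mixes the two count spaces: the document vector is the keyword vector followed by the tag vector. A sketch with raw counts (the name `kct_vector` is ours):

```python
def kct_vector(kw_counts, tag_counts, kw_vocab, tag_vocab):
    """KCT fusion: keywords and tags live in separate vector spaces and the
    document vector is their concatenation. A term that is both a keyword
    and a tag (like W3/T1) keeps one column in each half."""
    return ([kw_counts.get(w, 0) for w in kw_vocab] +
            [tag_counts.get(t, 0) for t in tag_vocab])

# The slide's D3 example: keywords w1, w2, w3, w2 and tags T1, T3
d3 = kct_vector({"w1": 1, "w2": 2, "w3": 1}, {"T1": 1, "T3": 1},
                ["w1", "w2", "w3", "w4"], ["T1", "T2", "T3", "T4"])
```

This reproduces the D3 row of the example matrix: [1, 2, 1, 0, 1, 0, 1, 0]. The price is a higher-dimensional space, which the precision results below suggest can hurt.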
33. Merging CUL and CoMeT Data in KCT Model

  Dc: CUL Papers’ Matrix (k × (m + i))    Dt: CoMeT Talks’ Matrix (e × (l + j))
        w1    w2    T1    T2                    W3    T1    T2
  P1     1     0     0     0              C1     1     0     0
  P2   .25     0    .5   .25              C2     0    .5     0
  P3     0    .5   .25   .25

  D: Merged Documents’ Matrix ((k + e) × (l + m + i + j − o − p))
        w1    w2    W3    T1    T2
  C1     0     0     1     0     0
  C2     0     0     0    .5     0
  P1     1     0     0     0     0
  P2   .25     0     0    .5   .25
  P3     0    .5     0   .25   .25

k – the number of CiteULike papers
m – the number of keywords used in CiteULike papers
i – the number of tags used in CiteULike papers
e – the total number of talks in CoMeT
l – the total number of keywords in CoMeT
j – the total number of tags in CoMeT
o – the number of common keywords between the CoMeT and CiteULike systems
p – the number of common tags between the CoMeT and CiteULike systems
34. Recommending Talks to Users
• K-nearest-neighbor method
– recommend the top K documents closest to the user profile
• User profiles: based on users’ bookmarked and rated talks and papers

  U: User Profiles in            D: Documents in          UP = U × D: User Profiles
     Talks/Papers Space             Keywords Space           in Keywords Space
        D1    D2    D3    D4          w1    w2    w3           w1    w2    w3
  U1     1     0     0     0    D1     0     1     0     U1     0     1     0
  U2   .25     0    .5   .25    D2     0     0    .5     U2     0   .75  .125
  U3     0    .5   .25   .25    D3     0     1     0     U3     0   .25  .375
                                D4     0     0    .5
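The two steps above can be sketched as: project the user's bookmark weights into keyword space (UP = U × D), then rank documents by cosine similarity to that profile. A minimal K-nearest-neighbor sketch (the names `cosine` and `recommend` are ours; a real system would exclude talks the user has already bookmarked, kept here only to keep the example tiny):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_row, docs, k=2):
    """Project a user's bookmark weights into keyword space (UP = U x D),
    then return the indices of the k documents most cosine-similar to
    that profile."""
    n_kw = len(docs[0])
    profile = [sum(w * doc[j] for w, doc in zip(user_row, docs))
               for j in range(n_kw)]
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(profile, docs[i]), reverse=True)
    return ranked[:k]

# The slide's example: U3 bookmarked D2, D3, D4 with weights .5, .25, .25
docs = [[0, 1, 0], [0, 0, .5], [0, 1, 0], [0, 0, .5]]
top = recommend([0, .5, .25, .25], docs, k=2)
```

For U3 the projected profile is (0, .25, .375), matching the UP matrix above, and the w3-heavy documents D2 and D4 rank highest.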
35. Experimental Results
• User study:
– 8 real users of both CoMeT and CiteULike systems
• Evaluation questionnaire for each recommended talk:
– Is this talk related to your interests? (yes/no question)
– How interesting is this talk to you? (on a 5-point scale)
– If the talk is related to your interests, how novel is this talk to you? (on a 5-point scale)
36. Experimental Results (Cont’d)
• Compared six models:
– KO, KnT (with n = 1, 2, 5; best n = 1), and KCT
• using only CoMeT data
• using both CoMeT and CiteULike data
• Measures:
– Relevance: precision over the yes/no answers
– Interest: nDCG over the 5-point scale
– Novelty: averaged novelty ratings (non-relevant = zero novelty)
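The nDCG interest measure can be sketched as follows. This is one common formulation (linear gain, log2 discount); the study may use a different gain/discount variant, and the name `ndcg` is ours:

```python
import math

def ndcg(ratings):
    """nDCG over the 5-point interest ratings of a ranked recommendation
    list: the list's discounted cumulative gain divided by the DCG of the
    same ratings in the ideal (descending) order."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(ratings))
    ideal = sum(r / math.log2(i + 2)
                for i, r in enumerate(sorted(ratings, reverse=True)))
    return dcg / ideal if ideal else 0.0
```

A list already sorted by the user's interest scores gets nDCG 1.0; burying the high-interest talks lower in the list discounts their gain and pushes nDCG below 1.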
37. Precision results for different numbers of recommendations

  Precision                 1     2     3     4     5     6     7     8     9    10
  Only      KO           0.83  0.67  0.72  0.63  0.60  0.56  0.57  0.50  0.51  0.51
  CoMeT     KnT (n=1)    0.50  0.50  0.58  0.59  0.57  0.58  0.57  0.58  0.60  0.57
  Data      KCT          0.50  0.33  0.39  0.46  0.47  0.53  0.52  0.50  0.50  0.53
  CoMeT +   KO           0.83  0.83  0.67  0.75  0.73  0.69  0.64  0.63  0.56  0.57
  CiteULike KnT (n=1)    0.63  0.69  0.71  0.72  0.73  0.73  0.71  0.70  0.68  0.67
  Data      KCT          0.38  0.44  0.42  0.47  0.48  0.52  0.50  0.49  0.53  0.55
38. Precision results for different numbers of recommendations (Cont’d)
• Adding tags using KnT → better cumulative precision for the top 10 recommendations
• Adding CiteULike data in both KnT and KO → higher precision
• KnT with both CoMeT and CUL data → best cumulative precision
• KCT model → decrease in precision
– High dimensionality of the vector space model → increased distance between documents and user profiles → decreased variance among the similarities of a user profile to different talks
39. nDCG Results for different numbers of recommendations

  nDCG                      1     2     3     4     5     6     7     8     9    10
  Only      KO           0.90  0.88  0.89  0.93  0.92  0.94  0.95  0.95  0.95  0.96
  CoMeT     KnT (n=1)    0.90  0.85  0.82  0.83  0.87  0.88  0.89  0.90  0.91  0.93
  Data      KCT          0.84  0.88  0.89  0.90  0.90  0.91  0.92  0.92  0.94  0.95
  CoMeT +   KO           0.84  0.91  0.90  0.92  0.93  0.94  0.95  0.96  0.96  0.96
  CiteULike KnT (n=1)    0.90  0.90  0.89  0.88  0.90  0.92  0.92  0.94  0.94  0.95
  Data      KCT          0.77  0.85  0.84  0.81  0.83  0.84  0.86  0.88  0.91  0.92
40. nDCG Results for different numbers of recommendations (Cont’d)
• KCT and KnT models: using both CiteULike and CoMeT data → increased cumulative user interest
• Best results: the tag-less KO model, both with and without CiteULike data
41. Novelty Results for different numbers of recommendations

  Novelty                   1     2     3     4     5     6     7     8     9    10
  Only      KO           1.75  1.69  1.67  1.72  1.70  1.65  1.66  1.55  1.49  1.44
  CoMeT     KnT (n=1)    1.88  1.75  1.67  1.88  1.88  1.88  2.00  2.03  1.99  1.93
  Data      KCT          2.00  1.50  1.54  1.56  1.55  1.60  1.63  1.58  1.50  1.50
  CoMeT +   KO           1.88  1.44  1.33  1.50  1.50  1.52  1.61  1.47  1.44  1.36
  CiteULike KnT (n=1)    1.75  2.19  1.79  2.06  2.20  2.08  2.02  2.19  2.06  1.96
  Data      KCT          1.38  1.31  1.38  1.47  1.58  1.60  1.52  1.47  1.61  1.64
42. Novelty Results for different numbers of recommendations (Cont’d)
• Adding tags using the KnT fusion model → largest positive impact
• Adding different sources of information → improved novelty of recommendations
– Tags are provided by users → include a broader range of vocabulary
– Each user’s tags describe a document from her point of view (different from the terms included in the document)
• Adding CUL data in the KO model → decreased novelty
– Distinctive natures of the CoMeT and CiteULike systems
• CiteULike: users add, review, and rate papers related to their research field
• CoMeT: information about talks happening at a specific time on a particular date; users bookmark more novel, less relevant talks
43. Conclusion
• Relevance: fit to the user’s research work
• Interest: the overall attraction of an item
• Users are interested in talks on more general topics
– with little in common with their research interests
• Increased focus on relevance encapsulated in tags → decreased ability of the system to recommend interesting talks when tags are added
44. Conclusion (Cont’d)
• Including another reliable user profile → increased precision of recommendations
– Consider the way the additional profile is augmented
• Using CiteULike data for all models
– Increased relevancy of every recommended document
– Varied results for interestingness
• Adding tags
– Increased novelty of recommendations (using both CoMeT and CUL data)
– Increased relatedness for larger numbers of recommendations
• Injecting keywords from another data source: more reliable than including tags for relevancy
• Including tags from various sources of information: more reliable for interestingness and novelty