7. User-Generated Content
• Encyclopedia to Wikipedia
– Launched in 2001
– Largest, fastest growing, and most popular reference work
• News Services to Blogosphere
• Books to FanFiction
11. Collective Intelligence
• Wisdom of Crowds: communities create value!
• A community of authors produces valuable content
• A critical mass of participation acts as a filter for what is valuable
• The web of connections grows organically as an output of the collective activity of all web users
12. The Weak Link: Participation
• Community-based systems share many issues, which should be addressed to produce successful systems
– Participation vs. lurking
– Social capital
– Social networking
– Trust and reputation
– Privacy and presence
• The participation pyramid: roughly 1 creator to 10 synthesizers to 100 consumers
13. One of 100? One of 500?
University of Pittsburgh - PAWS Lab 13
9/26/2010
15. Diminishing Returns
• 307,006,550: US Population
• 10,000,000: Watched the movie (1:30)
• 20,000: Rated the movie in IMDB (1:15,000)
• 238: Wrote a review (1:1,000,000)
• 54: Rated the movie in MovieLens (1:5,000,000)
16. Social Systems for Small Communities?
• Sharing cultural events in Pittsburgh?
– Post event, rate event, write a review
– One of many systems presenting events
– 334,563 people, 143,739 households, and 74,169 families
– Expected ratings (1:5,000,000)?
• Sharing research talks at CMU and Pitt?
– The one and only system of its kind…
– Expected posts (1:1,000,000)?
– Expected bookmarks (1:15,000)?
22. The Plan
• Personalization
– Recommender service
– Social navigation
– Adaptive engagement
• Mobile and Ubiquitous
– Android application
– Facebook connection (a sidewalk sale)
– Twitter feed
– Public displays
23. Where Are We?
• Personalization
– Simple content-based recommender in CoMeT and CN3
– Offered in navigation support mode
• Mobile and Ubiquitous
– First Eventur app (search for Eventur in the Android market)
– Eventur Facebook export
– Eventur Twitter feed
25. Personalization Challenge
• Events: short-lived artifacts
• Need everything that can work
• Content-based recommendation
• Collaborative recommendation
• Social recommendation
• Demographic and group-based recommendation
• Case-Based (Metadata-based) recommendation
26. Personalization for Engagement
• Adaptive engagement efforts
– Based on user knowledge/goals/interests
– Based on user past experience with the system
• Special efforts to deal with cold start: Using
information from other social systems
– Social bookmarking systems (CiteULike, Delicious)
– Social linking systems (Facebook, LinkedIn)
– Public data (e.g., Google Scholar)
• HetRec 2011 workshop!
27. Recommendation Approaches
• Various sources of information:
– Standard information: keywords of bookmarked talks in CoMeT
– Keywords of bookmarked papers from CiteULike
– Tags of talks in CoMeT
– Tags of papers in CiteULike (CUL)
• Different models for fusion of tags and keywords
28. Document Representation Models
• Keywords Only (KO)
– Keywords extracted from documents’ titles and abstracts
• Keywords+n*Tags (KnT)
– Keywords extracted from documents’ titles and abstracts + tags assigned to documents, each tag repeated n times
• Keywords Concatenated by Tags (KCT)
– Keywords extracted from documents’ titles and abstracts, concatenated with tags kept as a separate vector space
29. Keywords Only (KO) Model
• Each document:
– a bag of words
– represented as a vector in the keywords vector space
– TF.IDF weighting scheme

              Keywords
        W1    W2    W3    W4    W5    W6
  D1     0     1     0     0     0     0
  D2    .5     0     0    .5     0     0
  D3   .12   .13     0   .25    .5     0
  D4   .25     0   .25     0   .25   .25
  (rows: Talks/Papers)
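The KO representation can be sketched in a few lines. This is a minimal illustration, assuming documents arrive as title+abstract strings; the whitespace tokenizer and the function name `tfidf_vectors` are ours, not the system's actual keyword extractor.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Represent each document (a title+abstract string) as a TF.IDF-weighted
    vector over the shared keyword vocabulary (the KO bag-of-words model)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    # document frequency: in how many documents each keyword appears
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # term frequency * inverse document frequency
        vec = [(counts[w] / len(toks)) * math.log(n / df[w]) for w in vocab]
        vectors.append(vec)
    return vocab, vectors
```

Note that a keyword appearing in every document gets zero IDF weight, so only discriminative keywords contribute to document similarity.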
30. Merging CUL and CoMeT Data in KO Model

  Dc: CUL Papers’ Matrix (k × l)        Dt: CoMeT Talks’ Matrix (e × m)
        w1    w2    w3    w4                  w3    w4    w5
  P1     1     0     0     0            T1     0     1     0
  P2   .25     0    .5   .25            T2     0     0    .5
  P3     0    .5   .25   .25

  D: Merged Documents’ Matrix ((k + e) × (l + m − o))
        w1    w2    w3    w4    w5
  T1     0     0     0     1     0
  T2     0     0     0     0    .5
  P1     1     0     0     0     0
  P2   .25     0    .5   .25     0
  P3     0    .5   .25   .25     0

k – the number of CiteULike papers
l – the number of keywords used in CiteULike papers
e – the total number of talks in CoMeT
m – the total number of keywords in CoMeT
o – the number of common keywords between the CoMeT and CiteULike systems
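The merge step amounts to taking the union of the two vocabularies and padding each row with zeros for keywords it lacks. A minimal sketch, assuming each matrix is a dict mapping document name to a sparse {keyword: weight} row (the function name `merge_matrices` is ours):

```python
def merge_matrices(rows_a, rows_b, vocab_a, vocab_b):
    """Merge two document-keyword matrices into one matrix over the union
    vocabulary. Keywords shared by both systems collapse into a single
    column, so the merged matrix has l + m - o columns."""
    merged_vocab = sorted(set(vocab_a) | set(vocab_b))
    merged = {name: [row.get(w, 0.0) for w in merged_vocab]
              for name, row in {**rows_a, **rows_b}.items()}
    return merged_vocab, merged

# The slide's toy data: 3 CUL papers over w1-w4, 2 CoMeT talks over w3-w5
papers = {"P1": {"w1": 1.0},
          "P2": {"w1": .25, "w3": .5, "w4": .25},
          "P3": {"w2": .5, "w3": .25, "w4": .25}}
talks = {"T1": {"w4": 1.0}, "T2": {"w5": .5}}
vocab, D = merge_matrices(papers, talks, ["w1", "w2", "w3", "w4"], ["w3", "w4", "w5"])
```

With l = 4, m = 3, and o = 2 shared keywords (w3, w4), the merged matrix has 4 + 3 − 2 = 5 columns, matching the slide.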
31. Keywords+n*Tags (KnT) Model
• Each document: a bag of words containing:
– the document’s abstract, title, and tags
• Tags treated as regular keywords
– Each tag appears n times
• Merging CUL and CoMeT data in this model: same as in KO

Example (n = 2): document D3 has keywords w1, w2, w3, w2 and tags T1, T3;
W3 and T1 are the same term, as are W4 and T2.

         W1    W2   W3/T1  W4/T2   T3    T4
  D1      0     1     1      0      0     0
  D2      1     0     3      5      0     0
  D3      1     2     3      0      2     0
  D4      2     0     5      0      2     1
  (rows: Talks/Papers)
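The KnT fusion reduces to counting: each tag is injected into the bag of words n times, so shared terms (like W3/T1) accumulate both keyword and tag counts. A sketch with raw counts (TF.IDF weighting would be applied afterwards; the name `knt_bag` is ours):

```python
from collections import Counter

def knt_bag(keywords, tags, n=2):
    """KnT fusion: tags are treated as regular keywords, with each tag
    repeated n times so tag evidence carries n-fold weight."""
    return Counter(keywords) + Counter({tag: n for tag in tags})

# The slide's D3 example: keywords w1, w2, w3, w2 and tags w3, t3
# (the tag T1 is the same term as keyword w3), with n = 2
bag = knt_bag(["w1", "w2", "w3", "w2"], ["w3", "t3"], n=2)
```

For D3 this yields w3 counted 3 times (once as a keyword, twice as a tag), matching the W3/T1 column of the example matrix.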
32. Keywords Concatenated by Tags (KCT) Model
• Tags: a separate source of information
• Each document: a bag of keywords and a bag of tags
– Concatenating the keywords and tags vectors
– TF.IDF weighting scheme

Example: document D3 has keywords w1, w2, w3, w2 and tags T1, T3;
W3 and T1 are the same term, as are W4 and T2, but each keeps its own keyword and tag column.

         W1    W2    W3    W4    T1    T2    T3    T4
  D1      0     1     1     0     0     0     0     0
  D2      1     0     3     1     0     2     0     0
  D3      1     2     1     0     1     0     1     0
  D4      2     3     3     0     1     0     2     1
  (rows: Talks/Papers)
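In contrast to KnT, KCT never mixes the two count spaces: the document vector is the keyword vector followed by the tag vector. A sketch with raw counts (the name `kct_vector` is ours):

```python
def kct_vector(kw_counts, tag_counts, kw_vocab, tag_vocab):
    """KCT fusion: keywords and tags live in separate vector spaces and the
    document vector is their concatenation. A term that is both a keyword
    and a tag (like W3/T1) keeps one column in each half."""
    return ([kw_counts.get(w, 0) for w in kw_vocab] +
            [tag_counts.get(t, 0) for t in tag_vocab])

# The slide's D3 example: keywords w1, w2, w3, w2 and tags T1, T3
d3 = kct_vector({"w1": 1, "w2": 2, "w3": 1}, {"T1": 1, "T3": 1},
                ["w1", "w2", "w3", "w4"], ["T1", "T2", "T3", "T4"])
```

This reproduces the D3 row of the example matrix: [1, 2, 1, 0, 1, 0, 1, 0]. The price is a higher-dimensional space, which the precision results below suggest can hurt.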
33. Merging CUL and CoMeT Data in KCT Model

  Dc: CUL Papers’ Matrix (k × (m + i))    Dt: CoMeT Talks’ Matrix (e × (l + j))
        w1    w2    T1    T2                    W3    T1    T2
  P1     1     0     0     0              C1     1     0     0
  P2   .25     0    .5   .25              C2     0    .5     0
  P3     0    .5   .25   .25

  D: Merged Documents’ Matrix ((k + e) × (l + m + i + j − o − p))
        w1    w2    W3    T1    T2
  C1     0     0     1     0     0
  C2     0     0     0    .5     0
  P1     1     0     0     0     0
  P2   .25     0     0    .5   .25
  P3     0    .5     0   .25   .25

k – the number of CiteULike papers
m – the number of keywords used in CiteULike papers
i – the number of tags used in CiteULike papers
e – the total number of talks in CoMeT
l – the total number of keywords in CoMeT
j – the total number of tags in CoMeT
o – the number of common keywords between the CoMeT and CiteULike systems
p – the number of common tags between the CoMeT and CiteULike systems
34. Recommending Talks to Users
• K-nearest-neighbor method
– recommend the top K documents closest to the user profile
• User profiles: based on users’ bookmarked and rated talks and papers

  U: User Profiles in            D: Documents in          UP = U × D: User Profiles
     Talks/Papers Space             Keywords Space           in Keywords Space
        D1    D2    D3    D4          w1    w2    w3           w1    w2    w3
  U1     1     0     0     0    D1     0     1     0     U1     0     1     0
  U2   .25     0    .5   .25    D2     0     0    .5     U2     0   .75  .125
  U3     0    .5   .25   .25    D3     0     1     0     U3     0   .25  .375
                                D4     0     0    .5
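The two steps above can be sketched as: project the user's bookmark weights into keyword space (UP = U × D), then rank documents by cosine similarity to that profile. A minimal K-nearest-neighbor sketch (the names `cosine` and `recommend` are ours; a real system would exclude talks the user has already bookmarked, kept here only to keep the example tiny):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_row, docs, k=2):
    """Project a user's bookmark weights into keyword space (UP = U x D),
    then return the indices of the k documents most cosine-similar to
    that profile."""
    n_kw = len(docs[0])
    profile = [sum(w * doc[j] for w, doc in zip(user_row, docs))
               for j in range(n_kw)]
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(profile, docs[i]), reverse=True)
    return ranked[:k]

# The slide's example: U3 bookmarked D2, D3, D4 with weights .5, .25, .25
docs = [[0, 1, 0], [0, 0, .5], [0, 1, 0], [0, 0, .5]]
top = recommend([0, .5, .25, .25], docs, k=2)
```

For U3 the projected profile is (0, .25, .375), matching the UP matrix above, and the w3-heavy documents D2 and D4 rank highest.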
35. Experimental Results
• User study:
– 8 real users of both CoMeT and CiteULike systems
• Evaluation questionnaire for each recommended talk:
– Is this talk related to your interests? (yes/no question)
– How interesting is this talk to you? (on a 5-point scale)
– If the talk is related to your interests, how novel is this talk to you? (on a 5-point scale)
36. Experimental Results (Cont’d)
• Compared six models:
– KO, KnT (with n = 1, 2, 5; best n = 1), and KCT
• using only CoMeT data
• using both CoMeT and CiteULike data
• Measures:
– Relevance: precision over the yes/no answers
– Interest: nDCG over the 5-point scale
– Novelty: averaged novelty ratings (non-relevant = zero novelty)
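The nDCG interest measure can be sketched as follows. This is one common formulation (linear gain, log2 discount); the study may use a different gain/discount variant, and the name `ndcg` is ours:

```python
import math

def ndcg(ratings):
    """nDCG over the 5-point interest ratings of a ranked recommendation
    list: the list's discounted cumulative gain divided by the DCG of the
    same ratings in the ideal (descending) order."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(ratings))
    ideal = sum(r / math.log2(i + 2)
                for i, r in enumerate(sorted(ratings, reverse=True)))
    return dcg / ideal if ideal else 0.0
```

A list already sorted by the user's interest scores gets nDCG 1.0; burying the high-interest talks lower in the list discounts their gain and pushes nDCG below 1.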
37. Precision results for different numbers of recommendations

  Precision                 1     2     3     4     5     6     7     8     9    10
  Only      KO           0.83  0.67  0.72  0.63  0.60  0.56  0.57  0.50  0.51  0.51
  CoMeT     KnT (n=1)    0.50  0.50  0.58  0.59  0.57  0.58  0.57  0.58  0.60  0.57
  Data      KCT          0.50  0.33  0.39  0.46  0.47  0.53  0.52  0.50  0.50  0.53
  CoMeT +   KO           0.83  0.83  0.67  0.75  0.73  0.69  0.64  0.63  0.56  0.57
  CiteULike KnT (n=1)    0.63  0.69  0.71  0.72  0.73  0.73  0.71  0.70  0.68  0.67
  Data      KCT          0.38  0.44  0.42  0.47  0.48  0.52  0.50  0.49  0.53  0.55
38. Precision results for different numbers of recommendations (Cont’d)
• Adding tags using KnT → better cumulative precision for the top 10 recommendations
• Adding CiteULike data in both KnT and KO → higher precision
• KnT with both CoMeT and CUL data → best cumulative precision
• KCT model → decrease in precision
– High dimensionality of the vector space model → increased distance between documents and user profiles → decreased variance among the similarities of a user profile to different talks
39. nDCG Results for different numbers of recommendations

  nDCG                      1     2     3     4     5     6     7     8     9    10
  Only      KO           0.90  0.88  0.89  0.93  0.92  0.94  0.95  0.95  0.95  0.96
  CoMeT     KnT (n=1)    0.90  0.85  0.82  0.83  0.87  0.88  0.89  0.90  0.91  0.93
  Data      KCT          0.84  0.88  0.89  0.90  0.90  0.91  0.92  0.92  0.94  0.95
  CoMeT +   KO           0.84  0.91  0.90  0.92  0.93  0.94  0.95  0.96  0.96  0.96
  CiteULike KnT (n=1)    0.90  0.90  0.89  0.88  0.90  0.92  0.92  0.94  0.94  0.95
  Data      KCT          0.77  0.85  0.84  0.81  0.83  0.84  0.86  0.88  0.91  0.92
40. nDCG Results for different numbers of recommendations (Cont’d)
• KCT and KnT models: using both CiteULike and CoMeT data → increased cumulative user interest
• Best results: the tag-less KO model, both with and without CiteULike data
41. Novelty Results for different numbers of recommendations

  Novelty                   1     2     3     4     5     6     7     8     9    10
  Only      KO           1.75  1.69  1.67  1.72  1.70  1.65  1.66  1.55  1.49  1.44
  CoMeT     KnT (n=1)    1.88  1.75  1.67  1.88  1.88  1.88  2.00  2.03  1.99  1.93
  Data      KCT          2.00  1.50  1.54  1.56  1.55  1.60  1.63  1.58  1.50  1.50
  CoMeT +   KO           1.88  1.44  1.33  1.50  1.50  1.52  1.61  1.47  1.44  1.36
  CiteULike KnT (n=1)    1.75  2.19  1.79  2.06  2.20  2.08  2.02  2.19  2.06  1.96
  Data      KCT          1.38  1.31  1.38  1.47  1.58  1.60  1.52  1.47  1.61  1.64
42. Novelty Results for different numbers of recommendations (Cont’d)
• Adding tags using the KnT fusion model → largest positive impact
• Adding different sources of information → improved novelty of recommendations
– Tags are provided by users → include a broader range of vocabulary
– Each user’s tags describe a document from her point of view (different from the terms included in the document)
• Adding CUL data in the KO model → decreased novelty
– Distinctive natures of the CoMeT and CiteULike systems
• CiteULike: users add, review, and rate papers related to their research field
• CoMeT: information about talks happening at a specific time on a particular date; users bookmark more novel, less relevant talks
43. Conclusion
• Relevance: fit to the user’s research work
• Interest: the overall attraction of an item
• Users are interested in talks on more general topics
– with little in common with their research interests
• Increased focus on relevance encapsulated in tags → decreased ability of the system to recommend interesting talks when tags are added
44. Conclusion (Cont’d)
• Including another reliable user profile → increased precision of recommendations
– Consider the way the additional profile is augmented
• Using CiteULike data for all models
– Increased relevancy of every recommended document
– Varied results for interestingness
• Adding tags
– Increased novelty of recommendations (using both CoMeT and CUL data)
– Increased relatedness for larger numbers of recommendations
• Injecting keywords from another data source: more reliable than including tags for relevancy
• Including tags from various sources of information: more reliable for interestingness and novelty