34. User-Based User Similarity Algorithms
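Cosine similarity
$$\mathrm{sim}(u, v) = \cos(\vec{u}, \vec{v}) = \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \times \|\vec{v}\|} = \frac{\sum_{i=1}^{n} R_{ui} R_{vi}}{\sqrt{\sum_{i=1}^{n} R_{ui}^{2}} \, \sqrt{\sum_{i=1}^{n} R_{vi}^{2}}}$$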
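A minimal sketch of the cosine formula, assuming each user is represented by an equal-length list of ratings over the same n items, with 0 standing for "unrated". The function name and the sample ratings are hypothetical, chosen only for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    if len(u) != len(v):
        raise ValueError("rating vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a user with no ratings is treated as similar to no one.
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical ratings of two users over the same n = 4 items (0 = unrated).
alice = [5, 3, 0, 1]
bob = [4, 0, 0, 1]
print(cosine_similarity(alice, bob))
```

Returning 0.0 for an all-zero vector is a design choice of this sketch; the formula itself is undefined in that case.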
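Correlation similarity (Pearson correlation coefficient)
$$\mathrm{sim}(u, v) = \frac{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)(R_{vi} - \bar{R}_v)}{\sqrt{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_u)^{2}} \, \sqrt{\sum_{i \in I_{uv}} (R_{vi} - \bar{R}_v)^{2}}}$$
where $I_{uv}$ is the set of items rated by both users u and v, and $\bar{R}_u$, $\bar{R}_v$ are the average ratings of users u and v.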
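The Pearson formula can be sketched as follows. This is an illustration under stated assumptions: each user is a dict mapping item to rating, and the user averages are taken over the co-rated items only (some implementations average over all of a user's ratings instead). The function name is hypothetical.

```python
import math

def pearson_similarity(ratings_u, ratings_v):
    """Pearson correlation over the items both users rated (the set I_uv)."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        # Assumption: with fewer than two co-rated items there is no evidence.
        return 0.0
    # Assumption: user means are computed over the co-rated items only.
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        # A user whose co-rated ratings are all equal has no defined correlation.
        return 0.0
    return num / (den_u * den_v)

# Hypothetical users: v rates on a wider scale but agrees with u on ordering.
u = {"a": 1, "b": 2, "c": 3}
v = {"a": 2, "b": 4, "c": 6, "d": 1}
print(pearson_similarity(u, v))
```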
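Adjusted cosine similarity
$$\mathrm{sim}(u, v) = \frac{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_i)(R_{vi} - \bar{R}_i)}{\sqrt{\sum_{i \in I_{uv}} (R_{ui} - \bar{R}_i)^{2}} \, \sqrt{\sum_{i \in I_{uv}} (R_{vi} - \bar{R}_i)^{2}}}$$
where $\bar{R}_i$ is the average rating of item i.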
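A small sketch of the user-based adjusted cosine, assuming the ratings are stored as a nested dict (user to item to rating) and that each item average is taken over every user who rated that item. Both function names and the sample data are hypothetical.

```python
import math

def item_means(ratings):
    """Average rating of each item over all users who rated it."""
    totals, counts = {}, {}
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            totals[item] = totals.get(item, 0.0) + r
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}

def adjusted_cosine_users(ratings, u, v):
    """Cosine of two users' ratings after subtracting each item's average."""
    means = item_means(ratings)
    common = set(ratings[u]) & set(ratings[v])  # the co-rated set I_uv
    num = sum((ratings[u][i] - means[i]) * (ratings[v][i] - means[i]) for i in common)
    den_u = math.sqrt(sum((ratings[u][i] - means[i]) ** 2 for i in common))
    den_v = math.sqrt(sum((ratings[v][i] - means[i]) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # assumption: no deviation from the item averages, no signal
    return num / (den_u * den_v)

# Hypothetical ratings matrix.
ratings = {
    "u": {"a": 5, "b": 1},
    "v": {"a": 5, "b": 1},
    "w": {"a": 2, "b": 4},
}
print(adjusted_cosine_users(ratings, "u", "v"))
print(adjusted_cosine_users(ratings, "u", "w"))
```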
35. User-Based Similarity Algorithms: Cosine Similarity
Similarity between items i & j is computed
by isolating the users who have rated them
and then applying a similarity computation
technique.
Cosine-based Similarity: items are vectors
in the m-dimensional user space
(differences in rating scale between users are
not taken into account).
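The two steps described on this slide (isolate the users who rated both items, then apply the cosine) might be sketched like this. The nested dict layout (user to item to rating) and the function name are assumptions made for the example.

```python
import math

def item_cosine(ratings, i, j):
    """Cosine similarity of items i and j over the users who rated both."""
    # Step 1: isolate the users who have rated both items.
    co_raters = [u for u, r in ratings.items() if i in r and j in r]
    # Step 2: treat each item as a vector over those users and take the cosine.
    dot = sum(ratings[u][i] * ratings[u][j] for u in co_raters)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in co_raters))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in co_raters))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # assumption: no co-raters means no measurable similarity
    return dot / (norm_i * norm_j)

# Hypothetical data: u3 rated only item "x", so u3 is left out of the comparison.
ratings = {
    "u1": {"x": 1, "y": 2},
    "u2": {"x": 2, "y": 4},
    "u3": {"x": 5},
}
print(item_cosine(ratings, "x", "y"))
```

Because raw ratings are used, a user who rates everything high and one who rates everything low contribute on different scales, which is the limitation the slide notes in parentheses.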
36. User-Based Similarity Algorithms: Correlation Similarity
Correlation-based Similarity: uses the
Pearson-r correlation (computed only over the
users who rated both item i & item j).
R(u,i) = rating of user u on item i.
R(i) = average rating of the i-th item.
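As an illustration of the definitions above, a possible item-to-item Pearson sketch. It assumes a nested dict of ratings (user to item to rating) and takes R(i) as the average over every user who rated item i; averaging over only the co-rating users is another reading. The function name is hypothetical.

```python
import math

def item_pearson(ratings, i, j):
    """Pearson-r between items i and j over users who rated both."""
    raters_i = [u for u in ratings if i in ratings[u]]
    raters_j = [u for u in ratings if j in ratings[u]]
    co_raters = set(raters_i) & set(raters_j)
    if not co_raters:
        return 0.0  # assumption: no shared raters means no evidence
    # R(i), R(j): average rating of each item (assumed: over all of its raters).
    mean_i = sum(ratings[u][i] for u in raters_i) / len(raters_i)
    mean_j = sum(ratings[u][j] for u in raters_j) / len(raters_j)
    num = sum((ratings[u][i] - mean_i) * (ratings[u][j] - mean_j) for u in co_raters)
    den_i = math.sqrt(sum((ratings[u][i] - mean_i) ** 2 for u in co_raters))
    den_j = math.sqrt(sum((ratings[u][j] - mean_j) ** 2 for u in co_raters))
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (den_i * den_j)

# Hypothetical ratings: item "y" is always rated twice as high as item "x".
ratings = {
    "u1": {"x": 1, "y": 2},
    "u2": {"x": 3, "y": 6},
    "u3": {"x": 2, "y": 4},
}
print(item_pearson(ratings, "x", "y"))
```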
37. User-Based Similarity Algorithms: Adjusted Cosine Similarity
Adjusted Cosine Similarity: each pair in the
co-rated set corresponds to a different user
(takes care of differences in rating scale).
R(u,i) = rating of user u on item i.
R(u) = average rating of the u-th user.
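A sketch of the adjusted cosine between two items, assuming a nested dict of ratings (user to item to rating) and taking R(u) as the average over all items that user rated. The function name and the two sample users are hypothetical.

```python
import math

def item_adjusted_cosine(ratings, i, j):
    """Cosine of items i and j after subtracting each co-rater's own average."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            # R(u): this user's average rating (assumed: over all rated items).
            mean_u = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[i] - mean_u
            dj = user_ratings[j] - mean_u
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0  # assumption: no deviation from user averages, no signal
    return num / math.sqrt(den_i * den_j)

# Hypothetical users on different rating scales with the same preference order.
ratings = {
    "harsh": {"x": 1, "y": 2, "z": 3},
    "generous": {"x": 3, "y": 4, "z": 5},
}
print(item_adjusted_cosine(ratings, "x", "z"))
```

Both users place x below and z above their own average, so after centering the two items come out as opposites even though the raw scales differ.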
45. References
Wiki:
http://en.wikipedia.org/wiki/Collaborative_filtering
http://en.wikipedia.org/wiki/Web_analytics
http://en.wikipedia.org/wiki/Recommendation_system
Books
Programming Collective Intelligence: Building Smart Web 2.0 Applications
Web Analytics: An Hour a Day
Data Mining: Concepts and Techniques
Mining the Web: Transforming Customer Data into Customer Value
Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
46. References
Open-source projects
Open Source Collaborative Filtering Written in Java
Carrot2 Clustering Engine
Weka 3: Data Mining Software in Java
Taste
47. References
Blogs
http://glinden.blogspot.com/
http://guwendong.cn/
http://www.weigend.com/
http://www.chinawebanalytics.cn/
The Beauty of Mathematics series (数学之美系列)
Mining Social Data for Fun and Insight