This presentation discusses novel crowdsourcing methods based on tensor augmentation and completion (TAC). It introduces two TAC-based methods, PG-TAC and RS-TAC, that augment crowdsourced label data with a ground truth layer to infer true labels. Experimental results on six real datasets show PG-TAC and RS-TAC outperform state-of-the-art crowdsourcing methods by utilizing structural information in crowd labels. The presentation concludes the two novel TAC methods infer true labels from crowdsourced data in binary and multi-class settings more accurately than previous approaches.
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Crowdsourcing via Tensor Augmentation and Completion
1. Arizona State University
Crowdsourcing via Tensor Augmentation
and Completion (TAC)
Presenter: Yao Zhou joint work with: Dr. Jingrui He
- 1 -
Arizona State University Arizona State University
2. Arizona State University- 2 -
Roadmap
Background
Related work
Crowdsourcing based on TAC
Experimental results
Conclusion
3. Arizona State University- 3 -
Crowdsourcing in machine learning
Training a supervised machine learning model needs
training labels
Many crowdsourcing platforms provide services to collect
labels information.
5. Arizona State University- 5 -
Key problem of crowdsourcing
Cons:
Low quality: Collected labels from the crowd (non-expert) are noisy.
Missing labels: Some workers are not willing to label all of the items.
How to infer the true labels from a large number of labels
collected from crowd?
Pros:
Low cost: Collecting large amounts of labels is economic.
Noisy
labels
Missing
labels
6. Arizona State University- 6 -
Some related work
MV
Majority Voting, a simple baseline.
DS-EM [Dawid and Skene, 1979]
Infer worker’s ability matrix and true labels.
Two-coin model for a binary labelling task.
GLAD [Whitehill et al., 2009].
Infer the worker’s ability, item difficulty and item true labels simultaneously.
DS-MF [Liu et al., 2012].
Employ variational Bayesian inference using meanfield algorithm.
MMCE [Zhou et al., 2012].
Employ the minimax entropy principle to infer worker ability, item difficulty and
true labels at the same time.
Structural
information of labels
is not utilized !!
7. Arizona State University- 7 -
Roadmap
Background
Related work
Crowdsourcing based on TAC
Experimental results
Conclusion
8. Arizona State University- 8 -
Tensor augmentation
Notation:
Re-organize labels of crowds as a three-way
tensor:
Based on worker’s labelling decision,
generate an index set:
• Workers: i = 1,2, …, Nw
• Items: j = 1,2, …, Ni
• Classes: k = 1, …, Nc
Tensor T
# of items Ni
#ofworkersNw
Ground truth layer
+1
The ground truth layer:
Extra tensor slice of size 𝑁𝑖 × 𝑁𝑐.
Augmented on tensor along the
worker dimension.
9. Arizona State University- 9 -
Tensor augmentation and completion (TAC)
Main principle of TAC:
Rank minimization
NP-hard
Tightest
convex
envelope
Trace norm minimization
Goal of TAC:
Complete the augmented tensor
10. Arizona State University- 10 -
Tensor augmentation and completion (TAC)
Definition of trace norm for an n-way tensor [Liu et.al 2012]:
Here, represents for the unfold of a tensor X. The reverse operation is fold.𝑋 𝑙
Tensor: 𝑋 ∈ 𝑅3×2×2
Unfolded matrices:
𝑋(1) ∈ 𝑅3×4
𝑋(2) ∈ 𝑅2×6
𝑋 3 ∈ 𝑅2×6
Reference: Ji Liu et.al. Tensor completion for estimating missing values in visual data. TPAMI 2012
11. Arizona State University- 11 -
Tensor augmentation and completion (TAC)
Relaxed objective of TAC with regularization:
Regularization
term
Index of the
ground truth layer
Intermediate
relaxed matrices
Solution:
Block Coordinate Descend (BCD)
Four blocks of variables:
12. Arizona State University- 12 -
Updating 𝑴𝒍
Sub-problem:
Closed form solution, proved by [Cai et.al. 2009]:
Here, and τ =
Reference: Jian-Feng Cai, et.al. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2010.
Singular Value
Thresholding
13. Arizona State University- 13 -
Updating X
Two formulations:
Prior guided ground truth inference (PG-TAC)
Prior Statistics
Relaxed simplex ground truth inference (RS-TAC)
Slack variable
14. Arizona State University- 14 -
Prior guided ground truth inference (PG-TAC)
Solution:
Elements of tensor X can be divided into three sets {𝐶1, 𝐶2, 𝐶3}
Elements of set 𝑪 𝟏:
Elements of set 𝑪 𝟐:
Elements of set 𝑪 𝟑:
Updating X
Tensor T
Ground truth layer
Elements of
Set C3
Elements of
Set C1
Elements of
Set C2
15. Arizona State University- 15 -
Elements of set 𝑪 𝟑:
Updating X
Prior guided ground truth inference (RS-TAC)
Solution:
Elements of tensor X can be divided into three sets {𝐶1, 𝐶2, 𝐶3}
Elements of set 𝑪 𝟏:
Elements of set 𝑪 𝟐:
Tensor T
Ground truth layer
Elements of
Set C3
Elements of
Set C1
Elements of
Set C2
Different
from PG-TAC
16. Arizona State University- 16 -
Roadmap
Background
Related work
Crowdsourcing based on TAC
Experimental results
Conclusion
17. Arizona State University- 17 -
Experimental Results
Synthetic Data Set:
Four configurations:
Notations:
• # of Workers: Nw
• # of Items: Ni
• # of Classes: Nc
• Probability of not giving
labels q
Lower
is
better
Lower
is
better
Initial configuration:
• Nw = 50, Ni = 400
• Nc = 4, q = 0.7
18. Arizona State University- 18 -
Experimental Results
Real-world Data Set:
References:
Dengyong Zhou, et.al. Learning from the wisdom of crowds by minimax entropy. NIPS, 2012.
Dengyong Zhou, et.al. Regularized minimax conditional entropy for crowdsourcing. CoRR, 2015.
Hu Han, et.al. Demographic estimation from face images: Human vs. machine performance. TPAMI, 2015.
Rion Snow, et.al. Cheap and fast—but is it good?: Evaluating non-expert annotations for natural language tasks. EMNLP, 2008.
19. Arizona State University- 19 -
Experimental Results
Real-world Data Set results:
References:
Qiang Liu, et al. Variational inference for crowdsourcing. NIPS, 2012.
Dengyong Zhou, et.al. Learning from the wisdom of crowds by minimax entropy. NIPS, 2012.
A. P. Dawid, et al. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 1979.
Jacob Whitehill et al. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. NIPS , 2009.
20. Arizona State University- 20 -
Conclusion
Two novel methods PG-TAC and RS-TAC:
Augment the data tensor with a ground truth layer.
Utilize the structural information of crowd labels.
Infer the true labels of items in binary and multi-class settings.
Experimental results:
Six real data sets.
Outperform state-of-the-art methods.