Crowdsourcing via Tensor Augmentation and Completion

Arizona State University
Crowdsourcing via Tensor Augmentation
and Completion (TAC)
Presenter: Yao Zhou joint work with: Dr. Jingrui He
- 1 -
Arizona State University Arizona State University

Arizona State University- 2 -
Roadmap
 Background
 Related work
 Crowdsourcing based on TAC
 Experimental results
 Conclusion

Crowdsourcing in machine learning
 Training a supervised machine learning model needs
training labels
 Many crowdsourcing platforms provide services to collect
labels information.

An example of crowdsourcing
Lynx (wildcat)
Tabby (domestic cat)

Key problem of crowdsourcing
 Cons:
 Low quality: Collected labels from the crowd (non-expert) are noisy.
 Missing labels: Some workers are not willing to label all of the items.
How to infer the true labels from a large number of labels
collected from crowd?
 Pros:
 Low cost: Collecting large amounts of labels is economic.
Noisy
labels
Missing
labels

Some related work
MV
 Majority Voting, a simple baseline.
DS-EM [Dawid and Skene, 1979]
 Infer worker’s ability matrix and true labels.
 Two-coin model for a binary labelling task.
GLAD [Whitehill et al., 2009].
 Infer the worker’s ability, item difficulty and item true labels simultaneously.
DS-MF [Liu et al., 2012].
 Employ variational Bayesian inference using meanfield algorithm.
MMCE [Zhou et al., 2012].
 Employ the minimax entropy principle to infer worker ability, item difficulty and
true labels at the same time.
Structural
information of labels
is not utilized !!

Roadmap
 Background
 Related work
 Conclusion

Tensor augmentation
 Notation:
 Re-organize labels of crowds as a three-way
tensor:
 Based on worker’s labelling decision,
generate an index set:
• Workers: i = 1,2, …, Nw
• Items: j = 1,2, …, Ni
• Classes: k = 1, …, Nc
Tensor T
# of items Ni
#ofworkersNw
Ground truth layer
+1
 The ground truth layer:
 Extra tensor slice of size 𝑁𝑖 × 𝑁𝑐.
 Augmented on tensor along the
worker dimension.

Tensor augmentation and completion (TAC)
 Main principle of TAC:
 Rank minimization
NP-hard
Tightest
convex
envelope
 Trace norm minimization
 Goal of TAC:
 Complete the augmented tensor

 Definition of trace norm for an n-way tensor [Liu et.al 2012]:
Here, represents for the unfold of a tensor X. The reverse operation is fold.𝑋 𝑙
Tensor: 𝑋 ∈ 𝑅3×2×2
Unfolded matrices:
𝑋(1) ∈ 𝑅3×4
𝑋(2) ∈ 𝑅2×6
𝑋 3 ∈ 𝑅2×6
Reference: Ji Liu et.al. Tensor completion for estimating missing values in visual data. TPAMI 2012

Relaxed objective of TAC with regularization:
Regularization
term
Index of the
ground truth layer
Intermediate
relaxed matrices
Solution:
 Block Coordinate Descend (BCD)
 Four blocks of variables:

Updating 𝑴𝒍
Sub-problem:
Closed form solution, proved by [Cai et.al. 2009]:
Here, and τ =
Reference: Jian-Feng Cai, et.al. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2010.
Singular Value
Thresholding

Updating X
Two formulations:
 Prior guided ground truth inference (PG-TAC)
Prior Statistics
 Relaxed simplex ground truth inference (RS-TAC)
Slack variable

Prior guided ground truth inference (PG-TAC)
 Solution:
Elements of tensor X can be divided into three sets {𝐶1, 𝐶2, 𝐶3}
Elements of set 𝑪 𝟏:
Elements of set 𝑪 𝟐:
Elements of set 𝑪 𝟑:
Updating X
Tensor T
Ground truth layer
Elements of
Set C3
Elements of
Set C1
Elements of
Set C2

Elements of set 𝑪 𝟑:
Updating X
Prior guided ground truth inference (RS-TAC)
 Solution:
Elements of tensor X can be divided into three sets {𝐶1, 𝐶2, 𝐶3}
Elements of set 𝑪 𝟏:
Elements of set 𝑪 𝟐:
Tensor T
Ground truth layer
Elements of
Set C3
Elements of
Set C1
Elements of
Set C2
Different
from PG-TAC

Roadmap
 Background
 Related work
 Conclusion

Experimental Results
Synthetic Data Set:
 Four configurations:
 Notations:
• # of Workers: Nw
• # of Items: Ni
• # of Classes: Nc
• Probability of not giving
labels q
Lower
is
better
Lower
is
better
 Initial configuration:
• Nw = 50, Ni = 400
• Nc = 4, q = 0.7

Real-world Data Set:
References:
Dengyong Zhou, et.al. Learning from the wisdom of crowds by minimax entropy. NIPS, 2012.
Dengyong Zhou, et.al. Regularized minimax conditional entropy for crowdsourcing. CoRR, 2015.
Hu Han, et.al. Demographic estimation from face images: Human vs. machine performance. TPAMI, 2015.
Rion Snow, et.al. Cheap and fast—but is it good?: Evaluating non-expert annotations for natural language tasks. EMNLP, 2008.

Real-world Data Set results:
References:
Qiang Liu, et al. Variational inference for crowdsourcing. NIPS, 2012.
Dengyong Zhou, et.al. Learning from the wisdom of crowds by minimax entropy. NIPS, 2012.
A. P. Dawid, et al. Maximum likelihood estimation of observer error-rates using the EM algorithm. Applied Statistics, 1979.
Jacob Whitehill et al. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. NIPS , 2009.

Conclusion
 Two novel methods PG-TAC and RS-TAC:
 Augment the data tensor with a ground truth layer.
 Utilize the structural information of crowd labels.
 Infer the true labels of items in binary and multi-class settings.
 Experimental results:
 Six real data sets.
 Outperform state-of-the-art methods.

Thank you !
&
Questions?

Crowdsourcing via Tensor Augmentation and Completion

Recommended

Recommended

More Related Content

Similar to Crowdsourcing via Tensor Augmentation and Completion

Similar to Crowdsourcing via Tensor Augmentation and Completion (20)

Recently uploaded

Recently uploaded (20)

Crowdsourcing via Tensor Augmentation and Completion