4. Perceived Tempo
Metrical ambiguity:
listeners don’t agree on bpm
they typically fall into two camps
perceived values differ by a factor of 2 or 3
McKinney and Moelants:
experiments with 24-40 subjects
released their experimental data
5. Perceived Tempo
Metrical ambiguity:
[Figure: histograms of listener tap rates (listeners vs bpm); McKinney and Moelants, 2004]
6. Machine-Estimated Tempo
Also affected by metrical ambiguity:
makes estimation difficult
natural to see multiple bpm values
estimated values often out by a factor of 2 or 3
(“octave error”)
9. Crowd Sourcing
Music:
over 4000 songs
30-second clips
• rock, country, pop, soul, funk and R&B, jazz,
latin, reggae, disco, rap, punk, electronic,
trance, industrial, house, folk, ...
• recent releases back to the 60s
10. Response
First week after release:
4k tracks annotated by 2k listeners
20k labels and bpm estimates
To date:
6k tracks annotated by 27k listeners
200k labels and bpm estimates
11. Analysis: ambiguity
When people tap to a song at different bpm,
do they really disagree about whether it’s
slow or fast?
Investigation:
inspect labels from people who tap differently
quantify disagreement for ambiguous songs
12. Analysis: ambiguity
Subset of slow/fast songs:
labelled by at least five listeners
majority label “slow” or “fast”
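A minimal sketch of this filtering step in Python, assuming each song's labels arrive as a list of strings (the function and variable names are hypothetical):

    from collections import Counter

    def slow_fast_subset(labels_by_song, min_listeners=5):
        """Keep songs labelled by at least five listeners whose
        majority label is 'slow' or 'fast'."""
        subset = {}
        for song, labels in labels_by_song.items():
            if len(labels) < min_listeners:
                continue  # too few listeners to trust a majority
            top_label, _ = Counter(labels).most_common(1)[0]
            if top_label in ("slow", "fast"):
                subset[song] = top_label
        return subset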
16. Analysis: ambiguity
Quantify disagreement over labels:
model conflict, extremity of tempo
conflict coefficient:
C = \frac{\min(L_s, L_f)}{\max(L_s, L_f)} \cdot \frac{L_s + L_f}{L}
L_s, L_f, L: number of slow, fast, and all labels for a song
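A sketch of the coefficient in Python, reading the formula as written above (the worked values are just an illustration):

    def conflict_coefficient(n_slow, n_fast, n_total):
        """C is 0 when the slow/fast labels all agree, and grows
        towards 1 when they are evenly split and dominate all labels."""
        if n_slow == 0 and n_fast == 0:
            return 0.0
        ratio = min(n_slow, n_fast) / max(n_slow, n_fast)
        extremity = (n_slow + n_fast) / n_total
        return ratio * extremity

    print(conflict_coefficient(6, 4, 12))  # 0.555...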
18. Analysis: ambiguity
Subset of metrically ambiguous songs:
at least 30% of listeners tap at half/twice the
majority estimate
Compared to the rest:
no significant difference in C
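One reading of the 30% rule as a Python sketch; using the median tap as the majority estimate and a 4% tolerance are assumptions, not details from the slides:

    from statistics import median

    def is_metrically_ambiguous(taps, tol=0.04):
        """taps: per-listener bpm estimates for one song.
        True when at least 30% of listeners tap at (roughly)
        half or twice the majority estimate."""
        majority = median(taps)

        def near(x, target):
            return abs(x - target) <= tol * target

        octave_taps = [t for t in taps
                       if near(t, majority / 2) or near(t, majority * 2)]
        return len(octave_taps) >= 0.3 * len(taps)

    print(is_metrically_ambiguous([120, 120, 118, 60, 61, 122, 60]))  # True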
19. Evaluation metrics
MIREX metrics:
designed to capture metrical ambiguity
and replicate human disagreement
Ambiguity considered unhelpful for:
automatic playlisting
DJ tools, production tools
jogging
20. Evaluation metrics
Application-oriented:
compare with majority* human estimate
(*median in most popular bin)
categorise machine estimates (see the sketch after this list)
same as humans
twice as fast
twice as slow
three times as fast
and so on
unrelated to humans
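A sketch of this categorisation, assuming a fixed tolerance band (4% here, a hypothetical choice) for calling two tempi equal up to a metrical factor:

    def categorise(machine_bpm, human_bpm, tol=0.04):
        """Relate a machine estimate to the majority human estimate."""
        factors = [(1, "same"), (2, "x2"), (0.5, "/2"),
                   (3, "x3"), (1 / 3, "/3")]
        for factor, label in factors:
            target = factor * human_bpm
            if abs(machine_bpm - target) <= tol * target:
                return label
        return "unrelated"

    print(categorise(240, 120))  # 'x2'
    print(categorise(97, 100))   # 'same'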
22. Analysis: machine vs human
[Bar chart: share of machine estimates per category (x2, same, /2, unrelated, other) for BPM List, VAMP, and EchoNest; y-axis 0-80%]
23. Analysis: controlled test
Controlled comparison:
exploit experience from website A/B testing
use this to improve the algorithm iteratively
The result is independent of any quality metric
24. Analysis: controlled test
When a visitor arrives at the page:
choose a source S at random
choose a bpm value at random
choose two songs given that value by S
display them together
Then ask which sounds faster!
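A sketch of the pairing step under this protocol; the data layout and all names are hypothetical:

    import random

    # toy data: source name -> {song id: bpm estimate}
    estimates = {
        "BPM List": {"song_a": 120, "song_b": 120, "song_c": 90},
        "VAMP":     {"song_a": 118, "song_b": 118, "song_c": 180},
    }

    def pick_pair(estimates):
        while True:
            source = random.choice(sorted(estimates))           # a source S at random
            by_song = estimates[source]
            bpm = random.choice(sorted(set(by_song.values())))  # a bpm value at random
            songs = [s for s, b in by_song.items() if b == bpm]
            if len(songs) >= 2:                                 # need two songs S gave that bpm
                a, b = random.sample(songs, 2)
                return source, bpm, a, b  # display together; ask which sounds faster

    print(pick_pair(estimates))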
25. Analysis: controlled test
Null Hypothesis:
there will be presentation effects
listeners will attend to subtle differences
but
these effects are independent of the source
of bpm estimates
if the quality of the sources is the same
26. Analysis: controlled test
[Bar chart: share of "same" vs "different" responses for BPM List, VAMP, and EchoNest; y-axis 0-100%]
27. Analysis: improving estimates
Adjust bpm based on class:
imagine an accurate slow/fast classifier
(Hockman and Fujinaga, 2010)
adjust as follows:
bpm := bpm/2 if slow and bpm > 100
bpm := bpm*2 if fast and bpm < 100
otherwise don’t adjust
simulation: use the majority human label as the classifier output
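The adjustment rule above as runnable Python; the 100 bpm threshold and the slow/fast labels come from the slide, the function name is ours:

    def adjust_bpm(bpm, label):
        """Fold octave errors back using a slow/fast class label."""
        if label == "slow" and bpm > 100:
            return bpm / 2
        if label == "fast" and bpm < 100:
            return bpm * 2
        return bpm

    print(adjust_bpm(160, "slow"))  # 80.0
    print(adjust_bpm(70, "fast"))   # 140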
28. Analysis: adjusted vs human
[Bar chart: share of adjusted machine estimates per category (x2, same, /2, unrelated, other) for BPM List, VAMP, and EchoNest; y-axis 0-80%]
29. Conclusions
Crowd sourcing:
gather thousands of data points in a few
days, half a million over time
humans agree on slow/fast labels, even
when they tap at different bpm
Improving machine estimates:
use controlled testing
exploit a slow/fast classifier
30. Thanks!
mark@last.fm @gamboviol
http://mir-in-action.blogspot.com
http://playground.last.fm/demo/speedo
http://users.last.fm/~mark/speedo.tgz
We are looking for interns/research fellows!