TEACHING COMPUTERS TO LISTEN TO MUSIC

Eric Battenberg

ebattenberg@gracenote.com

http://ericbattenberg.com
Gracenote, Inc.
Previously:
UC Berkeley, Dept. of EECS
Parallel Computing Laboratory
CNMAT (Center for New Music and Audio Technologies)
BUSINESS VERTICALS

Rich, diverse multimedia metadata powering:
- Cloud music services, mobile apps and devices
- Smart TVs, second-screen apps, targeted ads
- Connected infotainment systems on the road
SOME OF GRACENOTE’S PRODUCTS
- Original product: CDDB (1998)
  - Recognize CDs and retrieve associated metadata
- Scan & Match: precisely identify digital music quickly in large catalogs
- ACR (Automatic Content Recognition)
  - Audio/video fingerprinting, enabling second-screen experiences
  - Real-time audio classification systems for mobile devices
- Contextually aware cross-modal personalization
  - Taste profiling for music, TV, movies, social networks
  - Music mood descriptors that combine machine learning and editorial training data
GRACENOTE’S CUSTOMERS
- Music
- Video
- Auto
MACHINE LISTENING
- Speech processing
  - Makes up the vast majority of funded machine listening research.
  - Just as there’s more to computer vision than OCR, there’s more to machine listening than speech recognition!
- Audio content analysis
  - Audio classification (music, speech, noise, laughter, cheering)
  - Audio fingerprinting (e.g., Gracenote, Shazam)
  - Audio event detection (new song, channel change, hotword)
- Content-based Music Information Retrieval (MIR): today’s topic
GETTING COMPUTERS TO “LISTEN” TO MUSIC
- Not trying to get computers to “listen” for enjoyment.
- More accurate: analyzing music with computers.
- What kind of information can we get out of the analysis?
  - What instruments are playing?
  - What is the mood?
  - How fast or slow is it (tempo)?
  - What does the singer sound like (e.g., Ben Harper vs. James Brown)?
  - How can I play this song on my instrument?
CONTENT-BASED MUSIC INFORMATION RETRIEVAL
- Many tasks:
  - Genre and mood classification, auto-tagging
  - Beat tracking, tempo detection
  - Music similarity, playlisting, recommendation (heavily aided by collaborative filtering)
  - Automatic music transcription
  - Source separation (voice, drums, bass, …)
  - Music segmentation (verse, chorus)
TALK SUMMARY
- Introduction to Music Information Retrieval (MIR)
  - Some common techniques: auto-tagging, onset detection
- Exciting new research directions
  - Deep learning!
- Example: Live Drum Understanding
  - Drum detection/transcription
QUICK LESSON: THE SPECTROGRAM
- The spectrogram is a very common feature used in audio analysis.
- It is a time-frequency representation of audio.
- Take the FFT of adjacent frames of audio samples and put them in a matrix.
- Each column shows the frequency content at a particular time.

(Figure: a spectrogram; its axes are time and frequency.)
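The recipe above can be sketched in a few lines of NumPy (the frame length, hop size, and sample rate are arbitrary choices for illustration):

```python
import numpy as np

def spectrogram(x, frame_len=1024, hop=512):
    """Magnitude spectrogram: FFT of adjacent (overlapping) frames,
    stacked as the columns of a matrix."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # each COLUMN of S is the frequency content at one point in time
    return np.abs(np.fft.rfft(frames, axis=1)).T

sr = 16000                                    # sample rate
t = np.arange(sr) / sr                        # one second of audio
S = spectrogram(np.sin(2 * np.pi * 440 * t))  # a 440 Hz test tone
peak_bin = S.mean(axis=1).argmax()
print(peak_bin * sr / 1024)                   # ≈ 440 (the tone's frequency)
```

Each row is one frequency bin; the peak bin for the 440 Hz test tone lands at the bin closest to 440 Hz, as expected.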
MUSIC AUTO-TAGGING: TYPICAL APPROACHES
- Typical approach:
  - Extract a bunch of hand-designed features describing small windows of the signal (e.g., spectral centroid, kurtosis, harmonicity, percussiveness, MFCCs, hundreds more).
  - Train a GMM or SVM to predict genre/mood/tags.
- Pros:
  - Works fairly well; was state of the art for a while.
  - Well-understood models; implementations widely available.
- Cons:
  - The bag-of-frames approach lacks the ability to describe rhythm and temporal dynamics.
  - Getting further improvements requires hand-designing more features.
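As a concrete instance of one of the hand-designed frame features mentioned above, here is a minimal NumPy sketch of the spectral centroid, the magnitude-weighted mean frequency of a frame (a standard "brightness" feature; the test tones are made-up examples):

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Hand-designed frame feature: magnitude-weighted mean frequency."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return (freqs * mag).sum() / (mag.sum() + 1e-12)

sr = 16000
t = np.arange(1024) / sr
low = np.sin(2 * np.pi * 200 * t)      # dark-sounding tone
high = np.sin(2 * np.pi * 4000 * t)    # bright-sounding tone
print(spectral_centroid(low, sr) < spectral_centroid(high, sr))  # True
```

A bag-of-frames classifier would compute many such per-frame values and feed their statistics to a GMM or SVM, which is exactly where the rhythm and temporal dynamics get lost.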
LEARNING FEATURES: NEURAL NETWORKS
- Each layer computes a non-linear transformation of the previous layer (e.g., a sigmoid non-linearity, with output between 0.0 and 1.0).
- Each hidden layer can be thought of as a set of features.
- Train to minimize output error using backpropagation. Iterative steps:
  - Feedforward: compute activations.
  - Compute the output error.
  - Backpropagate the error signal.
  - Compute gradients.
  - Update all weights.
- Resurgence of neural networks:
  - More compute (GPUs, Google, etc.)
  - More data
  - A few new tricks…
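The training loop described above can be sketched as a tiny one-hidden-layer network in NumPy. XOR is used as the toy task because it cannot be represented without a hidden layer; the layer sizes, learning rate, and step count are arbitrary choices for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(10000):
    # feedforward: compute activations
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # output error (sigmoid output + cross-entropy gives this simple form)
    d_out = out - y
    # backpropagate the error signal through the hidden layer
    d_h = (d_out @ W2.T) * h * (1 - h)
    # compute gradients and update all weights
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print((out > 0.5).astype(int).ravel())  # should learn XOR
```

The hidden activations `h` are exactly the learned "set of features" the slide refers to.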
DEEP NEURAL NETWORKS
- Millions to billions of parameters; many layers of “features”.
- Achieving state-of-the-art performance in vision and speech tasks.
- Problem: the vanishing error signal. The weights of the lower layers do not change much.
- Solutions:
  - Train for a really long time.
  - Pre-train each hidden layer as an autoencoder [Hinton, 2006].
  - Use a rectifier non-linearity (Rectified Linear Units) in place of the sigmoid [Krizhevsky, 2012].
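The vanishing error signal and the ReLU fix can be illustrated numerically: the sigmoid's derivative never exceeds 0.25, so a backpropagated signal shrinks by at least 4x per layer, while a ReLU's derivative is 1 for active units and passes the signal through unchanged:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = lambda z: sigmoid(z) * (1.0 - sigmoid(z))  # peaks at 0.25
relu_grad = lambda z: float(z > 0)                        # 0 or 1

layers = 20
# error signal surviving 20 layers, in the best case for each unit type
print(sigmoid_grad(0.0) ** layers)   # 0.25**20 ≈ 9.1e-13: vanishes
print(relu_grad(1.0) ** layers)      # 1.0: passes through unchanged
```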
AUTOENCODERS AND UNSUPERVISED FEATURE LEARNING
- Many ways to learn features in an unsupervised way:
  - Autoencoders: train a network to reconstruct the input; the hidden layer serves as the features.
    - Denoising autoencoders [Vincent, 2008]
    - Sparse autoencoders
  - Restricted Boltzmann Machines (RBMs) [Hinton, 2006]
  - Clustering: k-means, mixture models, etc.
  - Sparse coding: learn an overcomplete dictionary of features with a sparsity constraint.
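A minimal sketch of the autoencoder idea in NumPy, with linear units and made-up toy data for clarity (real systems add non-linearities, denoising, or sparsity as listed above):

```python
import numpy as np

# The toy data live on a 2-D subspace of R^8, so a 2-unit hidden code
# is enough to reconstruct them.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 8))
X = rng.normal(size=(200, 2)) @ basis      # 200 points, 8 dims, rank 2

W_enc = rng.normal(size=(8, 2)) * 0.1      # data -> hidden code (features)
W_dec = rng.normal(size=(2, 8)) * 0.1      # hidden code -> reconstruction
lr = 0.02
for step in range(5000):
    H = X @ W_enc                          # hidden code
    X_hat = H @ W_dec                      # reconstruction
    err = X_hat - X
    # gradient descent on mean squared reconstruction error
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X.T @ err @ W_dec.T / len(X)

mse = ((X_hat - X) ** 2).mean()
print(mse)  # much smaller than the variance of the data
```

After training, `H` is a compact feature representation learned with no labels at all, which is the whole point of the pre-training recipe on the previous slide.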
MUSIC AUTO-TAGGING: NEWER APPROACHES
- Newer approaches to feature extraction:
  - Learn spectral features using Restricted Boltzmann Machines (RBMs) and Deep Neural Networks (DNNs) [Hamel, 2010]: good genre performance.
  - Learn sparse features using Predictive Sparse Decomposition (PSD) [Henaff, 2011]: good genre performance.
  - Learn beat-synchronous rhythm and timbre features with RBMs and DNNs [Schmidt, 2013]: improved mood performance.
- Pros:
  - Hand-designing individual features is not required.
  - Computers can learn complex high-order features that humans cannot hand-code.
- Missing pieces:
  - More work on incorporating time (context, rhythm, and long-term structure) into feature learning.

(Figure: some learned frequency features, from [Henaff, 2011].)
RECURRENT NEURAL NETWORKS
- Non-linear sequence model.
- Hidden units have connections to the previous time step.
- Unlike HMMs, RNNs can model long-term dependencies using distributed hidden state.
- Recent developments (plus more compute) have made them much more feasible to train.

(Figure: a standard recurrent neural network, with input units, distributed hidden state, and output units; from [Sutskever, 2013].)
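The recurrence itself is simple. A minimal forward-pass sketch in NumPy (the sizes and the tanh non-linearity are arbitrary choices here): the hidden state at time t depends on both the input at time t and the hidden state at time t-1, which is how the network carries context through the sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 16, 2
W_xh = rng.normal(size=(n_in, n_hid)) * 0.1
W_hh = rng.normal(size=(n_hid, n_hid)) * 0.1   # recurrent connections
W_hy = rng.normal(size=(n_hid, n_out)) * 0.1

def rnn_forward(xs):
    h = np.zeros(n_hid)                # distributed hidden state
    ys = []
    for x in xs:                       # one step per input frame
        h = np.tanh(x @ W_xh + h @ W_hh)
        ys.append(h @ W_hy)
    return np.array(ys)

xs = rng.normal(size=(10, n_in))       # a 10-frame input sequence
print(rnn_forward(xs).shape)           # (10, 2): one output per frame
```

Training unrolls this loop through time and backpropagates through it, which is what the recent developments mentioned above made practical.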
APPLYING RNNS TO ONSET DETECTION
- Onset detection: detect the beginning of notes.
- Important for music transcription, beat tracking, etc.
- Onset detection can be hard for certain instruments with ambiguous attacks or in the presence of background interference.

(Figure: an audio signal and its onset envelope, from [Bello, 2005].)
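A classic pre-RNN baseline for the onset envelope is spectral flux: sum the positive changes in spectral magnitude between adjacent frames, and onsets show up as peaks. A sketch on a made-up test signal (this is the baseline method, not the RNN system described here):

```python
import numpy as np

def onset_envelope(x, frame_len=1024, hop=256):
    """Spectral flux: positive spectral change per frame (peaks ~ onsets)."""
    win = np.hanning(frame_len)
    n = 1 + (len(x) - frame_len) // hop
    S = np.abs(np.array([np.fft.rfft(x[i*hop:i*hop+frame_len] * win)
                         for i in range(n)]))
    return np.maximum(S[1:] - S[:-1], 0).sum(axis=1)

# two decaying tone bursts; the envelope should peak near each attack
sr = 16000
x = np.zeros(sr)
for start in (2000, 9000):
    t = np.arange(4000) / sr
    x[start:start + 4000] += np.sin(2 * np.pi * 440 * t) * np.exp(-8 * t)
env = onset_envelope(x)
print(env.argmax() * 256)  # sample index near one of the two attacks
```

Ambiguous attacks and background interference break exactly this kind of hand-built envelope, which is the motivation for learning the detection function with an RNN instead.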
ONSET DETECTION: STATE OF THE ART
- Uses recurrent neural networks (RNNs) [Eyben, 2010], [Böck, 2012]:
  - Input: spectrogram + Δ’s at 3 time-scales.
  - A 3-layer RNN produces the onset detection function.
  - The RNN output is trained to predict onset locations.
- 80-90% accurate (state of the art), compared to 60-80% for earlier methods.
- Can improve with more labeled training data, or possibly more unsupervised training.
OTHER EXAMPLES OF RNNS IN MUSIC
- Classical music generation and transcription [Boulanger-Lewandowski, 2012]
- Polyphonic piano note transcription [Böck, 2012]
- NMF-based source separation with predictive constraints from an RNN [ICASSP 2014 “Deep Learning in Music”]
- RNNs are a promising way to model the long-term contextual and temporal dependencies present in music.
LIVE DRUM TRANSCRIPTION
- Real-time/live operation.
- Useful with any percussion setup:
  - Before a performance, we can quickly train the system for a particular percussion setup,
  - or train a more general model for common drum types.
- Amplitude (dynamics) information: very important for musical understanding.
DRUM DETECTION SYSTEM

(Diagram: the system has a training path and a performance path.)
- Training (drum-wise audio): onset detection → spectrogram slice extraction → gamma mixture model → cluster parameters (drum templates).
- Performance (raw audio): onset detection → spectrogram slice extraction → non-negative vector decomposition against the drum templates → drum activations (source separation).
SPECTROGRAM SLICES
- Extracted at onsets.
- Each slice contains 100 ms of audio.

(Figure: a “head” slice around a detected onset, spanning 33 ms + 67 ms across 80 frequency bands.)
MODELING DRUM SOUNDS WITH GAMMA MIXTURE MODELS
- Instead of taking an “average” of all training slices for a single drum, model each drum using the means from a gamma mixture model.
  - Captures the range of sounds produced by each drum.
- Cheaper to train than a GMM (no covariance matrix).
- More stable than a GMM (no covariance matrix).
- Cluster according to human auditory perception, not Euclidean distance.
- Keep the audio spectrogram in the linear domain (not the log domain).
  - Important for linear source separation.
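To make the clustering step concrete, here is a hedged 1-D sketch of EM for a two-component gamma mixture with a fixed shape parameter k. The real system clusters whole spectrogram slices; the point of the sketch is that each component needs only a scale parameter, with no covariance matrix to estimate. All data and initial values are made up:

```python
import numpy as np
from math import lgamma

k = 2.0  # fixed shape parameter (an assumption of this sketch)

def log_gamma_pdf(x, theta):
    # log density of Gamma(shape=k, scale=theta)
    return (k - 1) * np.log(x) - x / theta - lgamma(k) - k * np.log(theta)

rng = np.random.default_rng(0)
x = np.concatenate([rng.gamma(k, 1.0, 300),   # component with mean k*1 = 2
                    rng.gamma(k, 5.0, 300)])  # component with mean k*5 = 10

theta = np.array([0.5, 8.0]); pi = np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibilities of each component for each point
    logp = np.log(pi) + np.stack([log_gamma_pdf(x, t) for t in theta], axis=1)
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp); r /= r.sum(axis=1, keepdims=True)
    # M-step: only a scale per component -- no covariance matrix to fit
    pi = r.mean(axis=0)
    theta = (r * x[:, None]).sum(axis=0) / (k * r.sum(axis=0))

print(np.sort(k * theta))  # recovered component means, roughly [2, 10]
```

The recovered means `k * theta` play the role of the drum templates: one representative spectrum per cluster of hits.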
DECOMPOSING ONSETS ONTO TEMPLATES
- Non-negative Vector Decomposition (NVD): a simplification of Non-negative Matrix Factorization (NMF).
- The W matrix contains the drum templates in its columns.
- Solve for the non-negative activation vector h such that x ≈ W·h.
- Adding a sparsity penalty (L1) on h improves NVD.

(Figure: an input slice x decomposed as W·h; the template columns correspond to bass, snare, closed hi-hat, open hi-hat, and ride, and h holds the output activations.)
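A sketch of NVD with NMF-style multiplicative updates: the template matrix W stays fixed and we solve x ≈ W·h with h ≥ 0 plus an L1 penalty on h. The toy templates below occupy disjoint frequency bands so the example is easy to verify; real drum spectra overlap heavily, and W, x, and the penalty weight are made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_drums = 80, 5
W = np.zeros((n_bands, n_drums))
for j in range(n_drums):                       # one band per toy "drum"
    W[16*j:16*(j+1), j] = np.abs(rng.normal(size=16)) + 0.1

h_true = np.array([1.0, 0.0, 0.5, 0.0, 0.0])   # only two drums active
x = W @ h_true                                 # observed onset slice

h = np.ones(n_drums)
lam = 0.01                                     # sparsity (L1) weight
for _ in range(200):
    # multiplicative update: keeps h non-negative by construction
    h *= (W.T @ x) / (W.T @ (W @ h) + lam + 1e-12)

print(np.round(h, 2))  # ≈ the true sparse activations [1, 0, 0.5, 0, 0]
```

The recovered h gives both which drums were hit and how hard, which is the amplitude information the system needs.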
EVALUATION
- Test data:
  - 8 different drums/cymbals.
  - Recorded to stereo using multiple microphones.
  - 50-100 training hits per drum.
- Vary the maximum number of templates per drum.
DETECTION RESULTS
- Varying the maximum number of templates per drum: more templates = better accuracy.
- Accuracy is measured two ways: detection accuracy (F-score) and amplitude accuracy (cosine similarity).

(Figure: detection accuracy and amplitude accuracy vs. the number of templates per drum.)
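For reference, the detection F-score counts a detected onset as correct if it falls within a small tolerance window of a ground-truth onset, with each true onset matched at most once. A sketch with made-up onset times (the tolerance value is an arbitrary choice, not the one used in the evaluation):

```python
def f_score(detected, truth, tol=0.05):
    """F-score for onset detection with a matching tolerance (seconds)."""
    truth = list(truth)
    tp = 0
    for d in sorted(detected):
        match = next((t for t in truth if abs(t - d) <= tol), None)
        if match is not None:
            tp += 1
            truth.remove(match)          # each true onset matches once
    fp = len(detected) - tp
    fn = len(truth)
    precision = tp / (tp + fp) if detected else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall + 1e-12)

truth = [0.50, 1.00, 1.52, 2.00]
detected = [0.51, 1.03, 1.90]            # one miss, one false alarm
print(round(f_score(detected, truth), 3))  # 0.571 (precision 2/3, recall 1/2)
```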
DRUM DETECTION: MAIN POINTS
- Gamma mixture model for learning spectral drum templates:
  - Cheaper to train than a GMM.
  - More stable than a GMM.
  - Keep the audio spectrogram in the linear domain (not the log domain).
- Non-negative Vector Decomposition (NVD) for computing template activations from drum onsets.
- Learning multiple templates per drum improves source separation and detection.
AUDIO EXAMPLES

Each example compares the original performance with resyntheses using multiple templates per drum and a single template per drum.
- 100 BPM, rock, syncopated snare drum, fills/flams. Note the messy fills in the single-template version.
- 181 BPM, fast rock, open hi-hat. Note the extra hi-hat notes in the single-template version.
- 94 BPM, snare drum march, accented notes. Note the extra bass drum notes and many extra cymbals in the single-template version.
SUMMARY
- Content-based Music Information Retrieval:
  - Mood, genre, onsets, beats, transcription, recommendation, and much, much more!
  - Exciting directions include deep/feature learning and RNNs.
- Drum detection:
  - Gamma mixture model template learning.
  - Learning multiple templates = better source separation performance.
GETTING INVOLVED IN MUSIC INFORMATION RETRIEVAL
- Check out the proceedings of ISMIR (free online): http://www.ismir.net/
- Participate in MIREX (annual MIR evaluation): http://www.music-ir.org/mirex/wiki/MIREX_HOME
- Join the Music-IR mailing list: http://listes.ircam.fr/wws/info/music-ir
- Join the Music Information Retrieval Google Plus community (just started): https://plus.google.com/communities/109771668656894350107
REFERENCES

Genre/Mood Classification:
- [1] P. Hamel and D. Eck, “Learning features from music audio with deep belief networks,” Proc. of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010), pp. 339–344, 2010.
- [2] M. Henaff, K. Jarrett, K. Kavukcuoglu, and Y. LeCun, “Unsupervised learning of sparse features for scalable audio classification,” Proc. of the International Society for Music Information Retrieval Conference (ISMIR’11), 2011.
- [3] J. Andén and S. Mallat, “Deep Scattering Spectrum,” arXiv.org, 2013.
- [4] E. Schmidt and Y. Kim, “Learning rhythm and melody features with deep belief networks,” ISMIR, 2013.
REFERENCES

Onset Detection:
- [1] J. Bello, L. Daudet, S. A. Abdallah, C. Duxbury, M. Davies, and M. Sandler, “A tutorial on onset detection in music signals,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, p. 1035, 2005.
- [2] A. Klapuri, A. Eronen, and J. Astola, “Analysis of the meter of acoustic musical signals,” IEEE Transactions on Speech and Audio Processing, vol. 14, no. 1, p. 342, 2006.
- [3] J. Bello, C. Duxbury, M. Davies, and M. Sandler, “On the use of phase and energy for musical onset detection in the complex domain,” IEEE Signal Processing Letters, vol. 11, no. 6, pp. 553–556, 2004.
- [4] S. Böck, A. Arzt, F. Krebs, and M. Schedl, “Online realtime onset detection with recurrent neural networks,” presented at the International Conference on Digital Audio Effects (DAFx-12), 2012.
- [5] F. Eyben, S. Böck, B. Schuller, and A. Graves, “Universal onset detection with bidirectional long short-term memory neural networks,” Proc. of ISMIR, 2010.
REFERENCES

Neural Networks:
- [1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, 2012.

Unsupervised Feature Learning:
- [2] G. E. Hinton and S. Osindero, “A fast learning algorithm for deep belief nets,” Neural Computation, 2006.
- [3] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” pp. 1096–1103, 2008.

Recurrent Neural Networks:
- [4] I. Sutskever, “Training Recurrent Neural Networks,” 2013.
- [6] D. Eck and J. Schmidhuber, “Finding temporal structure in music: Blues improvisation with LSTM recurrent networks,” pp. 747–756, 2002.
- [7] S. Böck and M. Schedl, “Polyphonic piano note transcription with recurrent neural networks,” pp. 121–124, 2012.
- [8] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, “Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription,” arXiv preprint arXiv:1206.6392, 2012.
- [9] G. Taylor and G. E. Hinton, “Two Distributed-State Models For Generating High-Dimensional Time Series,” Journal of Machine Learning Research, 2011.
REFERENCES

Drum Understanding:
- [1] E. Battenberg, “Techniques for Machine Understanding of Live Drum Performances,” PhD thesis, University of California, Berkeley, 2012.
- [2] E. Battenberg and D. Wessel, “Analyzing drum patterns using conditional deep belief networks,” presented at the International Society for Music Information Retrieval Conference, Porto, Portugal, 2012.
- [3] E. Battenberg, V. Huang, and D. Wessel, “Toward live drum separation using probabilistic spectral clustering based on the Itakura-Saito divergence,” presented at the Audio Engineering Society 45th International Conference: Applications of Time-Frequency Processing in Audio, 2012, vol. 3.
THANK YOU!

 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Ml conf2013 teaching_computers_share

  • 1. TEACHING COMPUTERS TO LISTEN TO MUSIC Eric Battenberg ebattenberg@gracenote.com http://ericbattenberg.com Gracenote, Inc. Previously: UC Berkeley, Dept. of EECS Parallel Computing Laboratory CNMAT (Center for New Music and Audio Technologies)
  • 2. Business Verticals: Rich, diverse multimedia metadata for cloud music services, mobile apps and devices; smart TVs, second-screen apps, and targeted ads; and connected infotainment systems on the road.
  • 3. SOME OF GRACENOTE’S PRODUCTS: Original product: CDDB (1998), which recognizes CDs and retrieves associated metadata. Scan & Match precisely identifies digital music quickly in large catalogs. ACR (Automatic Content Recognition): audio/video fingerprinting enabling second-screen experiences, and real-time audio classification systems for mobile devices. Contextually aware cross-modal personalization: taste profiling for music, TV, movies, and social networks, plus music mood descriptors that combine machine learning and editorial training data.
  • 5. (image-only slide)
  • 6. MACHINE LISTENING: Speech processing makes up the vast majority of funded machine listening research, but just as there is more to computer vision than OCR, there is more to machine listening than speech recognition. Audio content analysis includes audio classification (music, speech, noise, laughter, cheering), audio fingerprinting (e.g. Gracenote, Shazam), and audio event detection (new song, channel change, hotword). Content-based Music Information Retrieval (MIR) is today’s topic.
  • 7. GETTING COMPUTERS TO “LISTEN” TO MUSIC: We are not trying to get computers to “listen” for enjoyment; more accurately, we are analyzing music with computers. What kind of information do we want out of the analysis? What instruments are playing? What is the mood? How fast or slow is it (tempo)? What does the singer sound like? How can I play this song on my instrument? (Pictured: Ben Harper, James Brown)
  • 8. CONTENT-BASED MUSIC INFORMATION RETRIEVAL: Many tasks: genre and mood classification, auto-tagging; beat tracking, tempo detection; music similarity, playlisting, and recommendation (heavily aided by collaborative filtering); automatic music transcription; source separation (voice, drums, bass, ...); music segmentation (verse, chorus).
  • 9. TALK SUMMARY: Introduction to Music Information Retrieval (MIR); some common techniques (auto-tagging, onset detection); exciting new research directions (deep learning!); example: Live Drum Understanding (drum detection/transcription).
  • 10. TALK SUMMARY: Introduction to Music Information Retrieval (MIR); some common techniques (auto-tagging, onset detection); exciting new research directions (deep learning!); example: Live Drum Understanding (drum detection/transcription).
  • 11. QUICK LESSON: THE SPECTROGRAM: A very common feature used in audio analysis. It is a time-frequency representation of audio: take the FFT of adjacent frames of audio samples and put them in a matrix, so that each column shows the frequency content at a particular time. (Figure: spectrogram, frequency vs. time)
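A minimal spectrogram can be sketched in a few lines of NumPy; the frame length, hop size, and Hann window below are illustrative choices, not values from the slides:

```python
import numpy as np

def spectrogram(x, frame_len=512, hop=256):
    """Magnitude spectrogram: FFT of overlapping windowed frames, one column per frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)], axis=1)
    # each column is the frequency content of one ~32 ms frame (at 16 kHz)
    return np.abs(np.fft.rfft(frames, axis=0))

sr = 16000
t = np.arange(sr) / sr                        # 1 second of audio
S = spectrogram(np.sin(2 * np.pi * 440 * t))  # pure 440 Hz tone
peak_bin = S[:, 0].argmax()                   # strongest bin, near 440 / (sr / frame_len)
```

For a pure tone, every column has its energy concentrated near the same frequency bin; for real music, the columns trace out the evolving harmonic and percussive content.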
  • 12. MUSIC AUTO-TAGGING: TYPICAL APPROACHES: Extract a bunch of hand-designed features describing small windows of the signal (e.g., spectral centroid, kurtosis, harmonicity, percussiveness, MFCCs, and hundreds more), then train a GMM or SVM to predict genre/mood/tags. Pros: works fairly well and was state of the art for a while; the models are well understood, with implementations widely available. Cons: the bag-of-frames approach lacks the ability to describe rhythm and temporal dynamics, and getting further improvements requires hand-designing more features.
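The bag-of-frames idea can be illustrated with one such hand-designed feature, the spectral centroid, summarized over frames by its mean and standard deviation; in practice these summary statistics would then feed a GMM or SVM. Frame sizes here are arbitrary:

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Hand-designed feature: magnitude-weighted mean frequency of one frame."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return (freqs * mag).sum() / (mag.sum() + 1e-12)

def bag_of_frames(x, sr, frame_len=1024, hop=512):
    """Summarize per-frame features by mean/std, discarding temporal order."""
    cents = [spectral_centroid(x[i:i + frame_len], sr)
             for i in range(0, len(x) - frame_len, hop)]
    return np.array([np.mean(cents), np.std(cents)])

sr = 16000
t = np.arange(sr) / sr
dark = bag_of_frames(np.sin(2 * np.pi * 200 * t), sr)     # low, "dark" timbre
bright = bag_of_frames(np.sin(2 * np.pi * 3000 * t), sr)  # high, "bright" timbre
```

Note how the summary throws away frame order entirely: this is exactly why bag-of-frames features cannot describe rhythm or temporal dynamics.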
  • 13. LEARNING FEATURES: NEURAL NETWORKS: Each layer computes a non-linear transformation (e.g., a sigmoid) of the previous layer, and each hidden layer can be thought of as a set of features. Train to minimize output error using backpropagation, iterating these steps: compute activations (feedforward), compute the output error, backpropagate the error signal, compute gradients, and update all weights. Resurgence of neural networks: more compute (GPUs, Google, etc.), more data, and a few new tricks.
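The training loop above can be sketched end to end in NumPy, training a one-hidden-layer sigmoid network on XOR; the layer size, learning rate, and iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# XOR: a classic problem that needs a non-linear hidden layer
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)
lr = 0.5
losses = []
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)              # feedforward: hidden "features"
    out = sigmoid(h @ W2 + b2)            # feedforward: output
    losses.append(((out - y) ** 2).sum())
    d_out = (out - y) * out * (1 - out)   # backpropagate the error signal
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)   # update all weights
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)
```

The hidden activations `h` are the learned features: after training, they carve the input space into regions that make XOR linearly separable for the output layer.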
  • 14. DEEP NEURAL NETWORKS: Millions to billions of parameters and many layers of “features”; achieving state-of-the-art performance in vision and speech tasks. Problem: the vanishing error signal, so the weights of the lower layers do not change much. Solutions: train for a really long time; pre-train each hidden layer as an autoencoder [Hinton, 2006]; use rectified linear units instead of sigmoids [Krizhevsky, 2012].
  • 15. AUTOENCODERS AND UNSUPERVISED FEATURE LEARNING: There are many ways to learn features in an unsupervised way. Autoencoders train a network to reconstruct its input, with the hidden layer serving as the features; variants include denoising autoencoders [Vincent, 2008] and sparse autoencoders. Other approaches: the Restricted Boltzmann Machine (RBM) [Hinton, 2006]; clustering (k-means, mixture models, etc.); and sparse coding, which learns an overcomplete dictionary of features under a sparsity constraint.
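A minimal autoencoder sketch: data lying on a 2-D subspace of 10-D space is squeezed through a 2-unit hidden layer and trained to reconstruct itself. The dimensions, learning rate, and linear (rather than sigmoid) units are illustrative simplifications:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data on a 2-D linear subspace of 10-D space
codes = rng.normal(size=(200, 2))
X = codes @ rng.normal(size=(2, 10))

# Encoder W1 (10 -> 2) and decoder W2 (2 -> 10); the hidden layer is the feature
W1 = rng.normal(scale=0.5, size=(10, 2))
W2 = rng.normal(scale=0.5, size=(2, 10))
lr = 0.01
mses = []
for _ in range(2000):
    H = X @ W1            # the 2-D code ("features")
    X_hat = H @ W2        # reconstruction of the input
    err = X_hat - X
    mses.append((err ** 2).mean())
    # gradient descent on mean squared reconstruction error
    W2 -= lr * H.T @ err / len(X)
    W1 -= lr * X.T @ (err @ W2.T) / len(X)
```

Because the bottleneck has only 2 units, the network is forced to discover the 2-D structure of the data; with a linear network this recovers the same subspace as PCA, and adding input noise during training would turn it into a denoising autoencoder.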
  • 16. MUSIC AUTO-TAGGING: NEWER APPROACHES: Newer approaches learn the feature extraction: learn spectral features using Restricted Boltzmann Machines (RBMs) and deep neural networks (DNNs) [Hamel, 2010] (good genre performance); learn sparse features using Predictive Sparse Decomposition (PSD) [Henaff, 2011] (good genre performance); learn beat-synchronous rhythm and timbre features with RBMs and DNNs [Schmidt, 2013] (improved mood performance). Pros: hand-designing individual features is not required, and computers can learn complex high-order features that humans cannot hand-code. Missing pieces: more work on incorporating time (context, rhythm, and long-term structure) into feature learning. (Figure: some learned frequency features, from [Henaff, 2011])
  • 17. RECURRENT NEURAL NETWORKS: A non-linear sequence model in which the hidden units have connections to the previous time step. Unlike HMMs, RNNs can model long-term dependencies using a distributed hidden state. Recent developments (plus more compute) have made them much more feasible to train. (Figure: standard recurrent neural network with input units, distributed hidden state, and output units, from [Sutskever, 2013])
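The recurrence itself is compact: the hidden state at each step is a function of the current input and the previous hidden state. A forward-pass sketch (layer sizes and weight scales below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 4, 16, 2
W_xh = rng.normal(scale=0.1, size=(n_in, n_hid))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))  # recurrent connection to previous step
W_hy = rng.normal(scale=0.1, size=(n_hid, n_out))
b_h = np.zeros(n_hid); b_y = np.zeros(n_out)

def rnn_forward(xs):
    """Run the RNN over a sequence, producing one output per time step."""
    h = np.zeros(n_hid)  # distributed hidden state (cf. an HMM's single discrete state)
    ys = []
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
        ys.append(h @ W_hy + b_y)
    return np.array(ys)

ys = rnn_forward(rng.normal(size=(10, n_in)))  # a 10-step input sequence
```

The `h @ W_hh` term is what distinguishes this from a plain feedforward network: each output depends on the entire input history through the evolving hidden state.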
  • 18. APPLYING RNNS TO ONSET DETECTION: Onset detection means detecting the beginnings of notes in the audio signal, which is important for music transcription, beat tracking, etc. Onset detection can be hard for instruments with ambiguous attacks or in the presence of background interference. (Figure: audio signal and onset envelope, from [Bello, 2005])
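For contrast with the learned RNN approach, here is a classic hand-designed onset detection function, spectral flux: sum the positive magnitude increases between consecutive spectrogram frames. Frame and hop sizes are illustrative:

```python
import numpy as np

def onset_envelope(x, frame_len=512, hop=128):
    """Spectral flux onset detection function: per-frame sum of positive
    magnitude increases across the spectrum. Peaks mark likely note onsets."""
    window = np.hanning(frame_len)
    n = 1 + (len(x) - frame_len) // hop
    S = np.abs(np.array([np.fft.rfft(x[i * hop : i * hop + frame_len] * window)
                         for i in range(n)]))
    return np.maximum(S[1:] - S[:-1], 0.0).sum(axis=1)

# Silence followed by a tone (a "note") starting halfway through
sr = 8000
x = np.zeros(sr)
x[sr // 2:] = np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
env = onset_envelope(x)
onset_frame = env.argmax()  # peaks near the frame where the note begins
```

Hand-designed functions like this struggle exactly where the slide says onset detection is hard (soft attacks, interference); the RNN approach on the next slide learns the mapping from spectrogram to onset envelope instead.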
  • 19. ONSET DETECTION: STATE-OF-THE-ART: Using recurrent neural networks [Eyben, 2010], [Böck, 2012]: a spectrogram (plus deltas) at three time-scales feeds a 3-layer RNN whose output is trained to predict onset locations. This is 80-90% accurate (state of the art), compared to 60-80% for earlier hand-designed methods, and can improve with more labeled training data, or possibly more unsupervised training.
  • 20. OTHER EXAMPLES OF RNNS IN MUSIC: Polyphonic piano note transcription [Böck, 2012]; classical music generation and transcription [Boulanger-Lewandowski, 2012]; NMF-based source separation with predictive constraints from an RNN [ICASSP 2014 “Deep Learning in Music”]. RNNs are a promising way to model the long-term contextual and temporal dependencies present in music.
  • 21. TALK SUMMARY: Introduction to Music Information Retrieval (MIR); some common techniques (auto-tagging, onset detection); exciting new research directions (deep learning!); example: Live Drum Understanding (drum detection/transcription).
  • 22. LIVE DRUM TRANSCRIPTION: Real-time/live operation, useful with any percussion setup: before a performance we can quickly train the system for a particular percussion setup, or train a more general model for common drum types. It also captures amplitude (dynamics) information, which is very important for musical understanding.
  • 23. DRUM DETECTION SYSTEM: Training: drum-wise audio → onset detection → spectrogram slice extraction → feature extraction → gamma mixture model → cluster parameters (drum templates). Performance: raw audio → onset detection → feature extraction → non-negative vector decomposition against the templates → drum activations (source separation).
  • 24. DRUM DETECTION SYSTEM (same system diagram, highlighting the onset detection stage: training drum-wise audio and performance raw audio both pass through onset detection and feature extraction).
  • 25. SPECTROGRAM SLICES: Extracted at onsets; each slice contains 100 ms of audio. (Figure: spectrogram slice with 33 ms and 67 ms segments over 80 frequency bands around the detected onset)
  • 26. DRUM DETECTION SYSTEM (same system diagram, highlighting the gamma mixture model stage: training features are clustered into cluster parameters, i.e. drum templates).
  • 27. MODELING DRUM SOUNDS WITH GAMMA MIXTURE MODELS: Instead of taking an “average” of all training slices for a single drum, model each drum using the means from a gamma mixture model, which captures the range of sounds produced by each drum. It is cheaper to train and more stable than a GMM (no covariance matrix). Cluster according to human auditory perception rather than Euclidean distance, and keep the audio spectrogram in the linear domain (not the log domain), which is important for linear source separation.
  • 28. DRUM DETECTION SYSTEM (same system diagram, highlighting the performance stage: non-negative vector decomposition of performance features against the drum templates yields drum activations).
  • 29. DECOMPOSING ONSETS ONTO TEMPLATES: Non-negative vector decomposition (NVD) is a simplification of non-negative matrix factorization (NMF). The matrix W contains the drum templates in its columns (bass, snare, hi-hat closed, hi-hat open, ride), and for each onset we solve x ≈ W·h for the non-negative activation vector h. Adding a sparsity (L1) penalty on h improves NVD.
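A sketch of NVD using multiplicative updates (the Euclidean-cost NMF update with W held fixed); the random templates below are stand-ins for real drum templates:

```python
import numpy as np

def nvd(x, W, n_iter=500, sparsity=0.0):
    """Non-negative vector decomposition: find h >= 0 with x ≈ W @ h.

    Multiplicative update for the Euclidean cost, i.e. NMF with the
    template matrix W held fixed; `sparsity` is an L1 penalty on h."""
    h = np.ones(W.shape[1])
    Wt_x = W.T @ x
    for _ in range(n_iter):
        # multiplicative update: h stays non-negative by construction
        h *= Wt_x / (W.T @ (W @ h) + sparsity + 1e-12)
    return h

rng = np.random.default_rng(0)
W = rng.random((40, 3))              # 3 toy spectral templates in the columns
h_true = np.array([1.0, 0.0, 2.0])   # e.g. bass + ride hit, no snare
x = W @ h_true                       # observed onset spectrum
h = nvd(x, W)
```

The multiplicative form is why the non-negativity constraint is free: each entry of `h` is scaled by a non-negative ratio, so it can shrink toward zero (an absent drum) but never go negative.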
  • 30. EVALUATION: Test data: 8 different drums/cymbals recorded to stereo using multiple microphones, with 50-100 training hits per drum. We vary the maximum number of templates per drum.
  • 31. DETECTION RESULTS: Varying the maximum number of templates per drum: more templates = better accuracy. (Figures: detection accuracy (F-score) and amplitude accuracy (cosine similarity) vs. templates per drum)
  • 32. DRUM DETECTION: MAIN POINTS: The gamma mixture model, used for learning spectral drum templates, is cheaper to train and more stable than a GMM and keeps the audio spectrogram in the linear domain (not the log domain). Non-negative vector decomposition (NVD) computes template activations from drum onsets, and learning multiple templates per drum improves both source separation and detection.
  • 33. AUDIO EXAMPLES: 100 BPM, rock, syncopated snare drum, fills/flams. Note the messy fills in the single-template version. (Audio: original performance; multiple templates per drum; single template per drum)
  • 34. AUDIO EXAMPLES: 181 BPM, fast rock, open hi-hat. Note the extra hi-hat notes in the single-template version. (Audio: original performance; multiple templates per drum; single template per drum)
  • 35. AUDIO EXAMPLES: 94 BPM, snare drum march, accented notes. Note the extra bass drum notes and many extra cymbals in the single-template version. (Audio: original performance; multiple templates per drum; single template per drum)
  • 36. SUMMARY: Content-based music information retrieval covers mood, genre, onsets, beats, transcription, recommendation, and much, much more; exciting directions include deep/feature learning and RNNs. Drum detection: gamma mixture model template learning; learning multiple templates = better source separation performance.
  • 37. TEACHING COMPUTERS TO LISTEN TO MUSIC Eric Battenberg ebattenberg@gracenote.com http://ericbattenberg.com Gracenote, Inc. Previously: UC Berkeley, Dept. of EECS Parallel Computing Laboratory CNMAT (Center for New Music and Audio Technologies)
  • 38. GETTING INVOLVED IN MUSIC INFORMATION RETRIEVAL: Check out the proceedings of ISMIR (free online): http://www.ismir.net/. Participate in MIREX (annual MIR evaluation): http://www.music-ir.org/mirex/wiki/MIREX_HOME. Join the Music-IR mailing list: http://listes.ircam.fr/wws/info/music-ir. Join the Music Information Retrieval Google Plus community (just started): https://plus.google.com/communities/109771668656894350107
  • 39. REFERENCES
    Genre/Mood Classification:
    [1] P. Hamel and D. Eck, “Learning features from music audio with deep belief networks,” Proc. of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010), pp. 339–344, 2010.
    [2] M. Henaff, K. Jarrett, K. Kavukcuoglu, and Y. LeCun, “Unsupervised learning of sparse features for scalable audio classification,” Proc. of the International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.
    [3] J. Andén and S. Mallat, “Deep Scattering Spectrum,” arXiv preprint, 2013.
    [4] E. Schmidt and Y. Kim, “Learning rhythm and melody features with deep belief networks,” Proc. of ISMIR, 2013.
  • 40. REFERENCES
    Onset Detection:
    [1] J. Bello, L. Daudet, S. A. Abdallah, C. Duxbury, M. Davies, and M. Sandler, “A tutorial on onset detection in music signals,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, p. 1035, 2005.
    [2] A. Klapuri, A. Eronen, and J. Astola, “Analysis of the meter of acoustic musical signals,” IEEE Transactions on Speech and Audio Processing, vol. 14, no. 1, p. 342, 2006.
    [3] J. Bello, C. Duxbury, M. Davies, and M. Sandler, “On the use of phase and energy for musical onset detection in the complex domain,” IEEE Signal Processing Letters, vol. 11, no. 6, pp. 553–556, 2004.
    [4] S. Böck, A. Arzt, F. Krebs, and M. Schedl, “Online real-time onset detection with recurrent neural networks,” presented at the International Conference on Digital Audio Effects (DAFx-12), 2012.
    [5] F. Eyben, S. Böck, B. Schuller, and A. Graves, “Universal onset detection with bidirectional long short-term memory neural networks,” Proc. of ISMIR, 2010.
  • 41. REFERENCES
    Neural Networks:
    [1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, 2012.
    Unsupervised Feature Learning:
    [2] G. E. Hinton and S. Osindero, “A fast learning algorithm for deep belief nets,” Neural Computation, 2006.
    [3] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” pp. 1096–1103, 2008.
    Recurrent Neural Networks:
    [4] I. Sutskever, “Training Recurrent Neural Networks,” 2013.
    [6] D. Eck and J. Schmidhuber, “Finding temporal structure in music: Blues improvisation with LSTM recurrent networks,” pp. 747–756, 2002.
    [7] S. Böck and M. Schedl, “Polyphonic piano note transcription with recurrent neural networks,” pp. 121–124, 2012.
    [8] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, “Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription,” arXiv preprint arXiv:1206.6392, 2012.
    [9] G. Taylor and G. E. Hinton, “Two Distributed-State Models For Generating High-Dimensional Time Series,” Journal of Machine Learning Research, 2011.
  • 42. REFERENCES
    Drum Understanding:
    [1] E. Battenberg, “Techniques for Machine Understanding of Live Drum Performances,” PhD thesis, University of California, Berkeley, 2012.
    [2] E. Battenberg and D. Wessel, “Analyzing drum patterns using conditional deep belief networks,” presented at the International Society for Music Information Retrieval Conference, Porto, Portugal, 2012.
    [3] E. Battenberg, V. Huang, and D. Wessel, “Toward live drum separation using probabilistic spectral clustering based on the Itakura-Saito divergence,” presented at the Audio Engineering Society 45th International Conference: Applications of Time-Frequency Processing in Audio, 2012, vol. 3.

Editor's Notes

  1. And ultimately, transform how people experience their favorite movies, TV shows and music
  2. GMM = Gaussian Mixture Model; SVM = Support Vector Machine
  3. Snare, bass, hi-hat is not all there is to drumming
  4. Feature extraction -> machine learning
  5. Feature extraction -> machine learning
  6. 80 Bark bands down from 513 positive FFT coefficients. Work in 2008 shows that this type of spectral dimensionality reduction speeds up NMF while retaining or improving separation performance.
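The reduction described in this note can be sketched as a pooling matrix applied to each spectrogram frame. This is a rough, hypothetical illustration: it assumes bands equally spaced on the Bark scale (Traunmüller's approximation) with simple rectangular averaging; the exact filterbank behind the slide may differ.

```python
import numpy as np

def hz_to_bark(f):
    # Traunmüller's approximation of the Bark critical-band rate.
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_band_matrix(n_fft_bins=513, sr=44100, n_bands=80):
    """Pooling matrix B (n_bands x n_fft_bins): each FFT bin is assigned to
    one of n_bands bands equally spaced on the Bark scale, and each non-empty
    band averages its bins. Reduce a spectrogram S (freq x time) with B @ S."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft_bins)
    z = hz_to_bark(freqs)
    edges = np.linspace(z[0], z[-1], n_bands + 1)
    band = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, n_bands - 1)
    B = np.zeros((n_bands, n_fft_bins))
    B[band, np.arange(n_fft_bins)] = 1.0
    counts = B.sum(axis=1, keepdims=True)
    # Narrow low-frequency bands may catch no FFT bin; leave those rows zero.
    return np.divide(B, counts, out=np.zeros_like(B), where=counts > 0)
```

Since NMF update cost scales with the number of spectrogram rows, working on 80 bands instead of 513 bins makes each iteration several times cheaper, which is consistent with the speedup this note describes.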
  7. Feature extraction -> machine learning
  8. Feature extraction -> machine learning
  9. \vec{h}_i are the template activations
  10. Superior Drummer 2.0 is highly multi-sampled, so it is a good approximation of a real-world signal (though it's probably a little too professional-sounding for most people)
  11. Actual optimal number of templates per drum (bass, snare, closed hi-hat, open hi-hat, ride): head templates {3, 4, 3, 2, 2}; tail templates {2, 4, 3, 2, 2}