SlideShare a Scribd company logo
1 of 47
Generating music in the raw audio domain
Sander Dieleman - September 10th, 2018
PhD student, Ghent University, Belgium (2010 - 2016)
“Learning feature hierarchies for musical audio signals”
Machine learning intern, Spotify, NYC (summer 2014)
Scaling up content-based music recommendation
Research Scientist, DeepMind, London UK (since July 2015)
AlphaGo, WaveNet, ...
http://benanne.github.io
sanderdieleman@gmail.com
@sedielem
Overview
Why model music in the raw audio domain?
WaveNet: an autoregressive model of raw audio
Generating music with WaveNets
Modelling long-range correlations with WaveNets
Generating music with long-range consistency
3
Why model music in the raw audio domain?
Why raw audio?
Music generation is typically studied in the symbolic domain
5
6
7
Why raw audio?
8
Many instruments have complex action spaces
⇒ Rich palette of sounds and timbral variations
Guitar
● pick vs. finger
● picking position
● frets
● harmonics
● ...
WaveNet:
an autoregressive model of raw audio
“WaveNet: A Generative Model for Raw Audio”, van den Oord et al. (2016)
10
Modelling raw audio: μ-law encoding / quantisation
11
uniform
quantisation
μ-law
quantisation
WaveNet models audio as a discrete time series
WaveNet: an autoregressive model of raw audio
12
input
hidden
layer
hidden
layer
hidden
layer
output
“WaveNet: A Generative Model for Raw Audio”, van den Oord et al. (2016)
13
WaveNet: an autoregressive model of raw audio
input
hidden
layer
hidden
layer
hidden
layer
output
14
Dilated convolutions to enlarge receptive fields
input
hidden
layer
hidden
layer
hidden
layer
output
dilation 20 = 1
dilation 21 = 2
dilation 22 = 4
dilation 23 = 8
15
Dilated convolutions to enlarge receptive fields
input
hidden
layer
hidden
layer
hidden
layer
output
dilation 20 = 1
dilation 21 = 2
dilation 22 = 4
dilation 23 = 8
Generating music with WaveNets
17
18
19
20
Aside: NSynth
NSynth: WaveNet autoencoders for timbre modelling
22
Collaboration with
“Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders”, Engel et al. (ICML 2017)
NSynth: WaveNet autoencoders for timbre modelling
23
Blog post: https://magenta.tensorflow.org/nsynth
Modelling long-range
correlations with WaveNets
“The challenge of realistic music generation: modelling raw audio at scale”, Dieleman et al. (2018)
https://arxiv.org/
abs/1806.10474
26
Dilation: #layers ~ log(receptive field length)
input
hidden
layer
hidden
layer
hidden
layer
output
dilation 20 = 1
dilation 21 = 2
dilation 22 = 4
dilation 23 = 8
27
… but memory usage ~ receptive field length
Required model depth is logarithmic in the desired receptive field length
Required memory usage during training is still linear in the desired
receptive field length!
⇒ We cannot scale indefinitely using dilation
Autoregressive discrete autoencoders (ADAs)
28x input
q = f(x) query
q’ = VQ(q)
quantised query
encoder
local model
quantisation
modulator
y code
… 5 207 131 18 168 …
decoder
Autoregressive discrete autoencoders (ADAs)
29
“Neural Discrete Representation Learning”, van den Oord et al. (2017)
“The challenge of realistic music generation: modelling raw audio at scale”, Dieleman et al. (2018)
Two variants:
● VQ-VAE: vector quantisation variational autoencoder
learn a codebook, quantise the queries to elements of this codebook
● AMAE: argmax autoencoder
encourage the queries to be close to one-hot (e.g. use softmax), quantise
them on the simplex
VQ-VAE
30
encoder
decoder
vector quantisation
x input
x’ = g(q’) reconstruction
q = f(x) query
q’ = VQ(q) quantised query
y integer sequence
Vector quantisation
31
D-dimensional Euclidean space
Vector quantisation
32
D-dimensional Euclidean space
K codes
0
1
2
4
5
6
7
3
Vector quantisation
33
D-dimensional Euclidean space
encoder
q
Vector quantisation
34
D-dimensional Euclidean space
encoder
q
q’
Vector quantisation
35
encoder
q
q’
decoder
6
Straight-through estimation for VQ
36
encoder
q
q’
decoder
backward pass
Training VQ autoencoders
37
ℒ = -log p(x|q’) Reconstruction loss (NLL)
Training VQ autoencoders
38
ℒ = -log p(x|q’)
+ α · (q’ - [q])2
Reconstruction loss (NLL)
Codebook loss
[...] = tf.stop_gradient(...)
Training VQ autoencoders
39
ℒ = -log p(x|q’)
+ α · (q’ - [q])2
+ β · ([q’] - q)2
Reconstruction loss (NLL)
Codebook loss
Commitment loss
[...] = tf.stop_gradient(...)
Stacking ADAs
46
… 17 95 88 145 ...
… 210 37 121 8 ...
level 1
ADA
level 2
ADA
level 3
unconditional model
Stacking ADAs
47
… 17 95 88 145 ...
… 210 37 121 8 ...
level 1
ADA
level 2
ADA
level 3
unconditional model
16 kHz
2 kHz
250 Hz
Code sequences are harder to model than audio
48
Training second level ADAs is much harder
● Train VQ-VAE with population-based training (PBT)
● Train AMAE, which is less unstable
Predictability(NLL)
Receptive field length
Audio Code sequence
“Population Based Training
of Neural Networks”,
Jaderberg et al. (2017)
Generating music
with long-range consistency
Some results from a human evaluation
50
Samples
https://bit.ly/2IPXoDu
51
52
Keynote speakers
Kenneth Stanley, University of Central Florida
David Ha, Google Brain
Allison Parrish, NYU ITP
Yaroslav Ganin, DeepMind
Yaniv Taigman, Facebook AI Research
Submission deadline (papers & art): October 28th
https://nips2018creativity.github.io/
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
WaveNet blog post:
53
@sedielem

More Related Content

Similar to Sander Dieleman - Generating music in the raw audio domain - Creative AI meetup

PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Miniproject audioenhancement-100223094301-phpapp02
Miniproject audioenhancement-100223094301-phpapp02Miniproject audioenhancement-100223094301-phpapp02
Miniproject audioenhancement-100223094301-phpapp02mohankota
 
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by OraclesEfficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by OraclesVissarion Fisikopoulos
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 
DSP_Lab_MAnual_-_Final_Edition.pdf
DSP_Lab_MAnual_-_Final_Edition.pdfDSP_Lab_MAnual_-_Final_Edition.pdf
DSP_Lab_MAnual_-_Final_Edition.pdfParthDoshi66
 
DSP_Lab_MAnual_-_Final_Edition[1].docx
DSP_Lab_MAnual_-_Final_Edition[1].docxDSP_Lab_MAnual_-_Final_Edition[1].docx
DSP_Lab_MAnual_-_Final_Edition[1].docxParthDoshi66
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
 
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐCHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐlykhnh386525
 
Introduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionIntroduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionNAVER Engineering
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Oswald Campesato
 
Lecture 2- Practical AD and DA Conveters (Online Learning).pptx
Lecture 2- Practical AD and DA Conveters (Online Learning).pptxLecture 2- Practical AD and DA Conveters (Online Learning).pptx
Lecture 2- Practical AD and DA Conveters (Online Learning).pptxHamzaJaved306957
 
Computer Graphics Part1
Computer Graphics Part1Computer Graphics Part1
Computer Graphics Part1qpqpqp
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~Yamagishi Laboratory, National Institute of Informatics, Japan
 
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...Frank Oellien
 

Similar to Sander Dieleman - Generating music in the raw audio domain - Creative AI meetup (20)

WaveNet
WaveNetWaveNet
WaveNet
 
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
 
Miniproject audioenhancement-100223094301-phpapp02
Miniproject audioenhancement-100223094301-phpapp02Miniproject audioenhancement-100223094301-phpapp02
Miniproject audioenhancement-100223094301-phpapp02
 
Mini Project- Audio Enhancement
Mini Project- Audio EnhancementMini Project- Audio Enhancement
Mini Project- Audio Enhancement
 
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by OraclesEfficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
Efficient Volume and Edge-Skeleton Computation for Polytopes Given by Oracles
 
Chennai python augustmeetup
Chennai python augustmeetupChennai python augustmeetup
Chennai python augustmeetup
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 
DSP_Lab_MAnual_-_Final_Edition.pdf
DSP_Lab_MAnual_-_Final_Edition.pdfDSP_Lab_MAnual_-_Final_Edition.pdf
DSP_Lab_MAnual_-_Final_Edition.pdf
 
Audio and Vision (D2L9 Insight@DCU Machine Learning Workshop 2017)
Audio and Vision (D2L9 Insight@DCU Machine Learning Workshop 2017)Audio and Vision (D2L9 Insight@DCU Machine Learning Workshop 2017)
Audio and Vision (D2L9 Insight@DCU Machine Learning Workshop 2017)
 
DSP_Lab_MAnual_-_Final_Edition[1].docx
DSP_Lab_MAnual_-_Final_Edition[1].docxDSP_Lab_MAnual_-_Final_Edition[1].docx
DSP_Lab_MAnual_-_Final_Edition[1].docx
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
 
Beamforming_202011.pptx
Beamforming_202011.pptxBeamforming_202011.pptx
Beamforming_202011.pptx
 
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐCHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
 
Introduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detectionIntroduction to deep learning based voice activity detection
Introduction to deep learning based voice activity detection
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)
 
Lecture 2- Practical AD and DA Conveters (Online Learning).pptx
Lecture 2- Practical AD and DA Conveters (Online Learning).pptxLecture 2- Practical AD and DA Conveters (Online Learning).pptx
Lecture 2- Practical AD and DA Conveters (Online Learning).pptx
 
Computer Graphics Part1
Computer Graphics Part1Computer Graphics Part1
Computer Graphics Part1
 
Speech coding techniques
Speech coding techniquesSpeech coding techniques
Speech coding techniques
 
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
エンドツーエンド音声合成に向けたNIIにおけるソフトウェア群 ~ TacotronとWaveNetのチュートリアル (Part 1)~
 
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
Pre-Integrated Volume-Rendering with Randomized Transfer-Functions (V3D2 Work...
 

More from Luba Elliott

Luba Elliott - AI art - ICCV Conference
Luba Elliott - AI art - ICCV ConferenceLuba Elliott - AI art - ICCV Conference
Luba Elliott - AI art - ICCV ConferenceLuba Elliott
 
Luba Elliott - AI in contemporary art practice - Oxford
Luba Elliott - AI in contemporary art practice - OxfordLuba Elliott - AI in contemporary art practice - Oxford
Luba Elliott - AI in contemporary art practice - OxfordLuba Elliott
 
Luba Elliott - AI in recent art practice - ML Prague
Luba Elliott - AI in recent art practice - ML PragueLuba Elliott - AI in recent art practice - ML Prague
Luba Elliott - AI in recent art practice - ML PragueLuba Elliott
 
Three Images of the New - Richard Hames - Creative AI meetup
Three Images of the New - Richard Hames - Creative AI meetupThree Images of the New - Richard Hames - Creative AI meetup
Three Images of the New - Richard Hames - Creative AI meetupLuba Elliott
 
AI Art Gallery Overview - Luba Elliott - NeurIPS Creativity Workshop
AI Art Gallery Overview - Luba Elliott - NeurIPS Creativity WorkshopAI Art Gallery Overview - Luba Elliott - NeurIPS Creativity Workshop
AI Art Gallery Overview - Luba Elliott - NeurIPS Creativity WorkshopLuba Elliott
 
Creativity is Intelligence - Kenneth Stanley - NeurIPS Creativity Workshop
Creativity is Intelligence - Kenneth Stanley - NeurIPS Creativity WorkshopCreativity is Intelligence - Kenneth Stanley - NeurIPS Creativity Workshop
Creativity is Intelligence - Kenneth Stanley - NeurIPS Creativity WorkshopLuba Elliott
 
Seen by machine: Computational Spectatorship in the BBC Archive
Seen by machine: Computational Spectatorship in the BBC ArchiveSeen by machine: Computational Spectatorship in the BBC Archive
Seen by machine: Computational Spectatorship in the BBC ArchiveLuba Elliott
 
Luba Elliott AI art overview
Luba Elliott AI art overview Luba Elliott AI art overview
Luba Elliott AI art overview Luba Elliott
 
Natasha Jaques - Learning via Social Awareness - Creative AI meetup
Natasha Jaques - Learning via Social Awareness - Creative AI meetupNatasha Jaques - Learning via Social Awareness - Creative AI meetup
Natasha Jaques - Learning via Social Awareness - Creative AI meetupLuba Elliott
 
Marco Marchesi - Practical uses of style transfer in the creative industry
Marco Marchesi - Practical uses of style transfer in the creative industryMarco Marchesi - Practical uses of style transfer in the creative industry
Marco Marchesi - Practical uses of style transfer in the creative industryLuba Elliott
 
Hooman Shayani - CAD/CAM in the Age of AI: Designers’ Journey from Earth to Sky
Hooman Shayani - CAD/CAM in the Age of AI: Designers’ Journey from Earth to SkyHooman Shayani - CAD/CAM in the Age of AI: Designers’ Journey from Earth to Sky
Hooman Shayani - CAD/CAM in the Age of AI: Designers’ Journey from Earth to SkyLuba Elliott
 
Lucas Theis - Compressing Images with Neural Networks - Creative AI meetup
Lucas Theis - Compressing Images with Neural Networks - Creative AI meetupLucas Theis - Compressing Images with Neural Networks - Creative AI meetup
Lucas Theis - Compressing Images with Neural Networks - Creative AI meetupLuba Elliott
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Luba Elliott
 
Luba Elliott - Seeing AI through Art
Luba Elliott - Seeing AI through ArtLuba Elliott - Seeing AI through Art
Luba Elliott - Seeing AI through ArtLuba Elliott
 
Georgia Ward Dyer - O Time thy pyramids - Creative AI meetup
Georgia Ward Dyer - O Time thy pyramids - Creative AI meetupGeorgia Ward Dyer - O Time thy pyramids - Creative AI meetup
Georgia Ward Dyer - O Time thy pyramids - Creative AI meetupLuba Elliott
 
Daniel Berio - Graffiti synthesis, a motion centric approach - Creative AI me...
Daniel Berio - Graffiti synthesis, a motion centric approach - Creative AI me...Daniel Berio - Graffiti synthesis, a motion centric approach - Creative AI me...
Daniel Berio - Graffiti synthesis, a motion centric approach - Creative AI me...Luba Elliott
 
Ali Eslami - Artificial Intelligence and Computer Aided Design - Creative AI ...
Ali Eslami - Artificial Intelligence and Computer Aided Design - Creative AI ...Ali Eslami - Artificial Intelligence and Computer Aided Design - Creative AI ...
Ali Eslami - Artificial Intelligence and Computer Aided Design - Creative AI ...Luba Elliott
 
Daghan Cam - Adaptive Autonomous Manufacturing with AI - Creative AI meetup
Daghan Cam - Adaptive Autonomous Manufacturing with AI - Creative AI meetupDaghan Cam - Adaptive Autonomous Manufacturing with AI - Creative AI meetup
Daghan Cam - Adaptive Autonomous Manufacturing with AI - Creative AI meetupLuba Elliott
 
Martin Arjovsky - Wasserstein GAN - Creative AI meetup
Martin Arjovsky - Wasserstein GAN - Creative AI meetupMartin Arjovsky - Wasserstein GAN - Creative AI meetup
Martin Arjovsky - Wasserstein GAN - Creative AI meetupLuba Elliott
 

More from Luba Elliott (20)

Luba Elliott - AI art - ICCV Conference
Luba Elliott - AI art - ICCV ConferenceLuba Elliott - AI art - ICCV Conference
Luba Elliott - AI art - ICCV Conference
 
Luba Elliott - AI in contemporary art practice - Oxford
Luba Elliott - AI in contemporary art practice - OxfordLuba Elliott - AI in contemporary art practice - Oxford
Luba Elliott - AI in contemporary art practice - Oxford
 
Luba Elliott - AI in recent art practice - ML Prague
Luba Elliott - AI in recent art practice - ML PragueLuba Elliott - AI in recent art practice - ML Prague
Luba Elliott - AI in recent art practice - ML Prague
 
Three Images of the New - Richard Hames - Creative AI meetup
Three Images of the New - Richard Hames - Creative AI meetupThree Images of the New - Richard Hames - Creative AI meetup
Three Images of the New - Richard Hames - Creative AI meetup
 
AI Art Gallery Overview - Luba Elliott - NeurIPS Creativity Workshop
AI Art Gallery Overview - Luba Elliott - NeurIPS Creativity WorkshopAI Art Gallery Overview - Luba Elliott - NeurIPS Creativity Workshop
AI Art Gallery Overview - Luba Elliott - NeurIPS Creativity Workshop
 
Creativity is Intelligence - Kenneth Stanley - NeurIPS Creativity Workshop
Creativity is Intelligence - Kenneth Stanley - NeurIPS Creativity WorkshopCreativity is Intelligence - Kenneth Stanley - NeurIPS Creativity Workshop
Creativity is Intelligence - Kenneth Stanley - NeurIPS Creativity Workshop
 
Seen by machine: Computational Spectatorship in the BBC Archive
Seen by machine: Computational Spectatorship in the BBC ArchiveSeen by machine: Computational Spectatorship in the BBC Archive
Seen by machine: Computational Spectatorship in the BBC Archive
 
Luba Elliott AI art overview
Luba Elliott AI art overview Luba Elliott AI art overview
Luba Elliott AI art overview
 
Natasha Jaques - Learning via Social Awareness - Creative AI meetup
Natasha Jaques - Learning via Social Awareness - Creative AI meetupNatasha Jaques - Learning via Social Awareness - Creative AI meetup
Natasha Jaques - Learning via Social Awareness - Creative AI meetup
 
Design in AI
Design in AIDesign in AI
Design in AI
 
Marco Marchesi - Practical uses of style transfer in the creative industry
Marco Marchesi - Practical uses of style transfer in the creative industryMarco Marchesi - Practical uses of style transfer in the creative industry
Marco Marchesi - Practical uses of style transfer in the creative industry
 
Hooman Shayani - CAD/CAM in the Age of AI: Designers’ Journey from Earth to Sky
Hooman Shayani - CAD/CAM in the Age of AI: Designers’ Journey from Earth to SkyHooman Shayani - CAD/CAM in the Age of AI: Designers’ Journey from Earth to Sky
Hooman Shayani - CAD/CAM in the Age of AI: Designers’ Journey from Earth to Sky
 
Lucas Theis - Compressing Images with Neural Networks - Creative AI meetup
Lucas Theis - Compressing Images with Neural Networks - Creative AI meetupLucas Theis - Compressing Images with Neural Networks - Creative AI meetup
Lucas Theis - Compressing Images with Neural Networks - Creative AI meetup
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
 
Luba Elliott - Seeing AI through Art
Luba Elliott - Seeing AI through ArtLuba Elliott - Seeing AI through Art
Luba Elliott - Seeing AI through Art
 
Georgia Ward Dyer - O Time thy pyramids - Creative AI meetup
Georgia Ward Dyer - O Time thy pyramids - Creative AI meetupGeorgia Ward Dyer - O Time thy pyramids - Creative AI meetup
Georgia Ward Dyer - O Time thy pyramids - Creative AI meetup
 
Daniel Berio - Graffiti synthesis, a motion centric approach - Creative AI me...
Daniel Berio - Graffiti synthesis, a motion centric approach - Creative AI me...Daniel Berio - Graffiti synthesis, a motion centric approach - Creative AI me...
Daniel Berio - Graffiti synthesis, a motion centric approach - Creative AI me...
 
Ali Eslami - Artificial Intelligence and Computer Aided Design - Creative AI ...
Ali Eslami - Artificial Intelligence and Computer Aided Design - Creative AI ...Ali Eslami - Artificial Intelligence and Computer Aided Design - Creative AI ...
Ali Eslami - Artificial Intelligence and Computer Aided Design - Creative AI ...
 
Daghan Cam - Adaptive Autonomous Manufacturing with AI - Creative AI meetup
Daghan Cam - Adaptive Autonomous Manufacturing with AI - Creative AI meetupDaghan Cam - Adaptive Autonomous Manufacturing with AI - Creative AI meetup
Daghan Cam - Adaptive Autonomous Manufacturing with AI - Creative AI meetup
 
Martin Arjovsky - Wasserstein GAN - Creative AI meetup
Martin Arjovsky - Wasserstein GAN - Creative AI meetupMartin Arjovsky - Wasserstein GAN - Creative AI meetup
Martin Arjovsky - Wasserstein GAN - Creative AI meetup
 

Recently uploaded

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsZilliz
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesSanjay Willie
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Sander Dieleman - Generating music in the raw audio domain - Creative AI meetup

Editor's Notes

  1. Mention MIDI etc.
  2. What is wavenet? blog post / paper WaveNet: generative model of raw audio waveforms: learn probability distribution of sound This is challenging because sound is a very long time series: 16kHz -> 16000 samples / second Very successful for text-to-speech: state of the art in terms of audio quality (used in the Google Assistant)
  3. WaveNet is inspired by similarly structured models that can generate images, called PixelCNNs We drew some lessons from this work when developing WaveNet: treat the data to be modelled as discrete. (even though amplitude of an audio waveform, just like the intensity of a pixel in an image, is actually continuous in the real world.) Output of the model: distribution over a number of discrete choices (softmax). Mu-law matches our logarithmic perception of loudness (not a new idea: also used in e.g. phones).
  4. WaveNet is a convolutional network, consisting of many layers of nonlinear processing, and the functions that these layers compute are learnt from data. Causal convolutions: at each timestep, connections only to previous timesteps (so the model can’t see the future) Black nodes represent input signal values Red nodes represent internal values computed by the network Grey nodes represent the output: the predicted distribution of the next signal value
  5. We need large receptive fields so the model remembers what it generated before.
  6. Much cheaper to get large receptive fields
  7. Samples: MTT Samples from a model trained on a varied collection of music (contemporary, classical, non-western music) The model drifts between different styles: the receptive field is not large enough (60 layers, receptive field < 1 second). Music is harder to model than speech signals: much less constrained (in terms of timbre etc.), extremely long-range temporal correlations
  8. Samples: piano We reduced the complexity of the problem a bit by constructing single-instrument datasets from YouTube
  9. Samples: guitar
  10. Samples: string quartet
  11. An ADA consists of a WaveNet (local model) with an encoder and a modulator attached to it. The encoder takes the waveform and tries to compress it into a new representation, that makes abstraction of local signal structure. The modulator tries to influence the local model such that it reproduces the encoder input. The local model reconstructs any local signal structure that the encoder removed when compressing the signal. The quantisation module ensures that the compressed representation is discrete, limiting its information capacity and making it easier to model with another WaveNet.
  12. Although it’s possible to interpret this as a VAE, there isn’t really any tangible benefit to doing this.
  13. Remember: D dimensional space, K codes in the codebook. There are K codes, each code is represented by a D-dimensional vector!
  14. Quantise to closest point in code space
  15. output quantised vector for the decoder, and also an integer representation of the selected code
  16. In backprop, the backward pass takes a different path than the forward pass! We basically pretend the quantisation didn’t happen. As long as q and q’ are close enough (the quantisation error is low), this is fine.
  17. Three terms in the loss function Reconstruction loss: here we do straight-through for q’->q
  18. Codebook loss: only affects codebook (gradient w.r.t. query is stopped)
  19. Commitment loss: only affects encoder (gradient w.r.t. code is stopped)
  20. NLL and commitment are the same as in VQ-VAE. We have an additional “diversity loss” which ensures that all possible codes are used: their expectation over time/batch should be one over the number of codes C. Note that commitment is an L2 loss, putting this on a softmax output doesn’t really work. This is another reason for ReLU + divisive normalisation.
  21. 28 respondents who rated clips 1-5 (calibrated based on real excerpts from the dataset), in terms of audio fidelity and musicality. For musicality, we clearly see that stacking more levels makes the score improve.
  22. There is an upcoming blog post about the stacked model as well. Hopefully within the next month or so.