Speech recognition: Art of the possible
Dominik.Lukes@ctl.ox.ac.uk @techczech
Dominik’s journey
Computational linguistics
Cognitive linguistics
Language teaching
1990–1995
Language teacher training
Translation
Metaphor / discourse studies
1995–2008
Readability
Learning / Assistive technology
Dyslexia teacher training
2009 – present
Bill Gates in 2011
“The next big thing is definitely
speech and voice recognition.”
What do we want to know?
What is the current state of the
art?
How did we get here?
Where are we going?
Are we asking the right
questions?
Tasks for speech recognition by difficulty
Select
word
from list
Interpret
command
Type
dictation
Transcribe
presentation
Transcribe
conversation
How we think of it vs how it is
Select word from list
Interpret
command Type
dictation
Transcribe
presentation
Transcribe
conversation
Transcribe
conversation
Transcribe
presentation
Type dictation
Interpret
command
Select
word from
list
Speech recognition approximate timeline
Select digit
1950s
Select from 1000
words
1970s
Select from large
vocabulary
1980s
Dictate word by
word
1990s
Dictate whole
sentences
1997
Transcribe
YouTube video
2012
Transcribe
conversation
2019
What is the actual job of
speech recognition?
What is this word?
[pʰɹɛtsɫ̩]
[pɹɛtsl]
/pretsəl/
<pretzel>
What’s the problem
aspirated /p/ at start of a stressed syllable
devoiced /r/ following /p/
labialised /r/ following /p/
dark /l/
syllabic consonant
glottal stop
It gets worse: find the missing sounds
Course on speech recognition 1993
Faster computers won’t help
improve speech recognition. We
need a new approach.
Dragon Naturally Speaking
released in 1997. Can
recognise whole
sentences.
What happened?
How speech recognition does not work
Finding individual sounds
(phonemes) in the speech and
matching them to letters.
How speech recognition actually works
P(W|C)
What is the likelihood that the
next word is X given what came
before?
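A minimal sketch of the idea behind P(W|C), assuming the simplest possible case: a bigram model where the context C is only the previous word. The tiny corpus and the function names are made up for illustration; real recognisers combine far richer language models with acoustic evidence.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, context_word, candidate):
    """Estimate P(candidate | context_word) from the raw counts."""
    total = sum(counts[context_word].values())
    if total == 0:
        return 0.0
    return counts[context_word][candidate] / total


# Hypothetical toy corpus, for illustration only.
corpus = [
    "i ate a salty pretzel",
    "i ate a salty snack",
    "i ate a salty pretzel today",
]
model = train_bigram(corpus)

p_pretzel = next_word_probability(model, "salty", "pretzel")  # 2 of 3 continuations
p_snack = next_word_probability(model, "salty", "snack")      # 1 of 3 continuations
p_unseen = next_word_probability(model, "salty", "lukes")     # never seen after "salty"

print(p_pretzel, p_snack, p_unseen)
```

In this toy model a word that never followed "salty" in the training data gets a probability of zero, however clearly it was spoken.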
Actually, it is quite a bit more complicated (Huang and Deng 2009)
Probabilistic (stochastic)
ASR enabled the change.
Linguistics took the back
seat.
Fred Jelinek (ASR Pioneer - 1988?)
"Every time I fire a linguist, the
performance of the speech
recognizer goes up"
Consequence of
probabilistic approach:
Worse on words not
predictable from
context
Names
Acronyms
Specialist terms
Question in 2011
I recorded a lecture, can I use
Dragon to transcribe it?
“Caption fails” in 2014 provided a source of comedy
YouTube Captions today are usable and useful
So what happened
between 2014 and 2022?
Ingredients of success
Larger data sets
More computing power
Neural networks
Patrick Winston (2015) MIT Lecture 12a in AI course
It was in 2010, yes, that's right. It was in 2010. We
were having our annual discussion about what we
would dump from 6.034 in order to make room for
some other stuff. And we almost killed off neural
nets. That might seem strange because our heads
are stuffed with neurons. … But many of us felt that
the neural models of the day weren't much in
the way of faithful models of what actually goes
on inside our heads. And besides that, nobody
had ever made a neural net that was worth a
darn for doing anything.
2012 – ImageNet showed
that Neural Networks are
much better at computing
the probabilities for
complex data.
Ok, we have neural nets,
what does that mean?
Things to know about Neural Nets
Everything has a probability
Same input does not produce
same output
They have no ‘sanity check’
or ‘common sense’
What do probabilities look like?
What BERT is not: Lessons from a new suite of
psycholinguistic diagnostics for language
models
Allyson Ettinger 2019
https://what-if.xkcd.com/34
Output changes as more
information is made
available. (Not always for
the better)
Examples from today’s captions
Crystal > Chris is
Am > and
experts > experience
AR > a our
Different ways of transcribing Dua Lipa
alipa
dualipa
dua lipa
lipa
duda lipa
Rise and mostly fall of Google’s new spell Czech
Tracking faces at the tips of the shoes
Hallucination is a big problem
Question asked by faculty member in 2021
We correct the transcripts, why
doesn’t the system learn the
correct spelling?
Adding your own word list
just tweaks the
probabilities.
Setting a genre setting
tweaks the probabilities.
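A rough sketch of what "tweaks the probabilities" could mean, assuming the recogniser has a list of candidate transcriptions with scores and a custom word list simply multiplies the score of matching candidates. The candidate strings come from the Dua Lipa slide; the scores, the boost factor and the function name are invented for illustration and do not describe any particular product.

```python
def boost_candidates(candidates, custom_words, boost=3.0):
    """Multiply the score of candidates on the custom list, then renormalise."""
    weighted = {
        text: score * (boost if text in custom_words else 1.0)
        for text, score in candidates.items()
    }
    total = sum(weighted.values())
    return {text: score / total for text, score in weighted.items()}


# Hypothetical scores for three competing transcriptions.
candidates = {"duda lipa": 0.6, "dua lipa": 0.3, "alipa": 0.1}

# A strong boost is enough to change the winner.
boosted = boost_candidates(candidates, {"dua lipa"})
best = max(boosted, key=boosted.get)

# A weak boost moves the numbers but not the result.
weak = boost_candidates(candidates, {"dua lipa"}, boost=1.5)
best_weak = max(weak, key=weak.get)

print(best, best_weak)
```

Under these assumptions the word list never guarantees a spelling. It only shifts the odds, which is why a corrected transcript does not automatically teach the system anything.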
Another thing to know about NN
Neural Nets use very large data
sets and can take days or
weeks to train.
Consequences of NN size
Speech recognition is often not
done on device.
Individual input often cannot adjust
the quality (except in pre-training)
Most applications use APIs from the
big players
Few open source/free options
Big players in the field
Google
Microsoft (now also Nuance)
Amazon
Interesting smaller companies
Verbit.ai
Carescribe.io (Caption.Ed)
Otter.ai
Rev.ai
Interesting applications
Descript
Microsoft Reading Progress
Microsoft Presentation Coach
What can we expect
in the future
Cautionary tale by SMBC
The Original Roomba (2002) vs Roomba S9+ (2019) - Wow!
What happens in speeches
Fillers
Repetition
What does conversation actually look like?
Possible futures?
Incremental
improvement
similar to Roomba in 17 years
Accurate
lecture
transcripts
Fluent
dictation with
pauses
Better meeting
transcription
Revolutionary
change
similar to change in speech
recognition in 6 years
Informal
conversation
transcription
Interactive
dictation
Multilingual
speech
transcription
How should we think about accuracy?
We speak 120-180 words per minute
99% accurate = roughly 1–2 errors per minute
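The arithmetic behind these figures can be sketched as follows, assuming "accuracy" means the share of words transcribed correctly and that errors are spread evenly. The function name and the 50-minute lecture at 150 words per minute are illustrative assumptions.

```python
def errors_per_minute(words_per_minute, accuracy):
    """Expected number of wrong words per minute of speech."""
    return words_per_minute * (1.0 - accuracy)


slow = errors_per_minute(120, 0.99)  # about 1.2 errors per minute
fast = errors_per_minute(180, 0.99)  # about 1.8 errors per minute

# Hypothetical 50-minute lecture at 150 words per minute.
lecture_errors = errors_per_minute(150, 0.99) * 50  # about 75 errors

print(round(slow, 1), round(fast, 1), round(lecture_errors))
```

Even at 99% accuracy, a single lecture under these assumptions still contains dozens of errors to correct.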
From Sept 2014 xkcd.com/1425
Sometimes it is hard to judge
how much effort will be needed
to solve a seemingly easy
problem.
Wishlist (a few hours of coding)
Transcripts indicate level
of confidence
Benchmarks for lecture
transcripts
Better manual control of
transcripts (like Descript)
Dreamlist (5 years and a research team)
Multilingual transcription
(identify change in
language)
Multimodal transcription
(use information from
video)
Raw to readable
transcript
Welcome to the
panel
Kate Knill
Machine Intelligence
Lab, University of
Cambridge
Richard Cave
MND Association (and
formerly Google
project Euphonia)
Richard
Purcell
Caption.Ed
Irit Opher
Head of Research at
Verbit.ai
What is the current state of
the art of speech recognition
in general and in the
transcription of recorded
speech in particular?
What are the current quality
metrics and how much do
they tell us about suitability
of models? Do we need
better ones?
After the big recent jump in
performance, are we seeing
a plateau with incremental
growth or can we expect
another step change in
quality?
Where can we see the most
innovation? What are the
research and development
blind spots where more effort
is needed?
What are the currently
unsolved problems for
which we do not have a
solution?
What room is there for
smaller players to innovate
in this field? How much do
they have to rely on pre-
trained models from big
providers? Is there space for
open source?
This presentation is licensed
under Creative Commons By
Attribution license except where
otherwise noted.
Icons and stock images from Microsoft
Office 365 creative premium. They
cannot be distributed separately from this
document.

Speech recognition - Art of the possible

  • 1. Speech recognition: Art of the possible Dominik.Lukes@ctl.ox.ac.uk @techczech
  • 2. Dominik’s journey Computational linguistics Cognitive linguistics Language teaching 1990–1995 Language teacher training Translation Metaphor / discourse studies 1995–2008 Readability Learning / Assistive technology Dyslexia teacher training 2009 – present
  • 3. Bill Gates in 2011 “The next big thing is definitely speech and voice recognition.”
  • 4. What do we want to know? What is the current state of the art? How we got here? Where are going?
  • 5. Are we asking the right questions?
  • 6. Tasks for speech recognition by difficulty Select word from list Interpret command Type dictation Transcribe presentation Transcribe conversation
  • 7. How we think of it vs how it is Select word from list Interpret command Type dictation Transcribe presentation Transcribe conversation Transcribe conversation Transcribe presentation Type dictation Interpret command Select word from list
  • 8. Speech recognition approximate timeline Select digit 1950s Select from 1000 words 1970s Select from large vocabulary 1980s Dictate word by word 1990s Dictate whole sentences 1997 Transcribe YouTube video 2012 Transcribe conversation 2019
  • 9. What is the actual job of speech recognition?
  • 10. What is this word? [pʰɹɛtsɫ̩] [pɹɛtsl] /pretsəl/ <pretzel>
  • 11. What’s the problem aspirated /p/ at start of a stressed syllable devoiced /r/ following /p/ labialised /r/ following /p/ dark /l/ syllabic consonant glottal stop
  • 12. It gets worse: find the missing sounds
  • 13. Course on speech recognition, 1993: Faster computers won’t help improve speech recognition. We need a new approach.
  • 14. Dragon Naturally Speaking released in 1997. Can recognise whole sentences. What happened?
  • 15. How speech recognition does not work: Finding individual sounds (phonemes) in the speech and matching them to letters.
  • 16. How speech recognition actually works: P(W|C). What is the likelihood that the next word is X given what came before?
  • 17. Actually, it is quite a bit more complicated (Huang and Deng 2009)
  • 18. Probabilistic (stochastic) ASR enabled the change. Linguistics took the back seat.
  • 19. Fred Jelinek (ASR Pioneer - 1988?): "Every time I fire a linguist, the performance of the speech recognizer goes up"
  • 20. Consequence of probabilistic approach: Worse on words not predictable from context: Names; Acronyms; Specialist Terms
  • 21. Question in 2011: I recorded a lecture, can I use Dragon to transcribe it?
  • 22. “Caption fails” in 2014 provided source for comedy
  • 23. YouTube Captions today are usable and useful
  • 24. So what happened between 2014 and 2022?
  • 25. Ingredients of success: Larger data sets; More computing power; Neural networks
  • 26. Patrick Winston (2015), MIT Lecture 12a in AI course: “It was in 2010, yes, that's right. It was in 2010. We were having our annual discussion about what we would dump from 6.034 in order to make room for some other stuff. And we almost killed off neural nets. That might seem strange because our heads are stuffed with neurons. … But many of us felt that the neural models of the day weren't much in the way of faithful models of what actually goes on inside our heads. And besides that, nobody had ever made a neural net that was worth a darn for doing anything.”
  • 27. 2012 – ImageNet showed that Neural Networks are much better at computing the probabilities for complex data.
  • 28. Ok, we have neural nets, what does that mean?
  • 29. Things to know about Neural Nets: Everything has a probability; Same input does not produce same output; They have no ‘sanity check’ or ‘common sense’
  • 30. What do probabilities look like?
  • 31. What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models (Allyson Ettinger 2019)
  • 33. Output changes as more information is made available. (Not always for the better)
  • 34. Examples from today’s captions: Crystal > Chris is; Am > and; experts > experience; AR > a our
  • 35. Different ways of transcribing Dua Lipa alipa dualipa dua lipa lipa duda lipa
  • 36. Rise and mostly fall of Google’s new spell Czech
  • 37. Tracking faces at the tips of the shoes
  • 38. Hallucination is a big problem
  • 39. Question asked by faculty member in 2021: We correct the transcripts, why doesn’t the system learn the correct spelling?
  • 40. Adding your own word list just tweaks the probabilities.
  • 41. Setting a genre setting tweaks the probabilities.
  • 42. Another thing to know about NN: Neural Nets use very large data sets and can take days or weeks to train.
  • 43. Consequences of NN size: Speech recognition is often not done on device; Individual input often cannot adjust the quality (except in pre-training); Most applications use APIs from the big players; Few open source/free options
  • 44. Big players in the field: Google; Microsoft (now also Nuance); Amazon
  • 46. Interesting applications: Descript; Microsoft Reading Progress; Microsoft Presentation Coach
  • 47. What can we expect in the future
  • 50. The Original Roomba (2002) vs Roomba S9+ (2019) - Wow!
  • 51. What happens in speeches: Fillers; Repetition
  • 52. What does conversation actually look like?
  • 53. Possible futures? Incremental improvement similar to Roomba in 17 years: Accurate lecture transcripts; Fluent dictation with pauses; Better meeting transcription. Revolutionary change similar to change in speech recognition in 6 years: Informal conversation transcription; Interactive dictation; Multilingual speech transcription.
  • 54. How should we think about accuracy? We speak 120-180 words per minute; 99% accurate ≈ 2 errors per minute
  • 55. From Sept 2014, xkcd.com/1425: Sometimes it is hard to judge how much effort will be needed to solve a seemingly easy problem.
  • 56. Wishlist (a few hours of coding): Transcripts indicate level of confidence; Benchmarks for lecture transcripts; Better manual control of transcripts (like Descript)
  • 57. Dreamlist (5 years and a research team): Multilingual transcription (identify change in language); Multimodal transcription (use information from video); Raw to readable transcript
  • 59. Kate Knill (Machine Intelligence Lab, University of Cambridge); Richard Cave (MND Association, and formerly Google project Euphonia); Richard Purcell (Caption.Ed); Irit Opher (Head of Research at Verbit.ai)
  • 60. What is the current state of the art of speech recognition in general and in the transcription of recorded speech in particular? What are the current quality metrics and how much do they tell us about suitability of models? Do we need better ones? After the big recent jump in performance, are we seeing a plateau with incremental growth or can we expect another step change in quality? Where can we see the most innovation? What are the research and development blind spots where more effort is needed? What are the currently unsolved problems for which we do not have a solution? What is the space for smaller players to innovate in this space? How much do they have to rely on pre-trained models from big providers? Is there space for open source?
  • 61. This presentation is licensed under Creative Commons By Attribution license except where otherwise noted. Icons and stock images from Microsoft Office 365 creative premium. They cannot be distributed separately from this document.
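
Slide 16 frames recognition as estimating P(W|C), the likelihood of the next word given what came before. The following is a minimal sketch of that idea, assuming a toy bigram model in which the context is only the single preceding word. The corpus, the function names and the counts are all hypothetical illustrations, not taken from the talk or from any real recogniser, which (as slide 17 notes) is considerably more complicated.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, following in zip(words, words[1:]):
            counts[previous][following] += 1
    return counts


def next_word_probability(counts, context_word, candidate):
    """Estimate P(candidate | context_word) from raw bigram counts.

    Returns 0.0 when the context word was never seen or the candidate
    never followed it (no smoothing, to keep the sketch small).
    """
    following = counts.get(context_word)
    if not following:
        return 0.0
    return following[candidate] / sum(following.values())


# Hypothetical toy corpus, invented for this illustration.
TOY_CORPUS = [
    "i would like a pretzel",
    "i would like a coffee",
    "i would like a pretzel please",
    "she ate a pretzel",
]

BIGRAMS = train_bigram_counts(TOY_CORPUS)

if __name__ == "__main__":
    for word in ("pretzel", "coffee", "bagel"):
        print(word, next_word_probability(BIGRAMS, "a", word))
```

In this toy corpus "pretzel" follows "a" three times out of four, so it gets probability 0.75, while a word the model has never seen in that position, such as "bagel", gets 0.0. That is a small-scale version of slide 20's point that names, acronyms and specialist terms fare worse when they are not predictable from context.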
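
The accuracy arithmetic on slide 54 can be checked directly. This is a small sketch, assuming that "99% accurate" means 99% of words are transcribed correctly and that errors are spread evenly. The function names and the 50-minute lecture length are assumptions added for illustration.

```python
def errors_per_minute(words_per_minute, word_accuracy):
    """Expected number of wrongly transcribed words per minute of speech."""
    return words_per_minute * (1.0 - word_accuracy)


def errors_in_recording(words_per_minute, word_accuracy, minutes):
    """Expected number of wrong words over a whole recording."""
    return errors_per_minute(words_per_minute, word_accuracy) * minutes


if __name__ == "__main__":
    # Speaking rates from slide 54: 120-180 words per minute.
    for wpm in (120, 150, 180):
        per_minute = errors_per_minute(wpm, 0.99)
        # 50 minutes is an assumed lecture length, not a figure from the talk.
        per_lecture = errors_in_recording(wpm, 0.99, 50)
        print(f"{wpm} wpm: {per_minute:.1f} errors/min, {per_lecture:.0f} per lecture")
```

At 99% word accuracy the slide's speaking rates give between 1.2 and 1.8 wrong words per minute, so roughly 2 at the fast end. Under the assumed 50-minute lecture that adds up to dozens of corrections, which is why a headline accuracy figure alone says little about how usable a transcript is.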