Presentation introducing a panel discussion on the present and future of speech recognition for lecture capture at Digifest 2022 online fringe on Assistive Technologies: https://www.jisc.ac.uk/events/focus-on-the-future-new-developments-in-accessible-and-assistive-technologies-16-mar-2022
6. Tasks for speech recognition by difficulty
Select word from list
Interpret command
Type dictation
Transcribe presentation
Transcribe conversation
7. How we think of it vs how it is
Select word from list
Interpret command
Type dictation
Transcribe presentation
Transcribe conversation

Transcribe conversation
Transcribe presentation
Type dictation
Interpret command
Select word from list
8. Speech recognition approximate timeline
1950s: Select digit
1970s: Select from 1000 words
1980s: Select from large vocabulary
1990s: Dictate word by word
1997: Dictate whole sentences
2012: Transcribe YouTube video
2019: Transcribe conversation
10. What is this word?
[pʰɹɛtsɫ̩]
[pɹɛtsl]
/pretsəl/
<pretzel>
11. What’s the problem
aspirated /p/ at start of a stressed syllable
devoiced /r/ following /p/
labialised /r/ following /p/
dark /l/
syllabic consonant
glottal stop
26. Patrick Winston (2015) MIT Lecture 12a in AI course
It was in 2010, yes, that's right. It was in 2010. We were having our annual discussion about what we would dump from 6.034 in order to make room for some other stuff. And we almost killed off neural nets. That might seem strange because our heads are stuffed with neurons. … But many of us felt that the neural models of the day weren't much in the way of faithful models of what actually goes on inside our heads. And besides that, nobody had ever made a neural net that was worth a darn for doing anything.
27. 2012 – ImageNet showed that neural networks are much better at computing the probabilities for complex data.
29. Things to know about Neural Nets
Everything has a probability
Same input does not produce same output
They have no ‘sanity check’ or ‘common sense’
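A minimal sketch of what these points can look like in practice. It assumes a recogniser that assigns a raw score to each candidate word and then samples from the resulting probabilities. The candidate words, the scores, and the sampling step are all invented for illustration; real systems differ in how they decode, and sampling is only one possible reason for varying output.

```python
import math
import random


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical candidates and scores for one stretch of audio (assumed values).
candidates = ["pretzel", "pretzels", "pencil", "wrestle"]
scores = [2.1, 1.9, 0.3, -0.5]

probabilities = softmax(scores)
for word, p in zip(candidates, probabilities):
    # Every candidate gets a non-zero probability, even an implausible one.
    print(f"{word:10s} {p:.2f}")

# Sampling from the distribution: the same input can yield different outputs.
rng = random.Random()
print([rng.choices(candidates, weights=probabilities)[0] for _ in range(5)])
```

Nothing in this calculation checks whether the chosen word makes sense in context; the model only ranks candidates by probability.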
42. Another thing to know about NN
Neural Nets use very large data sets and can take days or weeks to train.
43. Consequences of NN size
Speech recognition is often not done on device.
Individual input often cannot adjust the quality (except in pre-training)
Most applications use APIs from the big players
Few open source/free options
44. Big players in the field
Google
Microsoft (now also Nuance)
Amazon
53. Possible futures?
Incremental improvement (similar to Roomba in 17 years): accurate lecture transcripts; fluent dictation with pauses; better meeting transcription
Revolutionary change (similar to change in speech recognition in 6 years): informal conversation transcription; interactive dictation; multilingual speech transcription
54. How should we think about accuracy?
We speak 120-180 words per minute
99% accurate = roughly 2 errors per minute
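The arithmetic behind this slide can be checked with a short sketch. At 120 to 180 words per minute, 99% word accuracy gives between 1.2 and 1.8 misrecognised words per minute, which rounds up to about 2. The 50-minute lecture length and the 95% comparison figure are assumptions added for illustration.

```python
def errors_per_minute(words_per_minute: float, accuracy: float) -> float:
    """Expected number of misrecognised words per minute of speech."""
    return words_per_minute * (1.0 - accuracy)


LECTURE_MINUTES = 50  # assumed lecture length, not taken from the slides

for accuracy in (0.99, 0.95):
    for wpm in (120, 180):
        per_minute = errors_per_minute(wpm, accuracy)
        per_lecture = per_minute * LECTURE_MINUTES
        print(
            f"{accuracy:.0%} accurate at {wpm} wpm: "
            f"{per_minute:.1f} errors/minute, "
            f"about {round(per_lecture)} errors per lecture"
        )
```

Seen per lecture rather than per minute, even a 99% accurate transcript still contains dozens of errors.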
55. From Sept 2014 xkcd.com/1425
Sometimes it is hard to judge how much effort will be needed to solve a seemingly easy problem.
56. Wishlist (a few hours of coding)
Transcripts indicate level of confidence
Benchmarks for lecture transcripts
Better manual control of transcripts (like Descript)
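As an illustration of the first wishlist item, here is a minimal sketch of a transcript that flags uncertain words. It assumes the recogniser returns a confidence value between 0 and 1 for each word; the function name, the threshold of 0.8, and the sample data are hypothetical and not taken from any particular service.

```python
from typing import List, Tuple


def mark_low_confidence(words: List[Tuple[str, float]], threshold: float = 0.8) -> str:
    """Return the transcript with low-confidence words wrapped as [word?]."""
    marked = []
    for word, confidence in words:
        if confidence < threshold:
            marked.append(f"[{word}?]")  # flag for the reader or a human editor
        else:
            marked.append(word)
    return " ".join(marked)


# Hypothetical recogniser output: (word, confidence) pairs.
recognised = [
    ("we", 0.98), ("almost", 0.95), ("killed", 0.91),
    ("off", 0.88), ("neural", 0.97), ("nets", 0.62),
]
print(mark_low_confidence(recognised))  # we almost killed off neural [nets?]
```

A reader or editor can then concentrate on the flagged words instead of checking the whole transcript.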
57. Dreamlist (5 years and a research team)
Multilingual transcription (identify change in language)
Multimodal transcription (use information from video)
Raw to readable transcript
59. Kate Knill, Machine Intelligence Lab, University of Cambridge
Richard Cave, MND Association (and formerly Google project Euphonia)
Richard Purcell, Caption.Ed
Irit Opher, Head of Research at Verbit.ai
60. What is the current state of the art of speech recognition in general and in the transcription of recorded speech in particular?
What are the current quality metrics and how much do they tell us about the suitability of models? Do we need better ones?
After the big recent jump in performance, are we seeing a plateau with incremental growth or can we expect another step change in quality?
Where can we see the most innovation? What are the research and development blind spots where more effort is needed?
What problems are currently unsolved?
What room is there for smaller players to innovate in this space? How much do they have to rely on pre-trained models from big providers? Is there space for open source?
61. This presentation is licensed under a Creative Commons Attribution (CC BY) license except where otherwise noted.
Icons and stock images are from Microsoft Office 365 premium creative content. They cannot be distributed separately from this document.
Editor's Notes
Bill Gates big on digital reading, voice recognition, ubiquitous screens – GeekWire https://www.geekwire.com/2011/bill-gates-big-digital-reading-voice-recognition-ubiquitous-screens/
Microsoft's Bill Gates: A rare and remarkable interview with the world's second richest man | Daily Mail Online https://www.dailymail.co.uk/home/moslive/article-2001697/Microsofts-Bill-Gates-A-rare-remarkable-interview-worlds-second-richest-man.html
This Photo by Unknown Author is licensed under CC BY
Language Log » First novels (upenn.edu) https://languagelog.ldc.upenn.edu/nll/?p=53940&utm_source=rss&utm_medium=rss&utm_campaign=first-novels
An Overview of Modern Speech Recognition - Microsoft Research https://www.microsoft.com/en-us/research/publication/an-overview-of-modern-speech-recognition/
https://www.youtube.com/watch?v=uXt8qF2Zzfo
Transcribed from YouTube
http://cs231n.github.io/convolutional-networks
Impressive on English, falls down on Czech
Vanden Stock was a Belgian football player
https://ai.googleblog.com/2021/01/totto-controlled-table-to-text.html
Language Log » How AI Reporting Works (upenn.edu)
https://www.youtube.com/watch?v=YLWSXVS71Js
Transcribing Talk-in-Interaction - SAGE Research Methods (sagepub.com) https://methods.sagepub.com/book/doing-conversation-analysis/n6.xml
Comic by XKCD licensed under CC BY NC
What am I allowed to use premium creative content for? (microsoft.com) https://support.microsoft.com/en-us/topic/what-am-i-allowed-to-use-premium-creative-content-for-0de69c76-ff2b-473e-b715-4d245e39e895
Creative Commons — Attribution 4.0 International — CC BY 4.0
https://creativecommons.org/licenses/by/4.0/