A lightning talk presentation from Jisc's Focus on the future: new developments in accessible and assistive technologies event held on 16 March 2022 as part of Digifest community fringe.
2. Agenda
1. About us
2. How Voiceitt works
3. Why speech recognition; why now?
4. User testing & tech development
5. Impact on participants
6. Project Ensemble
7. Q&A
9. Why this technology; why now?
• Digitisation of society
• Technology creating/removing barriers
• Machine learning + large data sets
• Growing political and commercial interest
• Legislation re: digital accessibility; access to education, employment, public services
• Increasing rates of disability
• ND diagnoses, medical advances, ageing population
• Inclusive digital practices
12. User Feedback / Tech Development
Training new phrases
• Difficult, time-consuming
• Hard to maintain focus
• ~40 reduced to ~10
• Gamification, rewards (optional)
• Voiceitt Ensemble (in progress)
App navigation
• Physical limitations to operating iPad/iPhone
• Difficulty remembering steps
• Hands-free access via head movements/switch added
• ‘Speak’ added to all dictionary screens
• Wake word for conversation
13. User Feedback / Tech Development
Operating systems & hardware compatibility
• Only iOS
• No calling function
• For Smart Home, multiple apps/accounts a barrier
Recognition issues
• User & audience; background noise
• Android version in progress
• Support for voice/video calls in progress
• Discussions with manufacturers
• Ongoing work; voice isolation
15. Impact on participants (2)
• Improved communication
• Increased independence
• Training
• consistency of pronunciation
• planning reduces anxiety
Limited by:
• pre-training phrases
• noisy situations
16. Project Ensemble
• Atypical speech sampling
• Supporting development towards continuous recognition
• New participants welcome!
17. Get involved
Recruiting for testing Voiceitt and Project Ensemble
liz@karten-network.org.uk
geena@karten-network.org.uk
nuvoic.karten-network.org.uk
@KartenNetwork
voiceitt.com
Editor's Notes
Speech recognition for all; research, development and user experiences of the Voiceitt app
A presentation by Geena Vabulas with support from Liz Howarth, both from the Karten Network
Agenda
We’ll begin by introducing our organisation (the Karten Network), our partners Voiceitt, and the background to our project, then move on to talk a bit about how Voiceitt’s app currently works.
We’ll then look at the wider technical and social context for our project before describing some of the testing activities people have been taking part in, the feedback they’ve given, and how this has informed the ongoing development of the Voiceitt app.
We’ll then share some examples from participants of how Voiceitt has made communication easier, or allowed them to be more independent, before describing the next phase of the project: Voiceitt Ensemble.
We hope to have time afterwards for questions and discussion.
But before any of that, we want to start the session with some audience participation - by asking about people’s personal and professional experiences of using Speech Recognition technology.
At this point, the audience was asked to share experiences of successful use of speech recognition to support learning, and of any problems or barriers they’d found.
In summary, there are many potential uses of speech recognition, but these benefits aren’t accessible to everyone, and particularly not for people with atypical or impaired speech. Improving access for this group is the main focus of our work with Voiceitt.
The Karten Network is a network of specialist technology centres hosted in a range of partner organisations, such as specialist colleges, residential care and day services, with the common aim of supporting disabled people to be more independent through the use of technology.
Our partner, Voiceitt, has developed a speech recognition app designed for people with non-standard speech who might experience difficulties being understood by unfamiliar people or using standard speech recognition. The two main applications of Voiceitt are supporting communication with unfamiliar people and allowing users to control voice-activated Smart Home devices such as a smart speaker, smart lights, blinds or plugs.
The aim of our Nuvoic project is to support Voiceitt’s development work towards accessible speech recognition. Our role is to recruit and support people to test and give feedback on the Voiceitt app, to help improve the interface and functionality. We’re also now starting to recruit people to donate voice samples through Voiceitt Ensemble to support the next phase of their development: to make the underlying recognition more flexible and rely less on pre-training. (More details to follow).
This video shows some of our participants using Voiceitt, both for communication and for SH controls, and hopefully gives you a flavour of some of the different ways the app can be used.
Voiceitt currently uses discrete recognition, meaning that users have to train the app in advance to recognise each individual phrase or command they want to use. Each time the user wants to use a new phrase or command, they first have to add it to their dictionary. The user chooses what they want to say as their prompt (short prompts of just 1-2 words can be helpful for people who find speaking tiring), then sets up whatever they want to be the corresponding output. The output phrase can be either played out loud using synthesized speech, or sent silently to the Alexa app on their device. As an example, in the first screenshot, the user sets the phrase ‘I would like toast please’ to play aloud when they say the prompt ‘toast’.
The user then trains Voiceitt to recognise their chosen prompt by repeating it around 10 to 15 times. Once the app has enough similar recordings, the phrase is added to their personal speech model, unlocked in the dictionary and made available for use.
The second screenshot shows the Speak tab, where the user can trigger Voiceitt to start listening by pushing a button on the app or by saying a wake word. They then say their chosen prompt and, when recognised, this generates the output they specified.
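The train-then-unlock flow described above can be sketched in a few lines of Python. This is purely illustrative: the class and method names are invented for this sketch, not Voiceitt's actual API, and the unlock threshold of 10 repetitions is taken from the figures quoted in this talk.

```python
# Illustrative sketch of a discrete-recognition dictionary, as described in the talk.
# All names here are hypothetical; this is NOT Voiceitt's real implementation.

REQUIRED_RECORDINGS = 10  # roughly 10-15 repetitions are needed before a phrase unlocks


class PhraseDictionary:
    def __init__(self):
        # prompt -> {"output": str, "recordings": int, "unlocked": bool}
        self.entries = {}

    def add_phrase(self, prompt, output):
        """User adds a new prompt and its corresponding output phrase."""
        self.entries[prompt] = {"output": output, "recordings": 0, "unlocked": False}

    def record_repetition(self, prompt):
        """Each repetition of the prompt moves the phrase towards unlocking."""
        entry = self.entries[prompt]
        entry["recordings"] += 1
        if entry["recordings"] >= REQUIRED_RECORDINGS:
            entry["unlocked"] = True

    def speak(self, recognised_prompt):
        """On the Speak tab, only unlocked phrases produce an output."""
        entry = self.entries.get(recognised_prompt)
        if entry and entry["unlocked"]:
            return entry["output"]
        return None  # prompt unknown or not yet trained


d = PhraseDictionary()
d.add_phrase("toast", "I would like toast please")
for _ in range(10):          # user repeats the prompt to train it
    d.record_repetition("toast")
print(d.speak("toast"))      # prints "I would like toast please"
```

The key point the sketch captures is that recognition is a lookup over a closed, pre-trained set: an untrained or unknown prompt produces nothing, which is exactly the limitation that the continuous-recognition work (Project Ensemble, described later) aims to remove.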
This clip shows one of our participants training a phrase in Voiceitt using the prompt ‘panini’, and then testing it out.
Then in this clip you can see him using another phrase he trained to help with his work experience, collecting orders for the cafeteria. He uses the prompt ‘Thursday, café’ to ask what someone wants for lunch that day.
It's useful to understand why we're focusing on speech recognition technology and non-standard voices in 2022.
In general, our society is becoming increasingly digitised. While this represents many exciting opportunities to break down barriers, there are also very real concerns about new technologies creating barriers. This digitisation in general, and advances in speech recognition technologies specifically, are being driven by the power of machine learning algorithms and massive data sets. As Google, Microsoft and Apple collect voice data from billions of users around the world, this data is fed into algorithms which are then optimised for those voices. This has led to huge leaps in the quality of speech recognition technology and the mainstreaming of its usage in mobiles, smart home technologies, and standard software packages.
Unfortunately, these algorithms work better for some voices than for others. They cannot be optimised for non-standard or dysarthric voices because they aren't built using data sets of these minority voices. This is exactly the gap that Voiceitt and Project Ensemble are trying to fill.
The good news is there is growing political and commercial interest in assistive technologies and technology that removes barriers for disabled people in general. For example, many of you may be familiar with the recent public sector web accessibility regulations which are all about guaranteeing access to public services, including education, for all. Last year the government published its National Disability Strategy and it was fantastic to see how much assistive technologies and digital inclusion featured throughout. It shows that the government recognises how important digital inclusion is to all aspects of our lives. The strategy includes a commitment to explore the case for building a world-leading national centre for assistive and accessibility technology.
Legislation is an important mechanism for equality, but there's also growing commercial interest, in part because there are increasing rates of disability around the world. There are all sorts of reasons why disability rates are increasing. For example, with neurodivergent conditions such as autism, dyslexia, ADHD, and dyspraxia, growing awareness of these conditions around the world and improving how we diagnose women and girls and people of colour has led to a boom in diagnosis. Medical advances mean more disabled children are living longer and attending school and entering the workforce.
Finally, the ageing population of the world and particularly in the UK means that people are living longer and acquiring more impairments. As we age we are all likely to start to struggle with things like our dexterity, vision, hearing, and mobility. The ageing population means that more people will be interested in buying products to remove barriers to these issues and it also means that people are staying in employment longer. As we push back retirement ages, we need to ensure that people are able to access work, and digital tools are a key enabler here.
It's not just about money and legislation, however. We've increasingly seen organisations and technology developers understanding the value of inclusive practices, wherein ensuring accessibility for a specific group of people actually benefits all of us. For example, we've seen how captions are hugely beneficial for all sorts of people, such as those working with loud children, those with auditory processing disorders, and those with English as an additional language, not just for D/deaf people. Inclusive practices mean we don't have to make people come forward to request "special treatment"; we can just make the world accessible for everyone.
We’ve recruited over 60 participants from all over the UK and Ireland. Most have joined us through organisations such as specialist colleges, residential care and day services, so we’ve worked with them through their local support teams, and have also worked directly with a few individual participants.
Due to Covid, we adapted our original plans to offer face-to-face support for setup and initial training on how to use the app and were able to provide this remotely, as well as ongoing technical support for troubleshooting and setting up new smart home commands.
Collecting feedback from participants about their views and experiences of using the app has been a key focus of our work. We wanted to find out about the different ways people chose to use Voiceitt, how they found the user interface and functionality, what worked well and any problems or ideas for what could be improved. We shared this feedback with Voiceitt to help them develop and improve the app to better meet users’ needs.
One method of collecting feedback was through feedback interviews after 3 months or more, which was a useful opportunity to ask everyone the same questions at roughly the same point in their experience.
We also kept in touch with participants throughout their testing to ask how things were going and to collect much more informal feedback on any new ways of using the app, and any problems or successes. This worked well as a way of hearing about problems or shortcomings, allowing people to raise issues while they were still fresh in their mind, and allowing us to share ‘real-time’ feedback with developers. Through this method, we were sometimes able to get back to people with a response from developers within days or weeks, allowing them to see the impact of their comments.
We wanted to give some examples of how user feedback has influenced development of the app.
When the first participants joined, it typically took around 30 to 40 repetitions to get a new phrase unlocked, and lots of people in the early days told us that the process of training the app to recognise new phrases was difficult, time-consuming and boring. So Voiceitt found a way to reduce the number of repetitions needed to around 10 to 15, and added animations, XP and achievements to try to make it more fun to use. Lots of users really liked these features, but not everyone, so a few months later these were made optional.
Lots of people told us they would like to eliminate the need to train each phrase separately as this is very restrictive in terms of functionality, especially for use in conversation, as you need to have pre-planned everything you want to say in advance. The next phase in Voiceitt’s development is to work towards continuous recognition of spontaneous atypical speech, without the need to train every phrase in advance. Others are working on the same problem including the Google Euphonia and Relate teams, so hopefully someone will solve this problem soon! The first stage of this work is to collect lots more non-standard speech data, so Voiceitt are now recruiting people to donate recordings through their Ensemble website (more details to follow).
Lots of people told us that they found it hard to navigate through the app. For some, this is due to issues with accessing the touchscreen interface. Others found it difficult to remember how to get from one part to another. In response, Voiceitt have enabled switch control through the Apple accessibility settings, added quick access to the speak tab from every screen, and recently added the option to use a wake word in Conversation mode, to allow hands-free use as in Smart Home mode.
Some other popular requests we hear are that:
People would like the option of using Voiceitt on Android devices, so an Android version is currently under development;
People would like to use Voiceitt to make voice or video calls via Alexa – another new feature they’re working on;
People find it difficult to manage the multiple accounts needed to give control of different brands of Smart Home equipment, so Voiceitt are in discussions with manufacturers to address this.
The most common issue in terms of performance is that recognition still isn’t reliable enough, especially when there’s background noise, so this is a big priority for Voiceitt. We’re supporting ongoing testing with the new voice isolation function available in iOS to explore whether this helps.
Bugs involving the use of predictive text, incorrect display in landscape mode, recognition of newly trained phrases and delayed response to Smart Home commands have all been resolved via updates.
As well as the impact participants have had on the development of the app, we also wanted to give some examples of how using Voiceitt has helped participants to communicate more easily, or be more independent through the use of Smart Home controls. Since lockdown restrictions have lifted, a few people have tried using Voiceitt out and about to order a drink in a café or buy stamps or buy a bus ticket – using their own voice rather than relying on someone else to speak for them. Music and radio controls are also popular, and some participants told us that it’s made a big difference to them to be able to choose what they want to listen to and control it themselves, rather than asking someone to do it for them.
In this video one of our participants, Karl, talks about his experience: how he found the setup process and how Voiceitt has enabled him to use his own voice to control various Smart Home devices, some ideas for how Voiceitt could be better, and how he feels his feedback has contributed so far.
Transcript:
[Interviewer, Sean] Hello Karl, how are you?
[Karl] I'm good Sean. How are you?
[Sean] I'm good, thank you.
[Sean] Would you like to introduce yourself?
[Karl] My name is Karl Cretzan. I am from Waterford. I am 25 years old. I have quadriplegic cerebral palsy. I'm a disability activist and I am interested in AT and trying to improve AT for people with disabilities.
[Sean] Brilliant, and did you feel that Voiceitt was a product or App that would work for you on a personal level?
[Karl] First I didn't know if it would work for me, but when I started to learn about it and learn about the features, I thought that the smart home features would work best for me rather than the communication features on the App, as speech recognition doesn't work great for me on normal smart home devices.
[Sean] Okay, so you have a strong interest in smart home technologies Karl- is that fair to say?
[Karl] Yeah, I'm very interested in smart home technology and AT.
[Sean] Brilliant.
[Sean] And how did you find the experience of getting set up with Voiceitt initially?
[Karl] I thought it was difficult at the start as I didn't know what to do, but then when I got the hang of it, I found it easier. But it takes time to set it up and to train in the App - quite time-consuming.
[Sean] And would you say that Voiceitt has enabled you to be more independent in any way Karl?
[Karl] I do, because as I explained speech recognition such as 'Amazon Echo', 'Google Home’ doesn't understand me 100 percent and Voiceitt has made it so much easier for me to use these features.
[Sean] That's great Karl. So you have provided really in-depth feedback to the Voiceitt team through ourselves at Karten, on the Nuvoic Project - really useful feedback regarding the features on the App, the layout and design of the App also, specifically on the gamification of the App. How did you find that process?
[Karl] Well, I felt that the App could do with more features and easier access- user feedback is the most important thing to developers so users can use the App more independently and easier. I think Voiceitt have come a long way from the start and they have a long way to go yet.
So in summary, participants have given us lots of examples where Voiceitt has helped them to communicate more easily with unfamiliar people or in new situations, or to control their home environment more easily or more independently, albeit with some limitations due to the need to pre-train everything in advance, and ongoing issues around reliability of the recognition, especially in noisy environments.
There have also been some unexpected benefits. A few people found that Alexa understands their voice without the need for Voiceitt, possibly partly due to more consistent pronunciation resulting from the process of repeating a prompt to train the app. Others found that repeating prompts made their pronunciation easier for others to understand, and so helped with communication without the need for Voiceitt. One participant found that the process of planning an interaction in advance, thinking about what he wanted to say and how the other person might respond, helped to reduce his anxiety about the conversation as well as helping him with the practicalities.
As we’ve mentioned, the biggest limitation for many people is that Voiceitt can currently only offer discrete recognition – meaning that every phrase has to be trained in advance. In the future Voiceitt hope to offer continuous recognition of spontaneous speech, with the first step to expand their training dataset of non-standard speech samples.
They’ve recently launched the Voiceitt Ensemble website, where people can donate speech recordings to try and help get this work off the ground, and we wanted to play one last video to give an idea of how this works.
Please do tell people about Ensemble if you know anyone who might like to be involved!
If you would like any more information, please get in touch via email to liz@karten-network.org.uk or geena@karten-network.org.uk;
Our project web page is at nuvoic.karten-network.org.uk;
We're on Twitter as @KartenNetwork
And Voiceitt's website is at voiceitt.com.
Thank you very much for your interest!