Although there are many ways to optimize the speech generated by Amazon Polly's text-to-speech voices, new customers may find it challenging to quickly learn how to apply the most effective enhancements in each situation.
The objective of this webinar is to show customers all of the ways in which they can modify the speech output, and to share some insider tips to help them get the most out of the Polly service.
This webinar will provide a comprehensive overview of the tools and techniques available for modifying Polly speech output, including SSML tags, lexicons, and punctuation.
Other topics will include recommendations for streamlining the process of applying these techniques, and how to provide feedback that the Polly team can use to continually improve the quality of voices for you. The session also walks through building a simple speech-enabled app with Polly's text-to-speech voices.
Learning Objectives:
· Learn about the complete set of available SSML tags, and how you can apply them in order to modify and enhance your speech output.
· Learn how you can override the default Polly pronunciation for specific words, by creating a lexicon of these words, along with the pronunciation that matches your needs.
· Learn about how you can use punctuation to modify the way text is spoken by Polly voices.
· Get insider tips on the best speech optimization techniques to apply to each of the most common speech production concerns.
· Discover ways to streamline the process of getting the most out of Polly voices through SSML tags and lexicons.
· Find out the best way to submit your feedback on Polly voices, pronunciation, and the available feature set, so that we can continue to improve this service for you!
2. What to Expect from the Session
• What is Polly?
• Example app
• Using punctuation and SSML
• Using external Lexicons
• Q&A
3. What is Polly?
• A service that converts text into lifelike speech
• 47 voices, 24 languages
• Developers can store, replay and distribute generated speech
4. The Polly console
I bought 2lbs of meat
and 16oz of potatoes
Justin (US)
Amy (UK)
Raveena (IN)
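The console comparison above can also be scripted. The following is a minimal sketch, assuming the boto3 SDK and configured AWS credentials; `build_request` and `synthesize_to_file` are hypothetical helper names, while `synthesize_speech` and its `Text`, `TextType`, `VoiceId`, and `OutputFormat` parameters come from the boto3 Polly client.

```python
def build_request(text, voice_id, text_type="text", output_format="mp3"):
    """Return keyword arguments for the Polly SynthesizeSpeech operation."""
    return {
        "Text": text,
        "TextType": text_type,  # "text" for plain text, "ssml" for SSML input
        "VoiceId": voice_id,
        "OutputFormat": output_format,
    }


def synthesize_to_file(request, path):
    """Send one request to Polly and save the audio (needs AWS credentials)."""
    import boto3  # imported lazily so build_request works without the SDK

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**request)
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


TEXT = "I bought 2lbs of meat and 16oz of potatoes"
# One request per voice shown on the slide.
voice_requests = [build_request(TEXT, voice) for voice in ("Justin", "Amy", "Raveena")]
```

Calling `synthesize_to_file(voice_requests[0], "justin.mp3")` would then write one audio file per voice, which makes it easy to compare how each voice expands "lbs" and "oz".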
7. Main Challenges for Text-to-Speech
Goal: Convert text into intelligible, accurate, and natural speech
• G2P (grapheme-to-phoneme conversion): rough, though, through.
• Homographs: same spelling, different pronunciations.
I live in Poland
This presentation is broadcast live from Poland
Context helps disambiguate 'live'. But...
I read this book.
8. Main challenges for Text-to-Speech
• Text normalization: disambiguation of abbreviations, acronyms, and units, e.g. ‘St.’ expanded as ‘street’ or ‘saint’
<speak>St. Patrick St.</speak>
• Foreign words (déjà vu), proper names (François Hollande), social media lingo (ASAP, LOL) etc.
9. SSML in Polly
Speech Synthesis Markup Language (SSML)
• W3C recommendation, XML-based markup language for speech synthesis applications. Amazon Polly tags are compliant with the SSML 1.1 specification.
• Allows customers to modify certain aspects of the TTS speech output, for example the pronunciation of words and the expansion of abbreviations and acronyms, as well as pitch, rate of speech, and volume.
10. SSML document structure
All SSML documents must start with an opening <speak> tag and end with a closing </speak> tag. All other tags are inserted between <speak> and </speak>.
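The structural rule can be checked before any text is sent to the service. This is a small sketch using only the Python standard library; `to_ssml` and `is_speak_document` are hypothetical helper names, and the check covers well-formedness and the root tag only, not whether Polly supports every tag inside.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_ssml(plain_text):
    """Wrap plain text in <speak>, escaping &, < and > so the XML stays valid."""
    return "<speak>" + escape(plain_text) + "</speak>"


def is_speak_document(document):
    """True if the document is well-formed XML whose root element is <speak>."""
    try:
        return ET.fromstring(document).tag == "speak"
    except ET.ParseError:
        return False
```

Escaping matters because characters such as `&` in ordinary text would otherwise make the SSML document invalid.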
13. The <sub> tag
In-line aliasing
In many cases we do not want to change all instances of a certain word.
<speak>
My favorite chemical element is <sub alias="aluminum">Al</sub>, but Al prefers <sub alias="magnesium">Mg</sub>.
</speak>
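Because the point of in-line aliasing is to change one occurrence rather than all of them, a helper can target just the first match. A minimal sketch, assuming the input text is already XML-safe; `alias_once` is a hypothetical name.

```python
import re
from xml.sax.saxutils import quoteattr


def alias_once(text, word, alias):
    """Wrap only the first whole-word occurrence of `word` in a <sub> tag."""
    tag = "<sub alias={}>{}</sub>".format(quoteattr(alias), word)
    pattern = r"\b" + re.escape(word) + r"\b"
    # A function replacement avoids backslash handling in the replacement text.
    return re.sub(pattern, lambda _match: tag, text, count=1)
```

Later occurrences of the same word are left untouched, so they keep the default pronunciation.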
14. The <phoneme> tag
Force pronunciation in-line
Read: present or past?
I <phoneme alphabet="x-sampa" ph='"rid'>read</phoneme> a book.
I <phoneme alphabet="x-sampa" ph='"rEd'>read</phoneme> a book.
Examples of EN phonemes
http://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html
IPA X-SAMPA Example
ɹ r red
ɛ E dress
i i fleece
d d dig
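X-SAMPA strings often contain a double quote (the primary stress mark), which makes hand-written attributes easy to get wrong. The sketch below lets the standard library pick safe attribute quoting; `phoneme` is a hypothetical helper name, and the transcriptions are the ones from the slide.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(word, ph, alphabet="x-sampa"):
    """Return a <phoneme> tag; quoteattr chooses quotes that are safe for ph."""
    return "<phoneme alphabet={} ph={}>{}</phoneme>".format(
        quoteattr(alphabet), quoteattr(ph), escape(word)
    )


present = "<speak>I " + phoneme("read", '"rid') + " a book.</speak>"
past = "<speak>I " + phoneme("read", '"rEd') + " a book.</speak>"
```

For IPA input the same helper can be called with `alphabet="ipa"`.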
16. Lexicons: <alias>
Alias (e.g. abbreviation expansion)
Follows the Pronunciation Lexicon Specification (PLS)
<lexeme><grapheme>Ne</grapheme><alias>Neon</alias></lexeme>
<lexeme><grapheme>Na</grapheme><alias>Sodium</alias></lexeme>
<lexeme><grapheme>Mg</grapheme><alias>Magnesium</alias></lexeme>
<lexeme><grapheme>Al</grapheme><alias>Aluminum</alias></lexeme>
<lexeme><grapheme>Si</grapheme><alias>Silicon</alias></lexeme>
<speak>Mg and Al are chemical elements</speak>
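The lexeme entries shown here need to sit inside a complete PLS document before they can be uploaded. The following sketch generates one from a dictionary; `build_alias_lexicon` is a hypothetical helper, the namespace is the one defined by PLS 1.0, and the `alphabet` and `xml:lang` values are assumptions to adjust for your voices. The commented boto3 calls (`put_lexicon`, and `LexiconNames` on `synthesize_speech`) need AWS credentials and are not executed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases, lang="en-US"):
    """Return a PLS document mapping each grapheme to its spoken alias."""
    lexemes = "".join(
        "  <lexeme><grapheme>{}</grapheme><alias>{}</alias></lexeme>\n".format(
            escape(grapheme), escape(alias)
        )
        for grapheme, alias in aliases.items()
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0" xmlns="{}" alphabet="ipa" xml:lang="{}">\n'
        "{}</lexicon>\n"
    ).format(PLS_NAMESPACE, lang, lexemes)
    ET.fromstring(document.encode("utf-8"))  # raises ParseError if malformed
    return document


elements = build_alias_lexicon({"Mg": "Magnesium", "Al": "Aluminum"})

# Upload and use (assumes boto3 and credentials; the lexicon name is illustrative):
#   polly = boto3.client("polly")
#   polly.put_lexicon(Name="elements", Content=elements)
#   polly.synthesize_speech(..., LexiconNames=["elements"])
```

Unlike the in-line `<sub>` tag, a lexicon applies to every occurrence of the grapheme in requests that name it.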
17. Lexicons: <phoneme>
Assign custom pronunciation (IPA or X-SAMPA alphabets)
Settling the 'gif' issue once and for all.
<lexeme><grapheme>gif</grapheme><phoneme>"dZIf</phoneme></lexeme>
<lexeme><grapheme>David</grapheme><phoneme>"dA.%vid</phoneme></lexeme>
<speak>I like this gif.</speak>
<speak>Here's my friend David.</speak>
19. The <lang> tag
Foreign words and phrases
Foreign phrases are rendered better if they are enclosed inside the <lang> tag,
as in the following example.
French in English
<speak>
J'adore chanter.
</speak>
<speak>
<lang xml:lang="fr-FR">J'adore chanter</lang>.
</speak>
20. The <lang> tag
English in Italian
The pronunciation of English is like that of a non-bilingual Italian speaker.
<speak>
Mi piace Bruce Springsteen.
</speak>
<speak>
Mi piace <lang xml:lang="en-US">Bruce Springsteen.</lang>
</speak>
21. The <lang> tag
Multiple languages
All languages supported by Amazon Polly can be invoked with the <lang> tag.
EN FR IT ES PL
<speak>Onion, oignon, cipolla, cebolla, cebula.</speak>
<speak>Onion, <lang xml:lang="fr-FR">oignon</lang>, <lang xml:lang="it-IT">cipolla</lang>, <lang xml:lang="es-ES">cebolla</lang>, <lang xml:lang="pl-PL">cebula</lang>.</speak>
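Writing the `<lang>` wrappers by hand gets tedious for mixed-language lists. A minimal sketch of generating them from (text, language code) pairs; `lang` and `multilingual_list` are hypothetical helper names, and the language codes are the ones used on the slide.

```python
from xml.sax.saxutils import escape, quoteattr


def lang(text, code):
    """Wrap text in a <lang> tag for the given language code."""
    return "<lang xml:lang={}>{}</lang>".format(quoteattr(code), escape(text))


def multilingual_list(items):
    """Build one SSML sentence from (text, language code or None) pairs."""
    parts = [lang(text, code) if code else escape(text) for text, code in items]
    return "<speak>" + ", ".join(parts) + ".</speak>"


onions = multilingual_list(
    [("Onion", None), ("oignon", "fr-FR"), ("cipolla", "it-IT"),
     ("cebolla", "es-ES"), ("cebula", "pl-PL")]
)
```

Items with no language code are spoken in the default language of the selected voice.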
23. The <say-as> tag
• The TTS engine works well for most common and unambiguous text structures, such as dates and times.
• It is possible to force an interpretation through the <say-as> tag in ambiguous cases (phone numbers, addresses, etc.).
Phone numbers (interpret-as="telephone")
<speak>(514) 888-5195
<say-as interpret-as="telephone">(514) 888-5195</say-as>
</speak>
<speak>(514) 888-5195x123 </speak>
<speak><say-as interpret-as="telephone">(514) 888-5195x123</say-as></speak>
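A thin wrapper can guard against typos in the `interpret-as` value. This sketch only whitelists the three values that appear in these slides (an assumption made for illustration; the service documents more), and `say_as` is a hypothetical helper name.

```python
from xml.sax.saxutils import escape, quoteattr

# Only the values used in this deck; extend from the SSML documentation as needed.
SLIDE_INTERPRETATIONS = {"telephone", "expletive", "spell-out"}


def say_as(text, interpret_as):
    """Return a <say-as> tag, rejecting interpret-as values not listed above."""
    if interpret_as not in SLIDE_INTERPRETATIONS:
        raise ValueError("unknown interpret-as value: " + interpret_as)
    return "<say-as interpret-as={}>{}</say-as>".format(
        quoteattr(interpret_as), escape(text)
    )


phone = "<speak>" + say_as("(514) 888-5195x123", "telephone") + "</speak>"
```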
24. The <say-as> tag
Phone numbers (US vs. UK): different pronunciation styles.
US
Richard's number is <prosody rate='slow'> <say-as interpret-as='telephone'>(212) 224-1555</say-as> </prosody>
UK
Richard's number is <prosody rate='slow'> <say-as interpret-as='telephone'>(212) 224-1555</say-as></prosody>
25. <say-as interpret-as="expletive">
Bleeping undesirable content
<speak>
Your next song is "Killing in the name of" by Rage Against
the Machine.
</speak>
<speak>
Your next song is "<say-as interpret-as="expletive">Killing</say-as> in the name of" by Rage Against the Machine.
</speak>
26. <say-as interpret-as="spell-out">
Read character by character
<speak>And here is how you spell handkerchief: <prosody rate="x-slow"><say-as interpret-as="spell-out">handkerchief</say-as></prosody>.</speak>
28. The power of commas / periods
Adding punctuation helps produce better prosody
<speak>He went to Harvard and when he decided to drop out it was
not to find enlightenment with an Indian guru but to start a
computer software company.</speak>
<speak>He went to Harvard, and when he decided to drop out, it
was not to find enlightenment with an Indian guru, but to start a
computer software company.</speak>
29. The <prosody> tag
The <prosody> tag allows some changes to how speech is delivered, through the following supported attributes:
• volume
• rate
• pitch
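The three attributes can be combined on one tag, and `<prosody>` tags can be nested. A minimal sketch of a builder that supports both; `prosody` is a hypothetical helper name, and the inner content is inserted as is so that nested calls are not escaped twice.

```python
from xml.sax.saxutils import quoteattr


def prosody(inner_ssml, volume=None, rate=None, pitch=None):
    """Wrap inner_ssml (inserted unchanged) in a <prosody> tag."""
    attributes = [("volume", volume), ("rate", rate), ("pitch", pitch)]
    rendered = "".join(
        " {}={}".format(name, quoteattr(value))
        for name, value in attributes
        if value is not None
    )
    return "<prosody{}>{}</prosody>".format(rendered, inner_ssml)


loud = prosody("or I can speak louder", volume="x-loud")
nested = prosody("Sold " + prosody("for 600.", rate="+60%"), volume="x-loud")
```

Attributes that are left as `None` are omitted, so the voice keeps its default for them.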
30. The volume attribute
Modify the volume of speech
<speak>
I can speak normally, <prosody volume="x-loud"> or I can speak
louder</prosody>.
</speak>
<speak>
I can speak normally, <prosody volume="x-soft"> or I can speak
quieter</prosody>.
</speak>
31. The rate attribute
Change the speed of speech
<speak>
When I wake up, <prosody rate="x-slow">I speak quite
slowly</prosody>.
</speak>
<speak>
When I am in a hurry, <prosody rate="x-fast">I speak very
fast</prosody>.
</speak>
32. The pitch attribute
Modify the pitch of a word/phrase
<speak>
When I get angry, <prosody pitch="x-high">my pitch goes way
up</prosody>
</speak>
<speak>
When I get sad, <prosody pitch="x-low">my pitch goes way
down</prosody>
</speak>
33. The pitch attribute
Modify the pitch of a word/phrase
<speak>
I can go normal, <prosody pitch="high">high</prosody>, <prosody pitch="x-high">higher</prosody>, <prosody pitch="low">low</prosody>, and <prosody pitch="x-low">lower</prosody>.
</speak>
34. Use pitch to improve intonation
Adding punctuation and modifying pitch helps produce better prosody
Do you like this or that?
Do you like <prosody pitch="+5%"> this </prosody>, or <prosody
pitch="-2%">that?</prosody>
35. Punctuation and the <break> tag
Add a pause anywhere (time, strength attributes)
And the winner is <break time='5s'/> Bob Dylan!
And the winner is <break strength="x-strong" /> Bob Dylan!
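The two forms of `<break>` can be produced by one helper. In this sketch `break_tag` is a hypothetical name, the strength names are the ones defined by the W3C SSML specification, and requiring exactly one of the two attributes is a simplification chosen here rather than a rule of the language.

```python
STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")


def break_tag(time=None, strength=None):
    """Return a <break> tag with either a time (e.g. '5s') or a strength."""
    if (time is None) == (strength is None):
        raise ValueError("give exactly one of time or strength")
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError("unknown strength: " + strength)
        return '<break strength="{}"/>'.format(strength)
    return '<break time="{}"/>'.format(time)


announcement = "<speak>And the winner is " + break_tag(time="5s") + " Bob Dylan!</speak>"
```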
37. Fun with SSML
'Can you make your voices sound like an auctioneer?'
<speak><prosody rate='+60%'>I’m at 500 and I want
550<prosody volume='x-loud'>550</prosody></prosody>
<prosody rate='+60%'>bid on 550 I’m at 500 would you go
550 550 for the gentleman in the corner</prosody> <prosody
rate="+90%">A big black bug bit a big black bear a big
black bug bit a big black bear</prosody> Do we get 600?
<prosody rate='+90%'>A big black bug bit a big black
bear</prosody><prosody rate='+60%'>We got 600 for the
whole herd</prosody><prosody rate='default' volume='x-loud'>Sold
<prosody rate='+60%'>for 600.</prosody></prosody></speak>
38. Fun with SSML
'It's good, but can you make her sound like she's from
Boston???'
If your car’s blinkers are broken, it may be the blinker
relay. Fortunately, this car fix is easy to do.
<speak>If <phoneme ph='"jO: "kAz "blIN.k@z'>your car's
blinkers</phoneme> <phoneme ph='%A'>are</phoneme> broken,
it may be the <phoneme ph='"blIN.k@'>blinker</phoneme>
relay. <phoneme ph='"fO.tS@n.@t.li'>Fortunately</phoneme>,
this <phoneme ph='"kA'>car</phoneme> fix is easy to do.
</speak>
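Tagging every affected word by hand, as in the example above, does not scale to longer texts. One way to streamline it is to keep the pronunciations in a dictionary and apply them in a single pass. This is a sketch under assumptions: `apply_pronunciations` is a hypothetical helper, matching is case-sensitive and whole-word only, the input is assumed to be XML-safe, and the helper adds an explicit `alphabet="x-sampa"` attribute that the slide's markup omits.

```python
import re
from xml.sax.saxutils import quoteattr


def apply_pronunciations(text, pronunciations):
    """Wrap each whole-word match of a dictionary key in a <phoneme> tag."""
    if not pronunciations:
        return text
    # Longest keys first, so multi-word keys are tried before their parts.
    keys = sorted(pronunciations, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b"

    def wrap(match):
        word = match.group(1)
        return '<phoneme alphabet="x-sampa" ph={}>{}</phoneme>'.format(
            quoteattr(pronunciations[word]), word
        )

    return re.sub(pattern, wrap, text)


boston = {"car": '"kA', "blinker": '"blIN.k@'}
sentence = apply_pronunciations("this car fix is easy to do.", boston)
```

For pronunciations that should apply to every request, a lexicon is usually the better fit; this in-line approach suits one-off stylistic effects like the accent shown here.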
39. • Contact us with any questions about this webinar or Polly in general
polly-webinars-feedback@amazon.com
• SSML documentation
http://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html
• Introducing Amazon Polly at re:Invent 2016
https://www.youtube.com/watch?v=zjMqimHis3U&t=2s
• PLS 1.0 Specifications
https://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/
Next Amazon Polly webinar (Apr 10th): "How to integrate Amazon Polly
voices seamlessly into your application workflow"