© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Marco Nicolis, Remus Mois
Amazon Text-to-Speech
03/27/2017
How to get the most out of Polly
Leveraging lexicons and SSML
What to Expect from the Session
• What is Polly?
• Example app
• Using punctuation and SSML
• Using external Lexicons
• Q&A
What is Polly?
• A service that converts text into lifelike speech
• 47 voices, 24 languages
• Developers can store, replay, and distribute generated speech
The Polly console
I bought 2lbs of meat
and 16oz of potatoes
Justin (US)
Amy (UK)
Raveena (IN)
Text-to-Speech Pipeline
Text → Text normalization → Grapheme-to-phoneme conversion → Waveform generation → Speech
Text: She has $20 in her pocket.
After text normalization: she has twenty dollars in her pocket
After grapheme-to-phoneme conversion: ˈ ʃ i ˈ h æ z ˈ t w ɛ n . t i ˈ d ɑ . ɫ ə ɹ z ˈ ɪ n ˈ h ɝ ɹ ˈ p ɑ . k ə t
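The text normalization stage can be illustrated with a toy example. The sketch below is not how Polly works internally; it is a minimal illustration that assumes only whole-dollar amounts from 0 to 99, and the function names `number_to_words` and `normalize` are made up for this example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..99 (toy coverage only)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text):
    """Expand '$<number>' into words, drop punctuation, and lowercase."""
    def expand(match):
        amount = int(match.group(1))
        unit = "dollar" if amount == 1 else "dollars"
        return f"{number_to_words(amount)} {unit}"

    text = re.sub(r"\$(\d+)", expand, text)
    text = re.sub(r"[^\w\s-]", "", text)  # strip punctuation, keep hyphens
    return text.lower().strip()


print(normalize("She has $20 in her pocket."))
```

A production normalizer also has to handle dates, units, abbreviations, and ambiguity, which is where the challenges on the next slides come from.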
Main Challenges for Text-to-Speech
Goal: convert text into intelligible, accurate, and natural speech
• G2P: 'rough', 'though', and 'through' share the spelling 'ough' but not its pronunciation.
• Homographs: same spelling, different pronunciations.
I live in Poland.
This presentation is broadcast live from Poland.
Context helps disambiguate 'live'. But...
I read this book. (Present or past? The sentence alone does not say.)
Main challenges for Text-to-Speech
• Text normalization: disambiguating abbreviations, acronyms, and units. For example, 'St.' can expand to 'street' or 'saint':
<speak>St. Patrick St.</speak>
• Foreign words (déjà vu), proper names (François Hollande), social media lingo (ASAP, LOL), etc.
SSML in Polly
Speech Synthesis Markup Language (SSML)
• A W3C recommendation: an XML-based markup language for speech synthesis applications. Amazon Polly tags comply with the SSML 1.1 specification.
• Lets customers modify aspects of the TTS output, such as the pronunciation of words, the expansion of abbreviations and acronyms, and the pitch, rate, and volume of speech.
SSML document structure
All SSML documents must start with an opening <speak> tag and end with a closing </speak> tag. All other tags go between <speak> and </speak>.
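Because SSML is XML, a malformed document can be caught locally before it is sent to the service. The following is a minimal sketch using Python's standard library. The helper name `wrap_ssml` is hypothetical, and the check covers XML well-formedness only, not whether Polly supports a given tag.

```python
import xml.etree.ElementTree as ET


def wrap_ssml(body):
    """Wrap an SSML fragment in <speak> and verify it is well-formed XML."""
    document = f"<speak>{body}</speak>"
    ET.fromstring(document)  # raises ET.ParseError on malformed markup
    return document


print(wrap_ssml("Hello from <break time='1s'/> Polly."))
```

An unclosed tag such as `wrap_ssml("<sub>oops")` raises `ET.ParseError` instead of returning a document.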
Example app
Changing pronunciations in Polly
<sub>
<phoneme>
The <sub> tag
In-line aliasing
In many cases we do not want to change every instance of a word; <sub> changes only the instance it wraps.
<speak>
My favorite chemical element is <sub alias="aluminum">Al</sub>, but Al prefers <sub alias="magnesium">Mg</sub>.
</speak>
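When SSML is assembled in application code, a small helper keeps the attribute quoting and text escaping consistent. This is a sketch only; the function name `sub` is an assumption chosen to mirror the tag.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(text, alias):
    """Return a <sub> element that speaks `alias` in place of `text`."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


ssml = (
    "<speak>My favorite chemical element is "
    + sub("Al", "aluminum")
    + ", but Al prefers "
    + sub("Mg", "magnesium")
    + ".</speak>"
)
print(ssml)
```

Only the two wrapped abbreviations are aliased; the bare "Al" in the middle of the sentence is left for the engine to read as written.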
The <phoneme> tag
Force pronunciation in-line
Read: present or past?
Present: I <phoneme alphabet="x-sampa" ph='"rid'>read</phoneme> a book.
Past: I <phoneme alphabet="x-sampa" ph='"rEd'>read</phoneme> a book.
Examples of EN phonemes
http://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html
IPA   X-SAMPA   Example
ɹ     r         red
ɛ     E         dress
i     i         fleece
d     d         dig
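X-SAMPA marks primary stress with a double-quote character, which makes hand-written `ph` attributes easy to get wrong. The sketch below (the helper name `phoneme` is an assumption) lets the standard library choose a safe quoting style for the attribute.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def phoneme(text, ph, alphabet="x-sampa"):
    """Return a <phoneme> element; quoteattr copes with the '"' stress mark."""
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


PRESENT = '"rid'  # X-SAMPA for present-tense "read"
PAST = '"rEd'     # X-SAMPA for past-tense "read"

present = "<speak>I " + phoneme("read", PRESENT) + " a book.</speak>"
past = "<speak>I " + phoneme("read", PAST) + " a book.</speak>"

# Round-trip through an XML parser to confirm the attribute survived intact.
print(ET.fromstring(past).find("phoneme").get("ph"))
```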
Using Lexicons in Polly
Lexicons: <alias>
Alias (e.g. abbreviation expansion)
Follows the Pronunciation Lexicon Specification (PLS)
<lexeme><grapheme>Ne</grapheme><alias>Neon</alias></lexeme>
<lexeme><grapheme>Na</grapheme><alias>Sodium</alias></lexeme>
<lexeme><grapheme>Mg</grapheme><alias>Magnesium</alias></lexeme>
<lexeme><grapheme>Al</grapheme><alias>Aluminum</alias></lexeme>
<lexeme><grapheme>Si</grapheme><alias>Silicon</alias></lexeme>
<speak>Mg and Al are chemical elements</speak>
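The lexemes above are fragments; a PLS file also needs a root `<lexicon>` element carrying the PLS namespace, a version, an alphabet, and a language. The sketch below generates such a document from a dictionary. The function name `build_alias_lexicon` and the default `en-US` language are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases, lang="en-US"):
    """Render grapheme -> alias pairs as a complete PLS 1.0 lexicon document."""
    lexemes = "\n".join(
        f"  <lexeme><grapheme>{escape(grapheme)}</grapheme>"
        f"<alias>{escape(alias)}</alias></lexeme>"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}"\n'
        f'         alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}\n"
        "</lexicon>"
    )


document = build_alias_lexicon({"Mg": "Magnesium", "Al": "Aluminum"})
# Parse the result to confirm it is well-formed before uploading it anywhere.
root = ET.fromstring(document.encode("utf-8"))
print(len(root.findall(f"{{{PLS_NS}}}lexeme")))
```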
Lexicons: <phoneme>
Assign custom pronunciation (IPA or X-SAMPA alphabets)
Settling the 'gif' issue once and for all.
<lexeme><grapheme>gif</grapheme><phoneme>"dZIf</phoneme></lexeme>
<lexeme><grapheme>David</grapheme><phoneme>"dA.%vid</phoneme></lexeme>
<speak>I like this gif.</speak>
<speak>Here's my friend David.</speak>
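A lexicon only takes effect once it has been uploaded and named in the synthesis request. The sketch below shows one way this could look with the AWS SDK for Python. It assumes `client` behaves like `boto3.client("polly")`; the function name, the `Joanna` voice, and the MP3 output format are choices made for this example, not something the deck prescribes.

```python
def synthesize_with_lexicon(client, name, lexicon_xml, ssml, voice_id="Joanna"):
    """Upload a lexicon, then synthesize SSML with that lexicon applied.

    `client` is expected to behave like boto3.client("polly").
    Returns the synthesized audio as bytes.
    """
    # Store (or overwrite) the lexicon under the given name.
    client.put_lexicon(Name=name, Content=lexicon_xml)
    # Ask for SSML input and apply the lexicon by name.
    response = client.synthesize_speech(
        Text=ssml,
        TextType="ssml",
        OutputFormat="mp3",
        VoiceId=voice_id,
        LexiconNames=[name],
    )
    return response["AudioStream"].read()


# Example use (requires AWS credentials, so it is left commented out):
#   import boto3
#   audio = synthesize_with_lexicon(
#       boto3.client("polly"), "gifLexicon", pls_document,
#       "<speak>I like this gif.</speak>")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub in tests, without touching the network.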
Handling foreign languages
The <lang> tag
Foreign words and phrases
Foreign phrases are rendered better if they are enclosed inside the <lang> tag,
as in the following example.
French in English
<speak>
J'adore chanter.
</speak>
<speak>
<lang xml:lang="fr-FR">J'adore chanter</lang>.
</speak>
The <lang> tag
English in Italian
The pronunciation of English is like that of a non-bilingual Italian speaker.
<speak>
Mi piace Bruce Springsteen.
</speak>
<speak>
Mi piace <lang xml:lang="en-US">Bruce Springsteen.</lang>
</speak>
The <lang> tag
Multiple languages
All languages supported by Amazon Polly can be invoked with the <lang> tag.
EN FR IT ES PL
<speak>Onion, onion, cipolla, cebolla, cebula.</speak>
<speak>Onion, <lang xml:lang="fr-FR">onion</lang>, <lang xml:lang="it-IT">cipolla</lang>, <lang xml:lang="es-ES">cebolla</lang>, <lang xml:lang="pl-PL">cebula</lang>.</speak>
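For text that mixes several languages, the tagging can be generated from (word, language) pairs rather than typed by hand. This is a sketch under the assumption that the voice's own language is `en-US`; the helper names `lang` and `multilingual` are made up for this example.

```python
from xml.sax.saxutils import escape, quoteattr


def lang(text, code):
    """Wrap text in a <lang> element for the given language code."""
    return f"<lang xml:lang={quoteattr(code)}>{escape(text)}</lang>"


def multilingual(words, default_lang="en-US"):
    """Join (word, language) pairs, tagging only the non-default languages."""
    parts = [
        escape(word) if code == default_lang else lang(word, code)
        for word, code in words
    ]
    return "<speak>" + ", ".join(parts) + ".</speak>"


print(multilingual([
    ("Onion", "en-US"),
    ("onion", "fr-FR"),
    ("cipolla", "it-IT"),
    ("cebolla", "es-ES"),
    ("cebula", "pl-PL"),
]))
```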
Define a specific interpretation
<say-as interpret-as="">
The <say-as> tag
• The TTS engine works well for most common and unambiguous text structures, such as dates and times.
• In ambiguous cases (phone numbers, addresses, etc.), the <say-as> tag can force an interpretation.
Phone numbers (interpret-as="telephone")
<speak>(514) 888-5195
<say-as interpret-as="telephone">(514) 888-5195</say-as>
</speak>
<speak>(514) 888-5195x123 </speak>
<speak><say-as interpret-as="telephone">(514) 888-5195x123</say-as></speak>
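A thin wrapper makes it harder to mistype the hyphenated tag and attribute names. The sketch below is illustrative only; the helper name `say_as` is an assumption, and it does not check whether a given `interpret-as` value is supported.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text, interpret_as):
    """Return a <say-as> element with the requested interpretation."""
    return (
        f"<say-as interpret-as={quoteattr(interpret_as)}>"
        f"{escape(text)}</say-as>"
    )


number = "(514) 888-5195x123"
print("<speak>" + say_as(number, "telephone") + "</speak>")
```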
The <say-as> tag
Phone numbers (US vs. UK): different pronunciation styles.
US
Richard's number is <prosody rate='slow'> <say-as interpret-as='telephone'>(212) 224-1555</say-as> </prosody>
UK
Richard's number is <prosody rate='slow'> <say-as interpret-as='telephone'>(212) 224-1555</say-as></prosody>
<say-as interpret-as="expletive">
Bleeping undesirable content
<speak>
Your next song is "Killing in the name of" by Rage Against
the Machine.
</speak>
<speak>
Your next song is "<say-as interpret-as="expletive">Killing</say-as> in the name of" by Rage Against the Machine.
</speak>
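Bleeping can be automated by wrapping every word from a blocklist in the expletive tag. The following is a minimal sketch; the function name `bleep` and the idea of a caller-supplied blocklist are assumptions for illustration. Matching is case-insensitive and limited to whole words.

```python
import re
from xml.sax.saxutils import escape


def bleep(text, blocklist):
    """Wrap each blocklisted word in <say-as interpret-as="expletive">."""
    if not blocklist:
        return escape(text)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in blocklist) + r")\b",
        re.IGNORECASE,
    )
    # With one capturing group, split() alternates plain text and matches.
    pieces = pattern.split(text)
    out = []
    for index, piece in enumerate(pieces):
        if index % 2:  # odd indices hold the matched blocklisted words
            out.append(
                f'<say-as interpret-as="expletive">{escape(piece)}</say-as>'
            )
        else:
            out.append(escape(piece))
    return "".join(out)


print("<speak>" + bleep("Killing in the name of", ["killing"]) + "</speak>")
```

Splitting first and escaping each piece afterwards avoids accidentally matching inside escaped entities such as `&amp;`.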
<say-as interpret-as="spell-out">
Read character by character
<speak>And here is how you spell handkerchief: <prosody rate="x-slow"><say-as interpret-as="spell-out">handkerchief</say-as></prosody>.</speak>
Modify speech delivery
<prosody>
The power of commas / periods
Adding punctuation helps produce better prosody
<speak>He went to Harvard and when he decided to drop out it was
not to find enlightenment with an Indian guru but to start a
computer software company.</speak>
<speak>He went to Harvard, and when he decided to drop out, it
was not to find enlightenment with an Indian guru, but to start a
computer software company.</speak>
The <prosody> tag
The <prosody> tag changes how speech is delivered through the following supported attributes:
• volume
• rate
• pitch
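The three attributes can be combined on one element, and prosody elements can be nested. The sketch below (the helper name `prosody` is an assumption) emits only the attributes that are set. It deliberately does not escape its body so that already-built SSML can be nested inside it, which means plain text must be escaped by the caller.

```python
from xml.sax.saxutils import quoteattr


def prosody(body, volume=None, rate=None, pitch=None):
    """Wrap an SSML fragment in <prosody> with whichever attributes are set."""
    attrs = {"volume": volume, "rate": rate, "pitch": pitch}
    rendered = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in attrs.items()
        if value is not None
    )
    if not rendered:
        return body  # nothing to change, so no wrapper is needed
    return f"<prosody{rendered}>{body}</prosody>"


louder = prosody("or I can speak louder", volume="x-loud")
nested = prosody("Sold " + prosody("for 600.", rate="+60%"), volume="x-loud")
print("<speak>I can speak normally, " + louder + ".</speak>")
print("<speak>" + nested + "</speak>")
```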
The volume attribute
Modify the volume of speech
<speak>
I can speak normally, <prosody volume="x-loud"> or I can speak
louder</prosody>.
</speak>
<speak>
I can speak normally, <prosody volume="x-soft"> or I can speak
quieter</prosody>.
</speak>
The rate attribute
Change the speed of speech
<speak>
When I wake up, <prosody rate="x-slow">I speak quite
slowly</prosody>.
</speak>
<speak>
When I am in a hurry, <prosody rate="x-fast">I speak very
fast</prosody>.
</speak>
The pitch attribute
Modify the pitch of a word/phrase
<speak>
When I get angry, <prosody pitch="x-high">my pitch goes way
up</prosody>
</speak>
<speak>
When I get sad, <prosody pitch="x-low">my pitch goes way
down</prosody>
</speak>
The pitch attribute
Modify the pitch of a word/phrase
<speak>
I can go normal, <prosody pitch="high">high</prosody>, <prosody pitch="x-high">higher</prosody>, <prosody pitch="low">low</prosody>, and <prosody pitch="x-low">lower</prosody>.
</speak>
Use pitch to improve intonation
Adding punctuation and modifying pitch help produce better prosody
Do you like this or that?
Do you like <prosody pitch="+5%"> this </prosody>, or <prosody
pitch="-2%">that?</prosody>
Punctuation and the <break> tag
Add a pause anywhere (time, strength attributes)
And the winner is <break time='5s'/> Bob Dylan!
And the winner is <break strength="x-strong" /> Bob Dylan!
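A pause helper can validate its argument before the markup is sent anywhere. In this sketch the helper name `pause` is an assumption, and requiring exactly one of `time` or `strength` is a design choice of the example rather than a rule from the deck. The accepted strength names are those defined by the SSML specification.

```python
import re
from xml.sax.saxutils import quoteattr

STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
TIME_PATTERN = re.compile(r"^\d+(\.\d+)?(ms|s)$")  # e.g. "5s" or "250ms"


def pause(time=None, strength=None):
    """Return a <break/> element with either a duration or a named strength."""
    if (time is None) == (strength is None):
        raise ValueError("pass exactly one of time or strength")
    if time is not None:
        if not TIME_PATTERN.match(time):
            raise ValueError(f"unrecognized duration: {time!r}")
        return f"<break time={quoteattr(time)}/>"
    if strength not in STRENGTHS:
        raise ValueError(f"unrecognized strength: {strength!r}")
    return f"<break strength={quoteattr(strength)}/>"


print("<speak>And the winner is " + pause(time="5s") + " Bob Dylan!</speak>")
```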
Fun with SSML
Fun with SSML
'Can you make your voices sound like an auctioneer?'
<speak>
<prosody rate='+60%'>I’m at 500 and I want 550 <prosody volume='x-loud'>550</prosody></prosody>
<prosody rate='+60%'>bid on 550 I’m at 500 would you go 550 550 for the gentleman in the corner</prosody>
<prosody rate='+90%'>A big black bug bit a big black bear a big black bug bit a big black bear</prosody>
Do we get 600?
<prosody rate='+90%'>A big black bug bit a big black bear</prosody>
<prosody rate='+60%'>We got 600 for the whole herd</prosody>
<prosody rate='default' volume='x-loud'>Sold <prosody rate='+60%'>for 600.</prosody></prosody>
</speak>
Fun with SSML
'It's good, but can you make her sound like she's from
Boston???'
If your car’s blinkers are broken, it may be the blinker
relay. Fortunately, this car fix is easy to do.
<speak>If <phoneme alphabet="x-sampa" ph='"jO: "kAz "blIN.k@z'>your car's blinkers</phoneme> <phoneme alphabet="x-sampa" ph='%A'>are</phoneme> broken,
it may be the <phoneme alphabet="x-sampa" ph='"blIN.k@'>blinker</phoneme> relay.
<phoneme alphabet="x-sampa" ph='"fO.tS@n.@t.li'>Fortunately</phoneme>, this <phoneme alphabet="x-sampa" ph='"kA'>car</phoneme> fix is easy to do.
</speak>
• Contact us with any questions about this webinar or Polly in general
polly-webinars-feedback@amazon.com
• SSML documentation
http://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html
• Introducing Amazon Polly at re:Invent 2016
https://www.youtube.com/watch?v=zjMqimHis3U&t=2s
• PLS 1.0 Specification
https://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/
Next Amazon Polly webinar (Apr 10th): "How to integrate Amazon Polly voices seamlessly into your application workflow"

How to get the most out of Polly, Leveraging Lexicons and SSML - March 2017 AWS Online Tech Talks

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Marco Nicolis, Remus Mois Amazon Text-to-Speech 03/27/2017 How to get the most out of Polly Leveraging lexicons and SSML
  • 2. What to Expect from the Session • What is Polly? • Example app • Using punctuation and SSML • Using external Lexicons • Q&A
  • 3. • A service that converts text into lifelike speech • 47 voices, 24 languages • Developers can store, replay and distribute generated speech What is Polly?
  • 4. The Polly console I bought 2lbs of meat and 16oz of potatoes Justin (US) Amy (UK) Raveena (IN)
  • 6. Text-to-Speech Pipeline Text Text normalization Grapheme-to-phoneme conversion Waveform generation Speech She has $20 in her pocket. she has twenty dollars in her pocket ˈ ʃ i ˈ h æ z ˈ t w ɛ n . t i ˈ d ɑ . ɫ ə ɹ z ˈ ɪ n ˈ h ɝ ɹ ˈ p ɑ . k ə t
  • 7. Goal: Convert text into intelligible, accurate, and natural speech • G2P: rough, though, through. • Homographs: same spelling, different pronunciations. I live in Poland This presentation is broadcasted live from Poland Context helps 'live' disambiguation. But... I read this book. Main Challenges for Text-to-Speech
  • 8. • Text normalization: disambiguation of abbreviations, acronyms, units ‘St.’ expanded as ‘street’ or ‘saint’ <speak>St. Patrick St.</speak> • Foreign words (déjà vu), proper names (François Hollande), social media lingo (ASAP, LOL) etc. Main challenges for Text-to-Speech
  • 9. Speech Synthesis Markup Language (SSML) • W3C recommendation, XML-based markup language for speech synthesis applications. AWS Polly tags are compliant with SSML 1.1 specifications. • Allows customers to modify certain aspects of the TTS speech output, for example pronunciation of words, expansion of abbreviation, acronyms, etc., as well as pitch, rate of speech, volume, etc. SSML in Polly
  • 10. All SSML documents must start with an opening <speak> tag and end with a closing </speak> tag. All other tags are inserted between <speak></speak> SSML document structure
  • 12. Changing pronunciations in Polly <sub> <phoneme>
  • 13. The <sub> tag In-line aliasing In many cases we do not want to change all instances of a certain word. <speak> My favorite chemical element is <sub alias="aluminum">Al</sub>,but Al prefers <sub alias="magnesium">Mg</sub>. </speak>
  • 14. The <phoneme> tag Force pronunciation in-line Read: present or past? I <phoneme alphabet = "x-sampa" ph='"rid'>read</phoneme> a book. I <phoneme alphabet = "x-sampa" ph='"rEd'>read</phoneme> a book. Examples of EN phonemes http://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html IPA X-SAMPA Example ɹ r red ɛ E dress i i fleece d d dig
  • 16. Alias (e.g. abbreviation expansion) Follows the Pronunciation Lexicon Specifications (PLS) <lexeme><grapheme>Ne</grapheme><alias>Neon</alias></lexeme> <lexeme><grapheme>Na</grapheme><alias>Sodium</alias></lexeme> <lexeme><grapheme>Mg</grapheme><alias>Magnesium</alias></lexeme> <lexeme><grapheme>Al</grapheme><alias>Aluminum</alias></lexeme> <lexeme><grapheme>Si</grapheme><alias>Silicon</alias></lexeme> <speak>Mg and Al are chemical elements</speak> Lexicons: <alias>
  • 17. Assign custom pronunciation (IPA or X-Sampa alphabets) Settling the 'gif' issue once and for all. <lexeme><grapheme>gif</grapheme><phoneme>"dZIf</phoneme></lexeme> <lexeme><grapheme>David</grapheme><phoneme>"dA.%vid</phoneme> </lexeme> <speak>I like this gif.</speak> <speak>Here's my friend David.</speak> Lexicons: <phoneme>
  • 19. The <lang> tag Foreign words and phrases Foreign phrases are rendered better if they are enclosed inside the <lang> tag, as in the following example. French in English <speak> J'adore chanter. </speak> <speak> <lang xml:lang="fr-FR">J'adore chanter</lang>. </speak>
  • 20. The <lang> tag English in Italian The pronunciation of English is like that of a non-bilingual Italian speaker. <speak> Mi piace Bruce Springsteen. </speak> <speak> Mi piace <lang xml:lang="en-US">Bruce Springsteen.</lang> </speak>
  • 21. The <lang> tag Multiple languages All languages supported by AWS Polly can be invoked by the lang tag. EN FR IT ES PL <speak>Onion, onion, cipolla, cebolla, cebula.</speak> <speak>Onion, <lang xml:lang="fr-FR">onion</lang>, <lang xml:lang="it-IT">cipolla</lang>, <lang xml:lang="es-ES">cebolla</lang>, <lang xml:lang="pl-PL">cebula</lang>.</speak>
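A multilingual list like the one above can be assembled programmatically. The Python sketch below is illustrative: lang and multilingual are hypothetical helper names, and a locale of None means "leave the word in the voice's own language".

```python
from xml.sax.saxutils import escape, quoteattr


def lang(text: str, locale: str) -> str:
    """Wrap text in a <lang> tag for the given locale, e.g. 'fr-FR'."""
    return f"<lang xml:lang={quoteattr(locale)}>{escape(text)}</lang>"


def multilingual(pairs) -> str:
    """Join (word, locale) pairs into one <speak> document.

    A locale of None leaves the word untagged.
    """
    parts = [escape(w) if loc is None else lang(w, loc) for w, loc in pairs]
    return "<speak>" + ", ".join(parts) + ".</speak>"


print(multilingual([
    ("Onion", None),
    ("cipolla", "it-IT"),
    ("cebolla", "es-ES"),
    ("cebula", "pl-PL"),
]))
```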
  • 22. Define a specific interpretation <say-as interpret-as="">
  • 23. The <say-as> tag • The TTS engine works well for most common and unambiguous text structures, such as dates, time, etc. • Possible to force interpretation through the <say-as> tag in ambiguous cases (phone numbers, addresses, etc.). Phone numbers (interpret-as="telephone") <speak>(514) 888-5195 <say-as interpret-as="telephone">(514) 888-5195</say-as> </speak> <speak>(514) 888-5195x123 </speak> <speak><say-as interpret-as="telephone">(514) 888-5195x123</say-as></speak>
  • 24. The <say-as> tag Phone numbers (US vs. UK): different pronunciation styles. US Richard's number is <prosody rate='slow'> <say-as interpret-as='telephone'>(212) 224-1555</say-as> </prosody> UK Richard's number is <prosody rate='slow'> <say-as interpret-as='telephone'>(212) 224-1555</say-as></prosody>
  • 25. <say-as interpret-as="expletive"> Bleeping undesirable content <speak> Your next song is "Killing in the name of" by Rage Against the Machine. </speak> <speak> Your next song is "<say-as interpret-as="expletive">Killing</say-as> in the name of" by Rage Against the Machine. </speak>
  • 26. <say-as interpret-as="spell-out"> Read character by character <speak>And here is how you spell handkerchief: <prosody rate="x-slow"><say-as interpret-as="spell-out">handkerchief</say-as></prosody>.</speak>
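The <say-as> examples on the last few slides share one shape, which a short Python helper can capture. This is a sketch with illustrative names (say_as, spell_slowly). It passes the interpret-as value through unchanged; the three values shown on the slides are telephone, expletive, and spell-out.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str) -> str:
    """Wrap text in <say-as> with the given interpret-as value."""
    return (
        f"<say-as interpret-as={quoteattr(interpret_as)}>"
        f"{escape(text)}</say-as>"
    )


def spell_slowly(word: str) -> str:
    """Spell a word character by character at the slowest preset rate."""
    inner = say_as(word, "spell-out")
    return f'<speak><prosody rate="x-slow">{inner}</prosody></speak>'


print(say_as("(514) 888-5195", "telephone"))
print(spell_slowly("handkerchief"))
```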
  • 28. The power of commas / periods Adding punctuation helps get better prosody <speak>He went to Harvard and when he decided to drop out it was not to find enlightenment with an Indian guru but to start a computer software company.</speak> <speak>He went to Harvard, and when he decided to drop out, it was not to find enlightenment with an Indian guru, but to start a computer software company.</speak>
  • 29. The <prosody> tag The <prosody> tag allows some changes to how speech is delivered, through the following supported attributes • volume • rate • pitch
  • 30. The volume attribute Modify the volume of speech <speak> I can speak normally, <prosody volume="x-loud"> or I can speak louder</prosody>. </speak> <speak> I can speak normally, <prosody volume="x-soft"> or I can speak quieter</prosody>. </speak>
  • 31. The rate attribute Change the speed of speech <speak> When I wake up, <prosody rate="x-slow">I speak quite slowly</prosody>. </speak> <speak> When I am in a hurry, <prosody rate="x-fast">I speak very fast</prosody>. </speak>
  • 32. The pitch attribute Modify the pitch of a word/phrase <speak> When I get angry, <prosody pitch="x-high">my pitch goes way up</prosody> </speak> <speak> When I get sad, <prosody pitch="x-low">my pitch goes way down</prosody> </speak>
  • 33. The pitch attribute Modify the pitch of a word/phrase <speak> I can go normal, <prosody pitch="high">high</prosody>, <prosody pitch="x-high">higher</prosody>, <prosody pitch="low">low</prosody>, and <prosody pitch="x-low">lower</prosody>. </speak>
  • 34. Use pitch to improve intonation Adding punctuation and modifying pitch helps get better prosody Do you like this or that? Do you like <prosody pitch="+5%"> this </prosody>, or <prosody pitch="-2%">that?</prosody>
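The three <prosody> attributes covered on these slides (volume, rate, pitch) can be combined in a single tag or nested. A minimal Python sketch, with prosody as an illustrative helper name: the inner argument is treated as an SSML fragment that is already escaped, so that calls can be nested.

```python
from xml.sax.saxutils import quoteattr


def prosody(inner: str, *, volume=None, rate=None, pitch=None) -> str:
    """Wrap an SSML fragment in <prosody> with the attributes that are set.

    `inner` must already be valid SSML (escaped text or nested tags).
    """
    attrs = {"volume": volume, "rate": rate, "pitch": pitch}
    rendered = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in attrs.items()
        if value is not None
    )
    if not rendered:
        raise ValueError("set at least one of volume, rate, pitch")
    return f"<prosody{rendered}>{inner}</prosody>"


question = (
    "<speak>Do you like "
    + prosody("this", pitch="+5%")
    + ", or "
    + prosody("that?", pitch="-2%")
    + "</speak>"
)
print(question)
print(prosody(prosody("Sold", volume="x-loud"), rate="+60%"))
```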
  • 35. Punctuation and the <break> tag Add a pause anywhere (time, strength attributes) And the winner is <break time='5s'/> Bob Dylan! And the winner is <break strength="x-strong" /> Bob Dylan!
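Both forms of <break> can be produced by one Python helper. This is a sketch with illustrative names (pause, announce). The strength values checked here are the ones defined by the SSML specification; whether a given service honors each of them is not checked.

```python
from xml.sax.saxutils import escape, quoteattr

STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")


def pause(*, time=None, strength=None) -> str:
    """Build a <break/> tag from a duration ('5s', '300ms') and/or a strength."""
    if strength is not None and strength not in STRENGTHS:
        raise ValueError(f"strength must be one of {STRENGTHS}")
    attrs = ""
    if time is not None:
        attrs += f" time={quoteattr(time)}"
    if strength is not None:
        attrs += f" strength={quoteattr(strength)}"
    return f"<break{attrs}/>"


def announce(winner: str, **pause_kwargs) -> str:
    """Insert a dramatic pause before the winner's name."""
    return (
        f"<speak>And the winner is {pause(**pause_kwargs)} "
        f"{escape(winner)}!</speak>"
    )


print(announce("Bob Dylan", time="5s"))
print(announce("Bob Dylan", strength="x-strong"))
```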
  • 37. Fun with SSML 'Can you make your voices sound like an auctioneer?' <speak><prosody rate='+60%'>I’m at 500 and I want 550<prosody volume='x-loud'>550</prosody></prosody> <prosody rate='+60%'>bid on 550 I’m at 500 would you go 550 550 for the gentleman in the corner</prosody> <prosody rate="+90%">A big black bug bit a big black bear a big black bug bit a big black bear</prosody> Do we get 600? <prosody rate='+90%'>A big black bug bit a big black bear</prosody><prosody rate='+60%'>We got 600 for the whole herd</prosody><prosody rate='default' volume='x-loud'>Sold <prosody rate='+60%'>for 600.</prosody></prosody></speak>
  • 38. Fun with SSML 'It's good, but can you make her sound like she's from Boston???' If your car’s blinkers are broken, it may be the blinker relay. Fortunately, this car fix is easy to do. <speak>If <phoneme ph='"jO: "kAz "blIN.k@z'>your car's blinkers</phoneme> <phoneme ph='%A'>are</phoneme> broken, it may be the <phoneme ph='"blIN.k@'>blinker</phoneme> relay. <phoneme ph='"fO.tS@n.@t.li'>Fortunately</phoneme>, this <phoneme ph='"kA'>car</phoneme> fix is easy to do. </speak>
  • 39. • Contact us with any questions about this webinar or Polly in general polly-webinars-feedback@amazon.com • SSML documentation http://docs.aws.amazon.com/polly/latest/dg/supported-ssml.html • Introducing Amazon Polly at re:Invent 2016 https://www.youtube.com/watch?v=zjMqimHis3U&t=2s • PLS 1.0 Specification https://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/ Next AWS Polly webinar (Apr 10th): "How to integrate Amazon Polly voices seamlessly into your application workflow"