Advancing Estonian Machine Translation
Matīss Rikters, Mārcis Pinnis, Roberts Rozis
Tilde
{firstname.lastname}@tilde.lv
September 28, 2018
Problems
• Insufficient amount of high-quality data
• Weak baseline systems
• Poor performance when translating between
Estonian and a non-English language
• The very best MT systems are usually found in experimental
environments and are inaccessible to the general public
Solutions
• Collection of Data
• Improvements for Neural Machine Translation Systems
• Translation Application for Mobile Devices
Collection of Data
• Publicly available parallel corpora
DGT-TM, DCEP, Europarl, OPUS, and others
• Estonian national sources
content collected from Estonian public-sector institutions and their websites
• Corpora built from sources identified outside of Europe
Quite often, this is content on public, multilingual European websites. We identify, collect,
process, and publish such content in open data portals. For example, the results of the
Estonian Open Parallel Corpus (EOPC) project are published in the META-SHARE repository,
and the results of the ODINE project on the TILDE MODEL site.
Data
Multilingual NMT
Johnson et al. 2016
<2et> uncle Dick died when I was fifteen . → onu Dick suri , kui olin viisteist .
<2et> the girl stopped , and looked him in the eyes . → tüdruk peatus ja vaatas talle silma .
<2et> she had taken off her bonnet and held it in her hand . → ta oli kübara peast võtnud ning hoidis seda käes .
<2et> Charles rose and looked out of the window . → Charles tõusis ja vaatas aknast välja .
<2et> he wished he could draw . → ta tahtnuks osata joonistada .
<2et> he began to cover the ambiguous face in lather . → ta asus oma kahemõttelist ilmet seebivahuga katma .
<2et> и все таки она очень хотела ей помочь . → ja siiski tahtnuks ta väga teda aidata .
<2et> Преследуемая женщина прыгнула со скалы . → tagaaetav naine hüppas kaljult alla .
<2et> ей не удавалось найти общий язык с другими детьми . → ta ei saanud teiste õpilastega hästi läbi
<2ru> hoidke käed alati sõidu ajal sõiduki juhtimiseks vabad . → Пусть во время вождения ваши руки всегда будут свободны для управления транспортным средством .
<2ru> selle rakenduse kasutamiseks peab teie veebilehitseja toetama Javascripti . → если вы хотите использовать эту прикладную программу , ваш браузер должен поддерживать “ Javascript ” .
<2ru> juhtmeta seadmed võivad häirida lennuki toimimist . → радиотехнические устройства могут мешать функционированию самолета .
<2ru> Ärge kasutage seadet tanklas , kütuse ega kemikaalide lähedal . → не используйте устройство на заправочных станциях или рядом с топливом или химическими веществами .
<2ru> Ärge unustage teha kogu olulisest infost varu - või kirjalikke koopiaid . → те забудьте создать резервные копии всей важной информации или запишите эту информацию .
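The tagging scheme illustrated by these examples can be sketched in a few lines of Python. This is a minimal illustration that assumes the `<2xx>` tag format shown above. The names `add_target_tag` and `tag_corpus` are hypothetical and do not come from Nematus or the authors' scripts. Several translation directions share one training set, and the tag on the source side is the only signal telling the model which language to produce.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def add_target_tag(source: str, target_lang: str) -> str:
    """Prepend a token such as <2et> that tells the model which language to produce."""
    return f"<2{target_lang}> {source}"


def tag_corpus(pairs: List[Pair], target_lang: str) -> List[Pair]:
    """Tag the source side of every pair; the target side is left unchanged."""
    return [(add_target_tag(src, target_lang), tgt) for src, tgt in pairs]


# One shared training set mixing several directions, distinguished only by the tag.
en_et = [("he wished he could draw .", "ta tahtnuks osata joonistada .")]
ru_et = [("Преследуемая женщина прыгнула со скалы .", "tagaaetav naine hüppas kaljult alla .")]
et_ru = [("juhtmeta seadmed võivad häirida lennuki toimimist .",
          "радиотехнические устройства могут мешать функционированию самолета .")]

training_data = tag_corpus(en_et, "et") + tag_corpus(ru_et, "et") + tag_corpus(et_ru, "ru")
for src, tgt in training_data:
    print(src, "=>", tgt)
```

Note that English and Russian sources both receive `<2et>` when the target is Estonian: the tag names the target language only, not the source.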
Multilingual NMT
Rikters et al. 2018
Multilingual NMT support in Nematus
• Specify all training languages and the respective files in the configuration
• Automatically add target language tags to source sentences
• Shuffle the data so that each batch contains an equal portion of each language
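The idea of giving every language an equal share of each batch can be sketched as follows. This is not the Nematus implementation; it is a hypothetical illustration assuming that the corpora are keyed by direction (e.g. `"en-et"`), that the batch size is a multiple of the number of directions, and that batching stops once the smallest corpus is exhausted. The function name `balanced_batches` is invented for this example.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (tagged source sentence, target sentence)


def balanced_batches(corpora: Dict[str, List[Pair]], batch_size: int,
                     seed: int = 0) -> List[List[Pair]]:
    """Return batches in which every language direction contributes an equal share.

    Hypothetical sketch: batch_size must be a multiple of the number of
    directions, and batching stops once the smallest corpus is exhausted.
    """
    directions = sorted(corpora)
    if batch_size % len(directions) != 0:
        raise ValueError("batch_size must be a multiple of the number of directions")
    per_direction = batch_size // len(directions)
    rng = random.Random(seed)
    # Shuffle a copy of each corpus so the caller's lists are left untouched.
    shuffled = {d: rng.sample(corpora[d], len(corpora[d])) for d in directions}
    n_batches = min(len(pairs) for pairs in shuffled.values()) // per_direction
    batches = []
    for b in range(n_batches):
        batch = []
        for d in directions:
            batch.extend(shuffled[d][b * per_direction:(b + 1) * per_direction])
        rng.shuffle(batch)  # interleave the directions within the batch
        batches.append(batch)
    return batches


# Toy example: 10 en->et pairs and 6 et->ru pairs, batches of 4 (2 per direction).
corpora = {
    "en-et": [(f"<2et> en{i}", f"et{i}") for i in range(10)],
    "et-ru": [(f"<2ru> et{i}", f"ru{i}") for i in range(6)],
}
batches = balanced_batches(corpora, batch_size=4)
print(len(batches))  # 3
```

Stopping at the smallest corpus discards data from the larger ones; upsampling the smaller corpora first is one way to avoid that.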
Scripts for easy multilingual NMT training with other frameworks
• Upsample the data for all languages to the size of the largest corpus
• Add target language tags to all source sentences
https://github.com/tilde-nlp/multilingual-nmt-data-prep
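The two preparation steps listed above can be sketched as follows. This is not the code in the tilde-nlp repository. It is a hypothetical illustration in which the function names `upsample_to_largest` and `tag_sources` are invented. The slide does not say how upsampling is done, so the sketch assumes whole copies of each smaller corpus topped up with a random sample. It also assumes that every corpus is non-empty and keyed by its (source, target) language direction.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]        # (source sentence, target sentence)
Direction = Tuple[str, str]   # (source language, target language)


def upsample_to_largest(corpora: Dict[Direction, List[Pair]],
                        seed: int = 0) -> Dict[Direction, List[Pair]]:
    """Repeat each (non-empty) corpus until it matches the size of the largest one."""
    rng = random.Random(seed)
    target_size = max(len(pairs) for pairs in corpora.values())
    upsampled = {}
    for direction, pairs in corpora.items():
        full_copies, remainder = divmod(target_size, len(pairs))
        # Whole copies of the corpus, topped up with a random sample of the rest.
        upsampled[direction] = pairs * full_copies + rng.sample(pairs, remainder)
    return upsampled


def tag_sources(corpora: Dict[Direction, List[Pair]]) -> List[Pair]:
    """Prepend a <2xx> tag naming the target language to every source sentence."""
    tagged = []
    for (_, tgt_lang), pairs in corpora.items():
        tagged.extend((f"<2{tgt_lang}> {src}", tgt) for src, tgt in pairs)
    return tagged


corpora = {
    ("en", "et"): [("a", "A"), ("b", "B"), ("c", "C"), ("d", "D"), ("e", "E")],
    ("et", "ru"): [("x", "X"), ("y", "Y")],
}
upsampled = upsample_to_largest(corpora)
print({d: len(p) for d, p in upsampled.items()})  # {('en', 'et'): 5, ('et', 'ru'): 5}
data = tag_sources(upsampled)
print(data[0])  # ('<2et> a', 'A')
```

Because every direction ends up the same size, training in a standard single-pair framework sees each language about equally often without any framework changes.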
Experiments
Experiments
Results
Android App
Conclusions
• New state-of-the-art MT systems for
• Estonian ↔ Russian
• Estonian ↔ English
• Available online
• masintolge.ee
• github.com/tilde-nlp/et-mt-tools
“Neural Network Modelling for Inflected Natural Languages”
No. 1.1.1.1/16/A/215.
