Barcelona Supercomputing Center (BSC):
• Antonio Miranda-Escalada
• Luis Gascó
• Salvador Lima-López
• Eulàlia Farré-Maduell
• Darryl Estrada
• Martin Krallinger
Mention detection, normalization &
classification of species, pathogens,
humans and food in clinical
documents: Overview of the
LivingNER shared task and resources
Martin Krallinger
Head of Text Mining Unit, BSC
<mkrallin@bsc.es>
IberLEF @ SEPLN 2022
LivingNER corpus: doi.org/10.5281/zenodo.6376662
IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com
Importance of species information extraction
Image credit: Allice Hunter - File:Hispanophone global world map language.png, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=69323596
National Center for
Biotechnology Information
(NCBI) Taxonomy
How many species inhabit the Earth?
How many species do we know?
Quantification of global species
richness
Taxonomic classification of species
Number of species in a taxonomic
group
Validation against well-known taxa
250 years of taxonomic classification
1.2 million species catalogued in a
central database
86% of species on Earth and 91% of
species in the ocean still await
description
Knowledge gap
- Large collection of species that changes over time, with hierarchical relation types
- Homonymy with commonly used words, e.g. “Spot” (Leiostomus xanthurus) and
“Permit” (Trachinotus falcatus)
- Homonymy with other medical entities (the word “goat” can refer to proteins
found in human, zebrafish, rat and mouse)
- Ambiguous abbreviations, e.g. HBV can stand for both “Hepatitis B virus”
and “Hepatitis B vaccine”
- Vernacular forms (common names)
- Incorrect case or misspellings (e.g. Bacterium coli, Bacillus coli and Escheria coli for
Escherichia coli)
- Coordinations and nested expressions: “human immunodeficiency viruses types 1
and 2” refers to two distinct species names, “HIV type 1” and “HIV type 2”
- Role names (e.g. athletes, responders)
- Human mentions in the form of family members, etc.
Challenges
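To make the misspelling challenge concrete, here is a minimal sketch of dictionary lookup with a fuzzy fallback. It is not the method used by the LivingNER organisers or participants. The lexicon is a toy, hypothetical stand-in for a real terminology, the `normalize` helper and the 0.8 cutoff are assumptions, and only Python's standard `difflib` is used.

```python
import difflib
from typing import Optional, Tuple

# Toy lexicon (assumption): lowercased name -> (canonical name, NCBI Taxonomy ID).
TOY_LEXICON = {
    "escherichia coli": ("Escherichia coli", "562"),
    "homo sapiens": ("Homo sapiens", "9606"),
}


def normalize(mention: str, cutoff: float = 0.8) -> Optional[Tuple[str, str]]:
    """Map a mention to (canonical name, taxonomy ID), or None if nothing is close."""
    # Collapse case and whitespace so "ESCHERICHIA  COLI" hits the exact-match path.
    key = " ".join(mention.lower().split())
    if key in TOY_LEXICON:
        return TOY_LEXICON[key]
    # Fuzzy fallback: closest lexicon entry whose similarity ratio is >= cutoff.
    close = difflib.get_close_matches(key, list(TOY_LEXICON), n=1, cutoff=cutoff)
    return TOY_LEXICON[close[0]] if close else None


print(normalize("Escheria coli"))  # misspelt form from the slide
print(normalize("aspirin"))        # not a species mention
```

String similarity only helps with spelling variants. Homonyms such as “Spot” and ambiguous abbreviations such as HBV need context, which this sketch does not model.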
Previous SPECIES extraction and normalization efforts
Timeline: < 2000 | 2000-2010 | 2010-2021 | 2022
● ITIS (Integrated Taxonomic Information System) [Federal effort to provide consistent
biological taxonomies] [1996]
● NCBI taxonomy [Terminological resource] [Federhen, 2012] [1997]
● The Catalogue of Life [Index of the world's species] [Bánki et al., 2022] [2001]
● LINNAEUS [Species mention and normalisation to NCBI taxonomy corpus and tool]
[Gerner et al., 2010] [2010]
● Infectious Diseases (ID) task of BioNLP [Corpus and shared task] [Pyysalo et al., 2011] [2011]
● SPECIES [Species mention and normalisation to NCBI taxonomy corpus and tool]
[Pafilis et al., 2013] [2014]
● Global Names Architecture database [organizes and cross-links electronic information
about organisms] [Pyle et al., 2016] [2016]
● LivingNER [2022]
LivingNER overview
LivingNER resources
LivingNER corpus: doi.org/10.5281/zenodo.6376662
LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162
LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662
LivingNER terminology: doi.org/10.5281/zenodo.6390506
LivingNER Silver Standard:
LivingNER evaluation library:
github.com/tonifuc3m/livingner-evaluation-library
LivingNER participant systems:
temu.bsc.es/livingner/participant-systems/
LivingNER YouTube playlist:
https://www.youtube.com/channel/UCDsmS1pCCO8TW312wJq8aCQ/playlists
LivingNER Corpus: documents, format and annotation
LivingNER Corpus - Overview
● Diversity: primary care, dermatology, internal medicine, tropical medicine,
endocrinology, neurology, ophthalmology, psychiatry, radiology, emergency medicine, cardiology,
pediatrics, oncology, dentistry, ...
● Manual entity annotations, NCBI taxonomy mapping and application classification
● Inter-Annotator Agreement (IAA): 94.2
● Random training, validation and test split
[Figure: Most common SPECIES mentions]
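A random document-level split like the one listed above could be produced along these lines. This is only a sketch: the 80/10/10 proportions, the seed, the `random_split` name and the `caso_…` document IDs are hypothetical, since the slide does not state how the LivingNER split was drawn.

```python
import random
from typing import Dict, List, Sequence


def random_split(
    doc_ids: Sequence[str],
    train_frac: float = 0.8,   # assumed proportion, not from the slides
    valid_frac: float = 0.1,   # assumed proportion, not from the slides
    seed: int = 13,
) -> Dict[str, List[str]]:
    """Shuffle document IDs reproducibly and cut them into three disjoint sets."""
    ids = sorted(doc_ids)                 # fixed starting order, independent of input order
    random.Random(seed).shuffle(ids)      # local RNG so the global random state is untouched
    n_train = int(len(ids) * train_frac)
    n_valid = int(len(ids) * valid_frac)
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],  # whatever remains goes to the test set
    }


docs = [f"caso_{i}" for i in range(100)]
splits = random_split(docs)
print({name: len(ids) for name, ids in splits.items()})
```

Splitting by document rather than by sentence keeps all mentions of one clinical case in the same partition.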
LivingNER Multilingual Silver Standard
[Figure: Spanish Gold Standard and English Silver Standard shown side by side; annotated mentions are labelled NCBI Tax ID: 11103 and NCBI Tax ID: 1311 in both panels]
Online visualiser:
https://temu.bsc.es/mLivingNER/diff.xhtml#/translations/en/annotation_transfer/train/caso_clinico_radiologia942?diff=/gold-standard/train/
LivingNER participating teams
● Registrations: 56
● SPECIES NER track: 20 participating teams, 41 submissions
● SPECIES Norm track: 8 teams, 14 submissions
● Clinical Impact track: 5 teams, 6 submissions
LivingNER participant results
● MiF: micro-averaged F-score (main metric)
● MiP: micro-avg. Precision
● MiR: micro-avg. Recall
github.com/tonifuc3m/livingner-evaluation-library
[Tables: SPECIES NER results; SPECIES Norm results]
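The three metrics above can be sketched as follows. This is an illustration of micro-averaging, not the code of the LivingNER evaluation library. Representing a mention as a `(start, end, label)` tuple and counting only exact matches are assumptions made for the example, and `micro_prf` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

# Hypothetical mention representation: (start offset, end offset, label).
Mention = Tuple[int, int, str]


def micro_prf(
    gold: Dict[str, Set[Mention]],
    pred: Dict[str, Set[Mention]],
) -> Tuple[float, float, float]:
    """Return (MiP, MiR, MiF): counts are pooled over all documents before dividing."""
    tp = fp = fn = 0
    for doc_id in set(gold) | set(pred):
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)   # predicted mentions that exactly match a gold mention
        fp += len(p - g)   # predicted mentions with no gold counterpart
        fn += len(g - p)   # gold mentions that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return precision, recall, f_score


gold = {"doc1": {(0, 5, "SPECIES"), (10, 18, "HUMAN")}, "doc2": {(3, 9, "SPECIES")}}
pred = {"doc1": {(0, 5, "SPECIES"), (10, 17, "HUMAN")}, "doc2": {(3, 9, "SPECIES")}}
print(micro_prf(gold, pred))  # 2 true positives, 1 false positive, 1 false negative
```

Because counts are pooled before dividing, documents with many mentions weigh more than documents with few, which is what distinguishes micro- from macro-averaging.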
LivingNER participant results - Clinical Impact track
• Increasing interest in Spanish clinical NLP tasks
• LivingNER Resources
○ LivingNER Corpus: Species entity Gold Standard corpus mapped to NCBI Taxonomy.
○ LivingNER Multilingual Silver Standard Corpus: Species entity corpora normalised to
NCBI Taxonomy in several languages.
○ LivingNER Spanish Silver Standard (from participants’ predictions)
Conclusions
• Correct the LivingNER Multilingual Silver Standard to generate a Gold Standard subset
of each language to create high-quality benchmarks in the seven languages.
• The Clinical Impact track lacked sufficient training and test data; we plan to correct this
in the future.
Future directions
● Generate more granular annotations
for the HUMAN mentions that are
needed for real-world applications.
Actual examples of annotated species mentions and automatically
recognized profession mentions.
Acknowledgements
LivingNER Participants &
LivingNER Scientific Committee
IberLEF organisers
● Manuel
● Julio
● and all others
SEPLN organisers
Funding:
• Plan de Tecnologías del Lenguaje
• AI4PROFHEALTH (PID2020-119266RA-I00)
• BioMATDB Horizon Europe Grant
Agreement No 101058779
BSC Text Mining Unit
LivingNER resources
LivingNER corpus: doi.org/10.5281/zenodo.6376662
LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162
LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662
LivingNER terminology: doi.org/10.5281/zenodo.6390506
LivingNER Silver Standard:
LivingNER evaluation library:
github.com/tonifuc3m/livingner-evaluation-library
LivingNER participant systems:
temu.bsc.es/livingner/participant-systems/
LivingNER YouTube playlist:
https://youtube.com/playlist?list=PL5uSCzf1azhA_gMLC3DBZe6NvmMJiggTg
Questions?

Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources (talk at IberLEF @ SEPLN 2022)

  • 1. Barcelona Supercomputing Center (BSC): • Antonio Miranda-Escalada • Luis Gascó • Salvador Lima-López • Eulàlia Farré-Maduell • Darryl Estrada • Martin Krallinger Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources Martin Krallinger Head of Text Mining Unit, BSC <mkrallin@bsc.es> IberLEF @ SEPLN 2022 LivingNER corpus: doi.org/10.5281/zenodo.6376662 1
  • 2. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com Importance of species information extraction De − Allice Hunter - File:Hispanophone global world map language.png,CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=69323596 National Center for Biotechnology Information (NCBI) Taxonomy
  • 3. How many species inhabit the earth How many species do we know Quantification of global species richness Taxonomic classification of species Number of species in a taxonomic group Validation against well-known taxa 250 years of taxonomic classification 1.2 million species catalogued in a central database 86% of species on Earth and 91% of species in the ocean still await description Knowledge gap
  • 4. -Large collection of species, change over time, hierarchical relation types relation types -Homonymy with commonly used words, e.g.: “Spot” (Leiostomus xanthurus) and “Permit” (Trachinotus falcatus) -Homonymy with other medical entities (the word “goat” can refer to proteins found in human, zebrafish, rat and mouse. -Abbreviations are ambiguous, e.g.: HBV can be used for both “Hepatitis B virus” as well as “Hepatitis B vaccine” -Vernacular form (common names) - Incorrect case or misspelt (like, Bacterium coli, Bacillus coli and Escheria coli for Escherichia coli) - Coordinations, nested expressions: “human immunodeficiency viruses types 1 and 2”, refer to two distinct species names, “HIV type 1” and “HIV type 2” - Role names (e.g. athletes, responders) - Human mencions in the form of family members, etc…. Challenges
  • 5. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com Previous SPECIES extraction and normalization efforts ● LivingNER < 2000 2000-2010 2010-2021 2022 ● The Catalogue of Life [Index of the world's species] [Bánki et al., 2022] [2001] ●Infectious Diseases (ID) task of BioNLP [Corpus and shared task] [Pyysalo et al., 2011] [2011] ● SPECIES [Species mention and normalisation to NCBI taxonomy corpus and tool] [Pafilis et al., 2013] [2014] ● ITIS (Integrated Taxonomic Information System) [Federal effort to provide consistent biological taxonomies] [1996] ● NCBI taxonomy [Terminological resource] [Federhen, 2012] [1997] ● Global Names Architecture database [organizes and cross-links electronic information about organisms] [Pyle et al., 2016] [2016] ● LINNAEUS [Species mention and normalisation to NCBI taxonomy corpus and tool] [Gerner et al., 2010] [2010]
  • 6. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER overview
  • 7. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER resources LivingNER corpus: doi.org/10.5281/zenodo.6376662 LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162 LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662 LivingNER terminology: doi.org/10.5281/zenodo.6390506 LivingNER Silver Standard: LivingNER evaluation library: github.com/tonifuc3m/livingner-evaluation-library LivingNER participant systems: temu.bsc.es/livingner/participant-systems/ LivingNER YouTube playlist: https://www.youtube.com/channel/UCDsmS1pCCO8TW312wJq8aCQ/playlists
  • 8. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER Corpus: documents, format and annotation
  • 9. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER Corpus - Overview ● Diversity: Atención primaria, dermatología, medicina interna, medicina tropical, endocrinología, neurología, oftalmología, psiquiatría, radiología, urgencias, cardiología, pediatrita, oncología, odontología,.. ● Manual entity annotations, NCBI taxonomy mapping and application classification ● Inter-Annotator Agreement (IAA): 94.2 ● Random training, validation and test split Most common SPECIES mentions
  • 10. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com DisTEMIST Multilingual Silver Standard
  • 11. LivingNER Multilingual Silver Standard. Side-by-side example: Spanish Gold Standard and English Silver Standard, with the same NCBI Tax IDs (11103 and 1311) on the corresponding mentions in both panels. Online visualiser: https://temu.bsc.es/mLivingNER/diff.xhtml#/translations/en/annotation_transfer/train/caso_clinico_radiologia942?diff=/gold-standard/train/
  • 12. LivingNER participating teams ● Registrations: 56 ● SPECIES NER track: 20 participating teams, 41 submissions ● SPECIES Norm track: 8 teams, 14 submissions ● Clinical Impact track: 5 teams, 6 submissions
  • 13. LivingNER participant results ● MiF: micro-averaged F-score (main metric) ● MiP: micro-avg. Precision ● MiR: micro-avg. Recall github.com/tonifuc3m/livingner-evaluation-library SPECIES NER SPECIES Norm
  • 14. LivingNER participant results ● MiF: micro-averaged F-score (main metric) ● MiP: micro-avg. Precision ● MiR: micro-avg. Recall github.com/tonifuc3m/livingner-evaluation-library SPECIES NER SPECIES Norm
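The three metrics named on the results slides (MiP, MiR, MiF) can be illustrated with a minimal sketch. It assumes exact matching of (start, end, label) tuples per document, which is a simplification chosen for illustration; the official livingner-evaluation-library may apply different matching rules, and the function name `micro_prf` and the toy data below are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation of one annotation: (start offset, end offset, label).
Annotation = Tuple[int, int, str]


def micro_prf(
    gold: Dict[str, Set[Annotation]],
    pred: Dict[str, Set[Annotation]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score over all documents.

    Counts are pooled across documents before the ratios are computed,
    which is what distinguishes micro- from macro-averaging.
    Assumption: a prediction is correct only on an exact tuple match.
    """
    tp = fp = fn = 0
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)   # predicted and in the gold standard
        fp += len(p - g)   # predicted but not in the gold standard
        fn += len(g - p)   # in the gold standard but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy data, invented for illustration only.
gold_example = {
    "doc1": {(0, 5, "SPECIES"), (10, 15, "HUMAN")},
    "doc2": {(3, 8, "SPECIES")},
}
pred_example = {
    "doc1": {(0, 5, "SPECIES")},
    "doc2": {(3, 8, "SPECIES"), (20, 25, "SPECIES")},
}

mip, mir, mif = micro_prf(gold_example, pred_example)
print(f"MiP={mip:.3f} MiR={mir:.3f} MiF={mif:.3f}")
```

In the toy data there are two true positives, one false positive and one false negative, so all three metrics come out at 2/3.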
  • 15. LivingNER participant results - Clinical Impact track
  • 16. Conclusions • Increasing interest in Spanish clinical NLP tasks • LivingNER Resources ○ LivingNER Corpus: Species entity Gold Standard corpus mapped to NCBI Taxonomy. ○ LivingNER Multilingual Silver Standard Corpus: Species entity corpora normalised to NCBI Taxonomy in several languages. ○ LivingNER Spanish Silver Standard (from participants’ predictions)
  • 17. Future directions • Correct the LivingNER Multilingual Silver Standard to generate a Gold Standard subset for each language, creating high-quality benchmarks in the seven languages. • The Clinical Impact track lacked enough training and test data, and we plan to correct this issue in the future. • Generate more granular annotations for the HUMAN mentions that are needed for real-world applications. Figure: Actual examples of annotated species mentions and automatically recognized profession mentions.
  • 18. Acknowledgements LivingNER Participants & LivingNER Scientific Committee IberLEF organisers ● Manuel ● Julio ● and all others SEPLN organisers Funding: • Plan de Tecnologías del Lenguaje • AI4PROFHEALTH (PID2020-119266RA-I00) • BioMATDB Horizon Europe Grant Agreement No 101058779 BSC Text Mining Unit
  • 19. LivingNER resources LivingNER corpus: doi.org/10.5281/zenodo.6376662 LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162 LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662 LivingNER terminology: doi.org/10.5281/zenodo.6390506 LivingNER Silver Standard: LivingNER evaluation library: github.com/tonifuc3m/livingner-evaluation-library LivingNER participant systems: temu.bsc.es/livingner/participant-systems/ LivingNER YouTube playlist: https://youtube.com/playlist?list=PL5uSCzf1azhA_gMLC3DBZe6NvmMJiggTg
  • 20. Questions?