Barcelona Supercomputing Center (BSC):
• Antonio Miranda-Escalada
• Luis Gascó
• Salvador Lima-López
• Eulàlia Farré-Maduell
• Darryl Estrada
• Martin Krallinger
Mention detection, normalization &
classification of species, pathogens,
humans and food in clinical
documents: Overview of the
LivingNER shared task and resources
Martin Krallinger
Head of Text Mining Unit, BSC
<mkrallin@bsc.es>
IberLEF @ SEPLN 2022 LivingNER corpus: doi.org/10.5281/zenodo.6376662
IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com
Importance of species information extraction
By Allice Hunter - File:Hispanophone global world map language.png, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=69323596
National Center for Biotechnology Information (NCBI) Taxonomy
How many species inhabit the Earth?
How many species do we know?
Quantification of global species richness
Taxonomic classification of species
Number of species in a taxonomic group
Validation against well-known taxa
250 years of taxonomic classification
1.2 million species catalogued in a central database
86% of species on Earth and 91% of species in the ocean still await description
Knowledge gap
- Large collection of species that changes over time, with hierarchical relation types
- Homonymy with commonly used words, e.g. “Spot” (Leiostomus xanthurus) and “Permit” (Trachinotus falcatus)
- Homonymy with other medical entities (the word “goat” can refer to proteins found in human, zebrafish, rat and mouse)
- Ambiguous abbreviations, e.g. HBV can stand for both “Hepatitis B virus” and “Hepatitis B vaccine”
- Vernacular forms (common names)
- Incorrect case or misspellings (e.g. Bacterium coli, Bacillus coli and Escheria coli for Escherichia coli)
- Coordinations and nested expressions: “human immunodeficiency viruses types 1 and 2” refers to two distinct species names, “HIV type 1” and “HIV type 2”
- Role names (e.g. athletes, responders)
- Human mentions in the form of family members, etc.
Challenges
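The homonymy problem listed above can be illustrated with a minimal sketch. It assumes a hypothetical two-entry lexicon built from the slide's own examples ("Spot", "Permit") and a made-up sentence; it is not part of any LivingNER system. A context-free dictionary lookup flags both everyday words as fish species:

```python
import re

# Hypothetical mini-lexicon: vernacular name -> scientific name
# (the two pairs are the examples given on the slide).
LEXICON = {
    "spot": "Leiostomus xanthurus",
    "permit": "Trachinotus falcatus",
}


def naive_species_matches(text):
    """Return (surface form, scientific name) for every lexicon hit, ignoring context."""
    hits = []
    for match in re.finditer(r"[A-Za-z]+", text):
        token = match.group(0)
        name = LEXICON.get(token.lower())
        if name is not None:
            hits.append((token, name))
    return hits


# Invented example sentence: neither word refers to a fish here.
sentence = "The patient needs a permit to travel and has a dark spot on the left arm."
print(naive_species_matches(sentence))
```

Both hits are false positives, which is why species recognition in clinical text needs context-aware models and not only lexicon matching.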
Previous SPECIES extraction and normalization efforts
Timeline: < 2000 | 2000-2010 | 2010-2021 | 2022
● ITIS (Integrated Taxonomic Information System) [Federal effort to provide consistent biological taxonomies] [1996]
● NCBI taxonomy [Terminological resource] [Federhen, 2012] [1997]
● The Catalogue of Life [Index of the world's species] [Bánki et al., 2022] [2001]
● LINNAEUS [Species mention and normalisation to NCBI taxonomy corpus and tool] [Gerner et al., 2010] [2010]
● Infectious Diseases (ID) task of BioNLP [Corpus and shared task] [Pyysalo et al., 2011] [2011]
● SPECIES [Species mention and normalisation to NCBI taxonomy corpus and tool] [Pafilis et al., 2013] [2014]
● Global Names Architecture database [organizes and cross-links electronic information about organisms] [Pyle et al., 2016] [2016]
● LivingNER [2022]
LivingNER overview
LivingNER resources
LivingNER corpus: doi.org/10.5281/zenodo.6376662
LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162
LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662
LivingNER terminology: doi.org/10.5281/zenodo.6390506
LivingNER Silver Standard:
LivingNER evaluation library:
github.com/tonifuc3m/livingner-evaluation-library
LivingNER participant systems:
temu.bsc.es/livingner/participant-systems/
LivingNER YouTube playlist:
https://www.youtube.com/channel/UCDsmS1pCCO8TW312wJq8aCQ/playlists
LivingNER Corpus: documents, format and annotation
LivingNER Corpus - Overview
● Diversity of medical specialties: primary care, dermatology, internal medicine, tropical medicine, endocrinology, neurology, ophthalmology, psychiatry, radiology, emergency medicine, cardiology, pediatrics, oncology, dentistry, ...
● Manual entity annotations, NCBI taxonomy mapping and application classification
● Inter-Annotator Agreement (IAA): 94.2
● Random training, validation and test split
Figure: Most common SPECIES mentions
LivingNER Multilingual Silver Standard
Spanish Gold Standard | English Silver Standard
Online visualiser:
https://temu.bsc.es/mLivingNER/diff.xhtml#/translations/en/annotation_transfer/train/caso_clinico_radiologia942?diff=/gold-standard/train/
Example annotations shown in both panels: NCBI Tax ID: 11103; NCBI Tax ID: 1311
LivingNER participating teams
● Registrations: 56
● SPECIES NER track: 20 participating teams, 41 submissions
● SPECIES Norm track: 8 teams, 14 submissions
● Clinical Impact track: 5 teams, 6 submissions
LivingNER participant results
● MiF: micro-averaged F-score (main metric)
● MiP: micro-avg. Precision
● MiR: micro-avg. Recall
github.com/tonifuc3m/livingner-evaluation-library
SPECIES NER SPECIES Norm
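The three metrics above can be sketched as follows. This is only an illustration of micro-averaging, not the code of the official evaluation library: the function name `micro_prf`, the (start, end, label) tuple layout, exact-match scoring and the toy annotations are all assumptions made for the example.

```python
def micro_prf(gold_by_doc, pred_by_doc):
    """Micro-averaged precision, recall and F-score.

    Counts are pooled over all documents before the ratios are computed.
    A prediction is correct only if it matches a gold annotation exactly
    (an assumption of this sketch).
    """
    tp = fp = fn = 0
    for doc_id in set(gold_by_doc) | set(pred_by_doc):
        gold = set(gold_by_doc.get(doc_id, ()))
        pred = set(pred_by_doc.get(doc_id, ()))
        tp += len(gold & pred)   # predicted and in gold
        fp += len(pred - gold)   # predicted but not in gold
        fn += len(gold - pred)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return precision, recall, f_score


# Toy annotations (hypothetical): (start offset, end offset, label) per document.
gold = {
    "doc1": {(0, 5, "SPECIES"), (10, 18, "HUMAN")},
    "doc2": {(3, 9, "SPECIES")},
}
pred = {
    "doc1": {(0, 5, "SPECIES")},
    "doc2": {(3, 9, "SPECIES"), (20, 25, "SPECIES")},
}

mip, mir, mif = micro_prf(gold, pred)
print(f"MiP={mip:.3f} MiR={mir:.3f} MiF={mif:.3f}")
```

In this toy case two predictions are correct, one is spurious and one gold mention is missed, so all three scores come out at 2/3. Micro-averaging weights every mention equally, so long documents with many mentions contribute more than short ones.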
LivingNER participant results - Clinical Impact track
• Increasing interest in Spanish clinical NLP tasks
• LivingNER Resources
○ LivingNER Corpus: Species entity Gold Standard corpus mapped to NCBI Taxonomy.
○ LivingNER Multilingual Silver Standard Corpus: Species entity corpora normalised to NCBI Taxonomy in several languages.
○ LivingNER Spanish Silver Standard (from participants’ predictions)
Conclusions
• Correct the LivingNER Multilingual Silver Standard to generate a Gold Standard subset
of each language to create high-quality benchmarks in the seven languages.
• Clinical Impact track lacked enough training and test data, and we plan to correct this
issue in the future.
Future directions
● Generate more granular annotations
for the HUMAN mentions that are
needed for real-world applications.
Actual examples of annotated species mentions and automatically
recognized profession mentions.
Acknowledgements
LivingNER Participants &
LivingNER Scientific Committee
IberLEF organisers
● Manuel
● Julio
● and all others
SEPLN organisers
Funding:
• Plan de Tecnologías del Lenguaje
• AI4PROFHEALTH (PID2020-119266RA-I00)
• BioMATDB Horizon Europe Grant
Agreement No 101058779
BSC Text Mining Unit
LivingNER YouTube playlist:
https://youtube.com/playlist?list=PL5uSCzf1azhA_gMLC3DBZe6NvmMJiggTg
Questions?
IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com

More Related Content

Similar to Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources (talk at IberLEF @ SEPLN 2022)

SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...Martin Krallinger
 
Mansfield CV 2016 LinkedIN
Mansfield CV 2016 LinkedINMansfield CV 2016 LinkedIN
Mansfield CV 2016 LinkedINColin MANSFIELD
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET
 
Utility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information SystemsUtility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information SystemsBedirhan Ustun
 
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding , DNA fingerpr...
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding, DNA fingerpr...Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding, DNA fingerpr...
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding , DNA fingerpr...AnitaPoudel5
 
Exploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease InformaticsExploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease InformaticsNigel Collier
 
2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overview2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overviewdvreeman
 
Neuron Bio D. Juan M. Alfaro - Commercial Manager
Neuron Bio D. Juan M. Alfaro - Commercial ManagerNeuron Bio D. Juan M. Alfaro - Commercial Manager
Neuron Bio D. Juan M. Alfaro - Commercial ManagerFIBAO
 
Vectors, environment and society unit
Vectors, environment and society unitVectors, environment and society unit
Vectors, environment and society unitvaléry ridde
 
Personalized Oral Medicine
Personalized Oral MedicinePersonalized Oral Medicine
Personalized Oral MedicineHarold Slavkin
 
2012 03 20 - LOINC Introduction - AMIA KRS-WG
2012 03 20 - LOINC Introduction - AMIA KRS-WG2012 03 20 - LOINC Introduction - AMIA KRS-WG
2012 03 20 - LOINC Introduction - AMIA KRS-WGdvreeman
 
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...
Rapid Impact Assessment of Climatic and Physio-graphic Changes  on Flagship G...Rapid Impact Assessment of Climatic and Physio-graphic Changes  on Flagship G...
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...Arvinder Singh
 
2010 12 06 - LOINC Introduction
2010 12 06 - LOINC Introduction2010 12 06 - LOINC Introduction
2010 12 06 - LOINC Introductiondvreeman
 
Country Status Reports on Agricultural Biotechnology - Bangladesh
Country Status Reports on Agricultural Biotechnology - BangladeshCountry Status Reports on Agricultural Biotechnology - Bangladesh
Country Status Reports on Agricultural Biotechnology - Bangladeshapaari
 
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 MmatchmakingPNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 MmatchmakingRIICCHPeru
 
Knowledge curation for COVID-19
Knowledge curation for COVID-19Knowledge curation for COVID-19
Knowledge curation for COVID-19Sonja Aits
 
State of the WHO Family of International Classifications -2015
State of the WHO Family of International Classifications -2015State of the WHO Family of International Classifications -2015
State of the WHO Family of International Classifications -2015Bedirhan Ustun
 
Indo norway delhi_vishwas_28_oct2011_final
Indo norway delhi_vishwas_28_oct2011_finalIndo norway delhi_vishwas_28_oct2011_final
Indo norway delhi_vishwas_28_oct2011_finalVishwas Chavan
 
Advanced diagnostic aids in Periodontics
Advanced diagnostic aids in PeriodonticsAdvanced diagnostic aids in Periodontics
Advanced diagnostic aids in PeriodonticsR Viswa Chandra
 

Similar to Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources (talk at IberLEF @ SEPLN 2022) (20)

R Obomsawin CV
R Obomsawin CVR Obomsawin CV
R Obomsawin CV
 
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
SympTEMIST Shared Task on Symptoms, Signs and Findings Detection and Normaliz...
 
Mansfield CV 2016 LinkedIN
Mansfield CV 2016 LinkedINMansfield CV 2016 LinkedIN
Mansfield CV 2016 LinkedIN
 
dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019dkNET Poster Experimental Biology 2019
dkNET Poster Experimental Biology 2019
 
Utility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information SystemsUtility and Added Value of Classifications in Health Information Systems
Utility and Added Value of Classifications in Health Information Systems
 
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding , DNA fingerpr...
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding, DNA fingerpr...Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding, DNA fingerpr...
Conservation Biotechnology: DNA and Tissue Bank, DNA Barcoding , DNA fingerpr...
 
Exploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease InformaticsExploiting NLP for Digital Disease Informatics
Exploiting NLP for Digital Disease Informatics
 
2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overview2009 12 07 - LOINC Introduction and Overview
2009 12 07 - LOINC Introduction and Overview
 
Neuron Bio D. Juan M. Alfaro - Commercial Manager
Neuron Bio D. Juan M. Alfaro - Commercial ManagerNeuron Bio D. Juan M. Alfaro - Commercial Manager
Neuron Bio D. Juan M. Alfaro - Commercial Manager
 
Vectors, environment and society unit
Vectors, environment and society unitVectors, environment and society unit
Vectors, environment and society unit
 
Personalized Oral Medicine
Personalized Oral MedicinePersonalized Oral Medicine
Personalized Oral Medicine
 
2012 03 20 - LOINC Introduction - AMIA KRS-WG
2012 03 20 - LOINC Introduction - AMIA KRS-WG2012 03 20 - LOINC Introduction - AMIA KRS-WG
2012 03 20 - LOINC Introduction - AMIA KRS-WG
 
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...
Rapid Impact Assessment of Climatic and Physio-graphic Changes  on Flagship G...Rapid Impact Assessment of Climatic and Physio-graphic Changes  on Flagship G...
Rapid Impact Assessment of Climatic and Physio-graphic Changes on Flagship G...
 
2010 12 06 - LOINC Introduction
2010 12 06 - LOINC Introduction2010 12 06 - LOINC Introduction
2010 12 06 - LOINC Introduction
 
Country Status Reports on Agricultural Biotechnology - Bangladesh
Country Status Reports on Agricultural Biotechnology - BangladeshCountry Status Reports on Agricultural Biotechnology - Bangladesh
Country Status Reports on Agricultural Biotechnology - Bangladesh
 
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 MmatchmakingPNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
PNCTI Valoración de la Biodiversidad CONCYTEC 2014 COP 20 Mmatchmaking
 
Knowledge curation for COVID-19
Knowledge curation for COVID-19Knowledge curation for COVID-19
Knowledge curation for COVID-19
 
State of the WHO Family of International Classifications -2015
State of the WHO Family of International Classifications -2015State of the WHO Family of International Classifications -2015
State of the WHO Family of International Classifications -2015
 
Indo norway delhi_vishwas_28_oct2011_final
Indo norway delhi_vishwas_28_oct2011_finalIndo norway delhi_vishwas_28_oct2011_final
Indo norway delhi_vishwas_28_oct2011_final
 
Advanced diagnostic aids in Periodontics
Advanced diagnostic aids in PeriodonticsAdvanced diagnostic aids in Periodontics
Advanced diagnostic aids in Periodontics
 

Recently uploaded

BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 

Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources (talk at IberLEF @ SEPLN 2022)

  • 1. Barcelona Supercomputing Center (BSC): • Antonio Miranda-Escalada • Luis Gascó • Salvador Lima-López • Eulàlia Farré-Maduell • Darryl Estrada • Martin Krallinger Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources Martin Krallinger Head of Text Mining Unit, BSC <mkrallin@bsc.es> IberLEF @ SEPLN 2022 LivingNER corpus: doi.org/10.5281/zenodo.6376662 1
  • 2. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com Importance of species information extraction De − Allice Hunter - File:Hispanophone global world map language.png,CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=69323596 National Center for Biotechnology Information (NCBI) Taxonomy
  • 3. How many species inhabit the earth How many species do we know Quantification of global species richness Taxonomic classification of species Number of species in a taxonomic group Validation against well-known taxa 250 years of taxonomic classification 1.2 million species catalogued in a central database 86% of species on Earth and 91% of species in the ocean still await description Knowledge gap
  • 4. -Large collection of species, change over time, hierarchical relation types relation types -Homonymy with commonly used words, e.g.: “Spot” (Leiostomus xanthurus) and “Permit” (Trachinotus falcatus) -Homonymy with other medical entities (the word “goat” can refer to proteins found in human, zebrafish, rat and mouse. -Abbreviations are ambiguous, e.g.: HBV can be used for both “Hepatitis B virus” as well as “Hepatitis B vaccine” -Vernacular form (common names) - Incorrect case or misspelt (like, Bacterium coli, Bacillus coli and Escheria coli for Escherichia coli) - Coordinations, nested expressions: “human immunodeficiency viruses types 1 and 2”, refer to two distinct species names, “HIV type 1” and “HIV type 2” - Role names (e.g. athletes, responders) - Human mencions in the form of family members, etc…. Challenges
  • 5. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com Previous SPECIES extraction and normalization efforts ● LivingNER < 2000 2000-2010 2010-2021 2022 ● The Catalogue of Life [Index of the world's species] [Bánki et al., 2022] [2001] ●Infectious Diseases (ID) task of BioNLP [Corpus and shared task] [Pyysalo et al., 2011] [2011] ● SPECIES [Species mention and normalisation to NCBI taxonomy corpus and tool] [Pafilis et al., 2013] [2014] ● ITIS (Integrated Taxonomic Information System) [Federal effort to provide consistent biological taxonomies] [1996] ● NCBI taxonomy [Terminological resource] [Federhen, 2012] [1997] ● Global Names Architecture database [organizes and cross-links electronic information about organisms] [Pyle et al., 2016] [2016] ● LINNAEUS [Species mention and normalisation to NCBI taxonomy corpus and tool] [Gerner et al., 2010] [2010]
  • 6. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER overview
  • 7. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER resources LivingNER corpus: doi.org/10.5281/zenodo.6376662 LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162 LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662 LivingNER terminology: doi.org/10.5281/zenodo.6390506 LivingNER Silver Standard: LivingNER evaluation library: github.com/tonifuc3m/livingner-evaluation-library LivingNER participant systems: temu.bsc.es/livingner/participant-systems/ LivingNER YouTube playlist: https://www.youtube.com/channel/UCDsmS1pCCO8TW312wJq8aCQ/playlists
  • 8. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER Corpus: documents, format and annotation
  • 9. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com LivingNER Corpus - Overview ● Diversity: Atención primaria, dermatología, medicina interna, medicina tropical, endocrinología, neurología, oftalmología, psiquiatría, radiología, urgencias, cardiología, pediatrita, oncología, odontología,.. ● Manual entity annotations, NCBI taxonomy mapping and application classification ● Inter-Annotator Agreement (IAA): 94.2 ● Random training, validation and test split Most common SPECIES mentions
  • 10. IberLEF - LivingNER: recognition, normalization & classification of species, pathogens and food - krallinger.martin@gmail.com; antoniomiresc@gmail.com DisTEMIST Multilingual Silver Standard
  • 11. LivingNER Multilingual Silver Standard ● Spanish Gold Standard compared with English Silver Standard ● Online visualiser: https://temu.bsc.es/mLivingNER/diff.xhtml#/translations/en/annotation_transfer/train/caso_clinico_radiologia942?diff=/gold-standard/train/ ● NCBI Tax IDs shown for the mentions in both versions: 11103 and 1311
  • 12. LivingNER participating teams ● Registrations: 56 ● SPECIES NER track: 20 participating teams, 41 submissions ● SPECIES Norm track: 8 teams, 14 submissions ● Clinical Impact track: 5 teams, 6 submissions
  • 13. LivingNER participant results ● MiF: micro-averaged F-score (main metric) ● MiP: micro-averaged precision ● MiR: micro-averaged recall ● Evaluation library: github.com/tonifuc3m/livingner-evaluation-library ● Result tables: SPECIES NER, SPECIES Norm
  • 14. LivingNER participant results ● MiF: micro-averaged F-score (main metric) ● MiP: micro-averaged precision ● MiR: micro-averaged recall ● Evaluation library: github.com/tonifuc3m/livingner-evaluation-library ● Result tables: SPECIES NER, SPECIES Norm
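The micro-averaged metrics named on the results slides (MiP, MiR, MiF) can be sketched as follows. This is only an illustration, not the official livingner-evaluation-library: it assumes an annotation counts as correct only when document, offsets and label all match exactly, and the function name `micro_prf` and the toy annotations are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed annotation shape: (document id, start offset, end offset, label).
Annotation = Tuple[str, int, int, str]


def micro_prf(gold: Set[Annotation], pred: Set[Annotation]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F-score over pooled annotations.

    Counts are pooled across all documents before the ratios are computed,
    so documents with many mentions weigh more than documents with few.
    """
    true_positives = len(gold & pred)  # exact matches only (an assumption)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"MiP": precision, "MiR": recall, "MiF": f_score}


# Toy data, invented for illustration.
gold = {
    ("doc1", 0, 5, "SPECIES"),
    ("doc1", 10, 18, "HUMAN"),
    ("doc2", 3, 9, "SPECIES"),
}
pred = {
    ("doc1", 0, 5, "SPECIES"),
    ("doc2", 3, 9, "SPECIES"),
    ("doc2", 20, 25, "SPECIES"),
}
scores = micro_prf(gold, pred)
```

In this toy case two of the three predictions match two of the three gold annotations, so MiP, MiR and MiF all equal 2/3. For the normalization track, the same computation could be applied with the NCBI Tax ID added to each tuple, which is again an assumption about how matches are defined.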
  • 15. LivingNER participant results - Clinical Impact track
  • 16. Conclusions • Increasing interest in Spanish clinical NLP tasks • LivingNER Resources ○ LivingNER Corpus: Species entity Gold Standard corpus mapped to NCBI Taxonomy. ○ LivingNER Multilingual Silver Standard Corpus: Species entity corpora normalised to NCBI Taxonomy in several languages. ○ LivingNER Spanish Silver Standard (from participants’ predictions)
  • 17. Future directions • Correct the LivingNER Multilingual Silver Standard to generate a Gold Standard subset for each language, creating high-quality benchmarks in the seven languages. • The Clinical Impact track lacked sufficient training and test data; we plan to correct this in the future. • Generate more granular annotations for the HUMAN mentions, which are needed for real-world applications. • Figure: actual examples of annotated species mentions and automatically recognized profession mentions.
  • 18. Acknowledgements ● LivingNER Participants & LivingNER Scientific Committee ● IberLEF organisers: Manuel, Julio and all others ● SEPLN organisers ● Funding: Plan de Tecnologías del Lenguaje; AI4PROFHEALTH (PID2020-119266RA-I00); BioMATDB (Horizon Europe Grant Agreement No 101058779) ● BSC Text Mining Unit
  • 19. LivingNER resources ● LivingNER corpus: doi.org/10.5281/zenodo.6376662 ● LivingNER annotation guidelines: doi.org/10.5281/zenodo.6385162 ● LivingNER Multilingual Silver Standard: doi.org/10.5281/zenodo.6376662 ● LivingNER terminology: doi.org/10.5281/zenodo.6390506 ● LivingNER Silver Standard: ● LivingNER evaluation library: github.com/tonifuc3m/livingner-evaluation-library ● LivingNER participant systems: temu.bsc.es/livingner/participant-systems/ ● LivingNER YouTube playlist: https://youtube.com/playlist?list=PL5uSCzf1azhA_gMLC3DBZe6NvmMJiggTg
  • 20. Questions?