Biomedical
Entity Linking
Introduction,
approaches, challenges
About me
● PhD in machine learning & natural language processing
from University of Bonn & Fraunhofer IAIS
● Now in industry: AI and data driven products, since 2016
mostly in the medical and healthcare domain
● Main interests: NLP, especially German; information
retrieval; recommender systems
@anja_pilz
aplz
Outline
● Motivation: why do we need entity linking
○ Ambiguity, use cases
● Entity linking in the biomedical domain
○ Data and ontologies, main challenges
● Technical problem: 3 stage task
○ Approaches for each of these stages (sketches & references)
● Short preview of challenges with German data
Language is ambiguous
“with steroid induced diabetes, I lost a stone
in three days, it was grim”
Candidate senses per mention:
● “steroid”: variants of steroids
● “diabetes”: Type II diabetes, Type I diabetes, Gestational diabetes, Steroid diabetes
● “stone”: gallstones, kidney stones, the stone (unit), a random stone
● “grim”: grim protein, Drosophila
Why do we care?
Need to resolve ambiguity to
● avoid mistakes in patient-doctor communication
○ specialist vs layman vocabulary
● automatically retrieve important information
○ side effects of drugs discussed in online patient fora
● enrich electronic health records (EHR)
○ links to the newest research, treatment guidelines or other linked open data (LOD) resources
And many more reasons…
Entity linking resolves ambiguity by assigning each mention its underlying “sense”.
Biomedical Entity Linking
Mentions: textual references of medical terms like
diagnoses, treatments, body parts, drugs, ...
Entities: entries in (curated) medical ontologies
Example mentions (layman terms, EHR, specialist vocabulary): Headache, Cephalgia, Migraine, Head Pain, Cranial Pain
Linked entity: Headache (D006261)
Biomedical Entity Linking
Example: excerpt from a PubMed abstract linked to UMLS (Unified Medical Language
System)
“The technique does not require contrast material, so it can safely be used in
patients with renal failure.”
Mohan & Li, MedMentions: A Large Biomedical Corpus
Annotated with UMLS Concepts, AKBC 2019
Why is that hard?
● Notion of uniqueness: a disease is
rendered unique by the person it affects
(and the stage)
● Uniqueness heavily affects linkability:
which stage of renal failure is meant?
○ candidates “look” super similar
○ might even need additional resources (lab)
Acute renal failure: Her baseline Cr is
1.8. On presentation the Cr had
increased to 7.7 secondary to the
bilateral hydronephrosis.
https://icd.who.int/browse11/l-m/en
Johnson et al., MIMIC-III, a freely accessible
critical care database. Scientific Data 2016
Technical Problem
Given some text document, find all spans of words m that mention some entity e and
assign each span to a unique identifier (entry in a KB).
Entity Recognition: detect spans to be linked
(Sequence Tagging)
Candidate Retrieval: find all relevant candidates in a KB
(Information Retrieval)
Candidate Ranking: decide on the best candidate
(Ranking Task)
Error propagation: mistakes made in an earlier stage carry over to all later stages
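To make the effect of error propagation concrete, here is a rough back-of-the-envelope sketch. It assumes the three stages fail independently, which is a simplification, and the numbers are invented for illustration only.

```python
def end_to_end_accuracy(recognition_recall, retrieval_recall, ranking_accuracy):
    """Fraction of gold mentions that end up correctly linked.

    Assumption: the stages fail independently, so the rates multiply.
    A mention missed in stage 1 or 2 can never be recovered in stage 3.
    """
    return recognition_recall * retrieval_recall * ranking_accuracy


# Invented numbers: every stage is "90% good" on its own.
overall = end_to_end_accuracy(0.9, 0.9, 0.9)
print(f"end-to-end accuracy: {overall:.3f}")
```

Under these assumptions, three stages that each look decent in isolation already lose more than a quarter of the mentions end to end.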
Step 1: Entity Recognition
Goal: detect diagnoses, measurements, procedures in the text of the EHR
● supervised: train a sequence tagging model
○ pick a model: lots of literature, but mostly some variant of Bi-LSTM-CRF
○ (manually) annotate data
● pro: domain adaptation & custom features
● con: requires training data & medical expertise
Roller et al., Detecting Named Entities and Relations
in German Clinical Reports, GSCL 2017
Murty et al., Hierarchical Losses and New
Resources for Fine-grained Entity Typing and
Linking, ACL 2018
Lample et al., Neural Architectures for Named Entity
Recognition, NAACL-HLT 2016
Indication: Acute hypoxia. Relapsed AML,
GVHD, and renal failure with new hypoxia with
clear chest x-ray.
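Sequence taggers are usually trained on per-token labels. The following is a minimal sketch of how manually annotated spans could be converted into BIO tags. The label name DIAGNOSIS, the token offsets and the helper to_bio are assumptions made for this illustration, not part of any cited system.

```python
def to_bio(tokens, spans):
    """Convert annotated spans into BIO tags.

    spans: list of (start, end, label) token offsets, end exclusive.
    Assumes spans do not overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the mention
    return tags


# Pre-tokenized fragment of the example report; annotations are hypothetical.
tokens = "Relapsed AML , GVHD , and renal failure with new hypoxia".split()
spans = [(1, 2, "DIAGNOSIS"), (3, 4, "DIAGNOSIS"),
         (6, 8, "DIAGNOSIS"), (10, 11, "DIAGNOSIS")]
tags = to_bio(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The resulting (token, tag) pairs are the kind of training data a Bi-LSTM-CRF style tagger consumes.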
Step 1: Entity Recognition
Goal: detect diagnoses, measurements, procedures in the text of the EHR
● weakly labeled: keyword matching
○ walk over text and lookup every span in a dictionary
○ keep all spans that have at least one entity candidate
● pro: no need to annotate data
● con: noise, type and recall issues
Murty et al., Hierarchical Losses and New
Resources for Fine-grained Entity Typing and
Linking, ACL 2018
Kolitsas et al., End-to-End Neural Entity Linking,
CoNLL 2018
Wiatrak, Iso-Sipilä. Simple Hierarchical Multi-Task
Neural End-To-End Entity Linking for Biomedical
Text. LOUHI@EMNLP 2020
Indication: Acute hypoxia. Relapsed AML,
GVHD, and renal failure with new hypoxia with
clear chest x-ray.
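The keyword matching idea can be sketched in a few lines. The dictionary, the entity identifiers and the max_len limit below are all hypothetical and only serve to show the mechanics.

```python
def dictionary_spans(tokens, dictionary, max_len=3):
    """Walk over the text and look up every span of up to max_len tokens.

    Returns (start, end, surface, candidate_ids) for every span that has
    at least one entity candidate in the dictionary.
    """
    hits = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            surface = " ".join(tokens[start:end]).lower()
            if surface in dictionary:
                hits.append((start, end, surface, dictionary[surface]))
    return hits


# Toy dictionary with made-up entity identifiers.
toy_dictionary = {
    "renal failure": ["ENT_RENAL_FAILURE"],
    "failure": ["ENT_HEART_FAILURE", "ENT_RENAL_FAILURE"],
    "hypoxia": ["ENT_HYPOXIA"],
}
tokens = "renal failure with new hypoxia".split()
hits = dictionary_spans(tokens, toy_dictionary)
for hit in hits:
    print(hit)
```

Note that the nested span "failure" is kept alongside "renal failure". This is one source of the noise mentioned above and usually calls for a filtering step such as keeping only the longest match.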
Step 2: Candidate Retrieval
Goal: fetch all relevant candidate entities from the ontology
● upper bound on performance: you can’t link what you
don’t find
Go-to solution: inverted index (Lucene) over entity descriptions
● make use of the analyzers that come with Lucene for
tokenization, stemming, etc.
● craft the search query from the mention context
● keep the top 5, 10, or 100 hits as candidates
Pilz & Paaß, Collective Search for Concept
Disambiguation, COLING 2012
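The retrieval step can be sketched with a tiny in-memory inverted index. This is a pure-Python stand-in for Lucene, not its API: it has no stemming, and scoring is plain token overlap rather than a proper ranking function. The entity IDs and descriptions are invented.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Very rough analyzer: lowercase, split on whitespace, strip punctuation."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def build_index(descriptions):
    """Map every token to the set of entity IDs whose description contains it."""
    index = defaultdict(set)
    for entity_id, text in descriptions.items():
        for token in tokenize(text):
            index[token].add(entity_id)
    return index


def retrieve(index, query, k=10):
    """Return the top-k entity IDs by number of query tokens matched."""
    scores = Counter()
    for token in set(tokenize(query)):
        for entity_id in index.get(token, ()):
            scores[entity_id] += 1
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [entity_id for entity_id, _ in ranked[:k]]


# Invented entity descriptions.
descriptions = {
    "E1": "acute renal failure: sudden loss of kidney function",
    "E2": "chronic renal failure: gradual loss of kidney function",
    "E3": "kidney stone: solid deposit in the kidney",
}
index = build_index(descriptions)
query = "Acute renal failure: baseline Cr increased"  # built from mention context
print(retrieve(index, query, k=2))
```

Any entity that shares no token with the query is never returned, which is the upper bound on performance in action: the ranking stage cannot pick a candidate that retrieval did not find.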
Step 3: Candidate Ranking
Goal: decide on the best candidate as target entity
Rank by context similarity
● compare text representations of mention context and
entity description (word2vec, topic distributions, etc)
● but: medical ontologies often do not provide extensive
descriptions
Pilz & Paaß, From names to entities using thematic
context distance, CIKM 2011
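As a minimal sketch of ranking by context similarity, the example below compares bag-of-words vectors with cosine similarity. Bag-of-words is used here only as a simple stand-in for the richer representations named above (word2vec, topic distributions). The candidate IDs and descriptions are invented.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_by_context(context, candidates):
    """Order candidate IDs by similarity of their description to the context."""
    ctx = bow(context)
    scored = [(cosine(ctx, bow(desc)), eid) for eid, desc in candidates.items()]
    return [eid for _, eid in sorted(scored, key=lambda s: (-s[0], s[1]))]


context = "lost a stone of weight in three days"
candidates = {
    "STONE_UNIT": "british unit of body weight",
    "KIDNEY_STONE": "solid deposit formed in the kidney",
}
print(rank_by_context(context, candidates))
```

With descriptions this short, a single shared function word can decide the ranking, which illustrates why sparse ontology descriptions are a problem for this approach.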
Step 3: Candidate Ranking
Goal: decide on the best candidate as target entity
Add type similarity from hierarchies
● Wikipedia: categories assigned to entities
● UMLS: use semantic types
○ distinguish a disease from the gene it is caused by
○ LATTE finds a boost in linking performance when adding type
encodings learned from UMLS types
Zhu et al., LATTE: Latent Type Modeling for
Biomedical Entity Linking, AAAI 2020
UMLS® Reference Manual
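One simple way to picture the use of type information is to mix a type-agreement score into the context score. The sketch below is a hand-weighted simplification and not the LATTE model, which learns latent type representations. The candidate IDs, scores and the type_weight parameter are assumptions; "Disease or Syndrome" and "Gene or Genome" are UMLS semantic types.

```python
def type_score(predicted_type, candidate_types):
    """1.0 if the type predicted for the mention is among the candidate's types."""
    return 1.0 if predicted_type in candidate_types else 0.0


def combined_score(context_score, predicted_type, candidate_types, type_weight=0.5):
    """Linear mix of context similarity and type agreement (weight is a guess)."""
    return ((1 - type_weight) * context_score
            + type_weight * type_score(predicted_type, candidate_types))


# Hypothetical candidates: (id, context similarity, semantic types).
candidates = [
    ("DISEASE_X", 0.40, {"Disease or Syndrome"}),
    ("GENE_X", 0.45, {"Gene or Genome"}),
]
predicted_type = "Disease or Syndrome"  # assumed output of a mention typing model

best = max(candidates,
           key=lambda c: combined_score(c[1], predicted_type, c[2]))
print(best[0])
```

Here the gene has the slightly higher context similarity, but the type signal tips the decision towards the disease.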
Step 3: Candidate Ranking
Goal: decide on the best candidate as target entity
In a nutshell
● find expressive vector representations of mention-candidate pairs
● plug vectors into some function to rank them
○ Ranking SVM, specific loss functions in NN, …
● the information in the vector is more important than the algorithm!
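The two steps above can be sketched with a linear scoring function and a pairwise hinge loss of the kind used in Ranking SVMs. The three features, the weights and the margin are invented for illustration; in practice the weights would be learned from annotated data.

```python
def score(weights, features):
    """Linear ranking function over a mention-candidate feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def pairwise_hinge_loss(weights, gold_features, other_features, margin=1.0):
    """Penalty if the gold candidate does not beat another one by the margin."""
    gap = score(weights, gold_features) - score(weights, other_features)
    return max(0.0, margin - gap)


# Hypothetical features: [context similarity, type match, exact name match].
weights = [1.0, 1.0, 0.5]
gold = [0.40, 1.0, 1.0]
other = [0.45, 0.0, 1.0]

print(score(weights, gold), score(weights, other))
print(pairwise_hinge_loss(weights, gold, other))
```

The loss only sees what is encoded in the feature vectors, which is why their information content matters more than the choice of ranking algorithm.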
Challenges with German data
● Data is scarce, nothing comparable to MIMIC-III or MedMentions exists
● Ontologies like UMLS are only available in English
● NLP for German is a tad harder
○ Common nouns look like named entities (upper case)
● … the notorious compound words
○ sound perception disorder: Schallempfindungsstörung
○ occlusion of the central retinal artery: Netzhautarterienverschluss
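One common way to deal with compounds is dictionary-based splitting. Below is a minimal sketch, assuming a small hand-made lexicon and only the linking element "s". Real splitters typically also use corpus frequencies and more linking elements.

```python
def split_compound(word, lexicon, linkers=("", "s")):
    """Split a compound into known lexicon parts, or return None.

    Tries the longest known prefix first and allows an optional linking
    element (e.g. the German Fugen-s) between parts.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        if not rest:
            return [head]
        for linker in linkers:
            if rest.startswith(linker) and len(rest) > len(linker):
                tail = split_compound(rest[len(linker):], lexicon, linkers)
                if tail:
                    return [head] + tail
    return None


# Toy lexicon, assumed for this example only.
lexicon = {"schall", "empfindung", "störung",
           "netzhaut", "arterien", "verschluss"}

print(split_compound("Schallempfindungsstörung", lexicon))
print(split_compound("Netzhautarterienverschluss", lexicon))
```

The parts can then be looked up or embedded individually, which helps when the full compound never occurs in the (scarce) training data.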
Ideas?
Let’s discuss!

Biomedical Entity Linking - Introduction, approaches, challenges

  • 2. About me
    ● PhD in machine learning & natural language processing from University of Bonn & Fraunhofer IAIS
    ● Now in industry: AI and data-driven products, since 2016 mostly in the medical and healthcare domain
    ● Main interests: NLP, especially German; information retrieval; recommender systems
    @anja_pilz aplz
  • 3. Outline
    ● Motivation: why do we need entity linking
      ○ Ambiguity, use cases
    ● Entity linking in the biomedical domain
      ○ Data and ontologies, main challenges
    ● Technical problem: 3 stage task
      ○ Approaches for each of these stages (sketches & references)
    ● Short preview of challenges with German data
  • 4. Language is ambiguous
    “with steroid induced diabetes, I lost a stone in three days, it was grim”
    Candidate senses:
    ● steroid: variants of steroids
    ● diabetes: Type II diabetes, Type I diabetes, Gestational diabetes, Steroid diabetes
    ● stone: gallstones, kidney stones, the stone (unit), a random stone
    ● grim: grim protein, Drosophila
  • 5. Why do we care?
    Need to resolve ambiguity to
    ● avoid mistakes in patient-doctor communication
      ○ specialist vs layman vocabulary
    ● automatically retrieve important information
      ○ side effects of drugs discussed in online patient fora
    ● enrich electronic health records (EHR)
      ○ links to newest research, treatment guidelines or other LOD resources
    And many more reasons…
    Entity linking resolves ambiguity by assigning each mention its underlying “sense”.
  • 6. Biomedical Entity Linking
    ● Mentions: textual references of medical terms like diagnoses, treatments, body parts, drugs, ...
      ○ sources: layman terms, EHR, specialist vocabulary
      ○ examples: Headache, Cephalgia, Migraine, Head Pain, Cranial Pain
    ● Entities: entries in (curated) medical ontologies
      ○ example: Headache (D006261)
  • 7. Biomedical Entity Linking
    Example: excerpt from a PubMed abstract linked to UMLS (Unified Medical Language System)
    “The technique does not require contrast material, so it can safely be used in patients with renal failure.”
    Mohan & Li, MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts, AKBC 2019
  • 8. Why is that hard?
    ● Notion of uniqueness: a disease is rendered unique by the person it affects (and the stage)
    ● Uniqueness heavily affects linkability: which stage of renal failure is meant?
      ○ candidates “look” super similar
      ○ might even need additional resources (lab)
    Example: “Acute renal failure: Her baseline Cr is 1.8. On presentation the Cr had increased to 7.7 secondary to the bilateral hydronephrosis.”
    https://icd.who.int/browse11/l-m/en
    Johnson et al., MIMIC-III, a freely accessible critical care database. Scientific Data 2016
  • 9. Technical Problem
    Given some text document, find all spans of words m that mention some entity e and assign each span to a unique identifier (entry in a KB).
    ● Entity Recognition: detect spans to be linked (Sequence Tagging)
    ● Candidate Retrieval: find all relevant candidates in a KB (Information Retrieval)
    ● Candidate Ranking: decide on the best candidate (Ranking Task)
    Error propagation: mistakes made in an earlier stage carry over to the later ones
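
The three stages can be read as a simple function composition. The following is a minimal sketch, not the speaker's implementation: the `Link` type, the `link_entities` function, the toy stages and the `KB:...` identifiers are all hypothetical and only serve to show how the stages hand data to each other.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass
class Link:
    start: int
    end: int
    entity_id: Optional[str]  # None when retrieval returned no candidate


def link_entities(
    text: str,
    recognize: Callable[[str], List[Span]],
    retrieve: Callable[[str], List[str]],
    rank: Callable[[str, str, List[str]], str],
) -> List[Link]:
    """Run recognition, candidate retrieval and ranking in sequence."""
    links = []
    for start, end in recognize(text):       # stage 1: spans to be linked
        mention = text[start:end]
        candidates = retrieve(mention)        # stage 2: candidates from the KB
        best = rank(mention, text, candidates) if candidates else None  # stage 3
        links.append(Link(start, end, best))
    return links


# Toy stages with made-up identifiers, for illustration only.
TOY_KB: Dict[str, List[str]] = {"renal failure": ["KB:0001", "KB:0002"]}


def toy_recognize(text: str) -> List[Span]:
    start = text.find("renal failure")
    return [(start, start + len("renal failure"))] if start >= 0 else []


def toy_retrieve(mention: str) -> List[str]:
    return TOY_KB.get(mention.lower(), [])


def toy_rank(mention: str, context: str, candidates: List[str]) -> str:
    return candidates[0]  # placeholder: a real ranker would score each candidate


text = "it can safely be used in patients with renal failure."
links = link_entities(text, toy_recognize, toy_retrieve, toy_rank)
```

The sketch also makes the error propagation visible: a span that `recognize` misses never reaches the later stages, and a gold entity that `retrieve` does not return cannot be chosen by `rank`.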
  • 10. Step 1: Entity Recognition
    Goal: detect diagnoses, measurements, procedures in the text of the EHR
    ● supervised: train a sequence tagging model
      ○ pick a model: lots of literature, but mostly some Bi-LSTM-CRF variant
      ○ (manually) annotate data
    ● pro: domain adaptation & custom features
    ● con: requires training data & medical expertise
    Example: “Indication: Acute hypoxia. Relapsed AML, GVHD, and renal failure with new hypoxia with clear chest x-ray.”
    Roller et al., Detecting Named Entities and Relations in German Clinical Reports, GSCL 2017
    Murty et al., Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking, ACL 2018
    Lample et al., Neural Architectures for Named Entity Recognition, NAACL-HLT 2016
  • 11. Step 1: Entity Recognition
    Goal: detect diagnoses, measurements, procedures in the text of the EHR
    ● weakly labeled: keyword matching
      ○ walk over text and look up every span in a dictionary
      ○ keep all spans that have at least one entity candidate
    ● pro: no need to annotate data
    ● con: noise, type and recall issues
    Example: “Indication: Acute hypoxia. Relapsed AML, GVHD, and renal failure with new hypoxia with clear chest x-ray.”
    Murty et al., Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking, ACL 2018
    Kolitsas et al., End-to-End Neural Entity Linking, CoNLL 2018
    Wiatrak, Iso-Sipilä, Simple Hierarchical Multi-Task Neural End-To-End Entity Linking for Biomedical Text, LOUHI@EMNLP 2020
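
The keyword-matching variant of step 1 can be sketched in a few lines. This is an illustration under assumptions: the function name `dictionary_spans`, the `max_len` cap on span length and the placeholder entity IDs (`E1` to `E5`) are hypothetical, and a real dictionary would be derived from an ontology's names and synonyms.

```python
from typing import Dict, List, Tuple


def dictionary_spans(
    tokens: List[str],
    dictionary: Dict[str, List[str]],
    max_len: int = 4,
) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, candidates) for every token span found in the dictionary.

    Offsets are token indices with an exclusive end. Spans longer than
    max_len tokens are not looked up.
    """
    matches = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            key = " ".join(tokens[start:end]).lower()
            candidates = dictionary.get(key)
            if candidates:  # keep spans with at least one entity candidate
                matches.append((start, end, candidates))
    return matches


# Hypothetical dictionary: surface form -> candidate entity IDs.
dictionary = {
    "aml": ["E1"],
    "renal failure": ["E2", "E3"],
    "failure": ["E4"],
    "hypoxia": ["E5"],
}
tokens = "Relapsed AML , GVHD , and renal failure with new hypoxia".split()
matches = dictionary_spans(tokens, dictionary)
```

On this input the sketch returns both "renal failure" and the nested "failure", which illustrates the noise issue, while "GVHD" is missed because it is not in the toy dictionary, which illustrates the recall issue.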
  • 12. Step 2: Candidate Retrieval
    Goal: fetch all relevant candidate entities from the ontology
    ● upper bound on performance: you can’t link what you don’t find
    Go-to solution: inverted index (Lucene) over entity descriptions
    ● make use of the analyzers that come with Lucene for tokenization, stemming, etc.
    ● craft search query from the mention context
    ● keep top 5, 10, 100 hits as candidates
    Pilz & Paaß, Collective Search for Concept Disambiguation, COLING 2012
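
To make the inverted-index idea concrete, here is a small pure-Python stand-in. It is not Lucene: there is no stemming and no TF-IDF or BM25 scoring, candidates are simply ordered by the number of distinct query terms they share with the query. The names `build_index` and `retrieve` and the entity IDs are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (a very crude analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(descriptions: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every term to the set of entity IDs whose description contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for entity_id, description in descriptions.items():
        for term in tokenize(description):
            index[term].add(entity_id)
    return index


def retrieve(query: str, index: Dict[str, Set[str]], top_k: int = 10) -> List[str]:
    """Return up to top_k entity IDs, ordered by number of matching query terms."""
    hits: Counter = Counter()
    for term in set(tokenize(query)):
        for entity_id in index.get(term, ()):
            hits[entity_id] += 1
    # Sort by descending hit count, break ties by entity ID for determinism.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [entity_id for entity_id, _ in ranked[:top_k]]


# Hypothetical entity descriptions.
descriptions = {
    "E1": "acute renal failure",
    "E2": "chronic renal failure",
    "E3": "heart failure",
}
index = build_index(descriptions)
candidates = retrieve("acute renal failure with hydronephrosis", index, top_k=2)
```

The `top_k` cut-off is where the upper bound on performance comes from: anything that falls below it is invisible to the ranking step.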
  • 13. Step 3: Candidate Ranking
    Goal: decide on the best candidate as target entity
    Rank by context similarity
    ● compare text representations of mention context and entity description (word2vec, topic distributions, etc.)
    ● but: medical ontologies often do not provide extensive descriptions
    Pilz & Paaß, From names to entities using thematic context distance, CIKM 2011
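
Ranking by context similarity can be sketched with the simplest possible text representation. The example below swaps in bag-of-words counts and cosine similarity in place of word2vec or topic distributions; the function names, entity IDs and descriptions are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Term counts over lowercase alphanumeric tokens (no stop-word removal)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_by_context(context: str, descriptions: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (entity_id, similarity) pairs, most similar description first."""
    context_vector = bag_of_words(context)
    scored = [
        (entity_id, cosine(context_vector, bag_of_words(description)))
        for entity_id, description in descriptions.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


context = "with steroid induced diabetes, I lost a stone in three days, it was grim"
# Hypothetical candidate descriptions for the mention "stone".
descriptions = {
    "stone_unit": "unit of body weight equal to 14 pounds, as in weight lost or gained",
    "kidney_stone": "solid mineral deposit formed inside the kidney",
}
ranked = rank_by_context(context, descriptions)
```

Without stop-word filtering, function words such as "in" contribute to the score, and a candidate with a one-line description has little to match against, which is exactly the limitation noted for medical ontologies.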
  • 14. Step 3: Candidate Ranking
    Goal: decide on the best candidate as target entity
    Add type similarity from hierarchies
    ● Wikipedia: categories assigned to entities
    ● UMLS: use semantic types
      ○ distinguish a disease from the gene it is caused by
      ○ LATTE: finds a boost in linking performance when adding a type encoding learned from UMLS types
    Zhu et al., LATTE: Latent Type Modeling for Biomedical Entity Linking, AAAI 2020
    UMLS® Reference Manual
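
One way to picture the effect of type information is a hand-written re-ranking step. This is not LATTE, which learns latent type representations; it is only a sketch in which a context score is combined with the Jaccard overlap between the semantic types predicted for the mention and those of each candidate. The function name, the `type_weight` parameter, the scores and the entity IDs are assumptions.

```python
from typing import Dict, List, Set, Tuple


def rerank_with_types(
    scored: List[Tuple[str, float]],
    candidate_types: Dict[str, Set[str]],
    mention_types: Set[str],
    type_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Add a weighted type-overlap bonus to each context score and re-sort."""
    reranked = []
    for entity_id, context_score in scored:
        types = candidate_types.get(entity_id, set())
        union = types | mention_types
        overlap = len(types & mention_types) / len(union) if union else 0.0
        reranked.append((entity_id, context_score + type_weight * overlap))
    return sorted(reranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical candidates: a gene and the disease it causes look alike by context.
scored = [("gene_x", 0.40), ("disease_x", 0.35)]
candidate_types = {
    "gene_x": {"Gene or Genome"},
    "disease_x": {"Disease or Syndrome"},
}
reranked = rerank_with_types(scored, candidate_types, {"Disease or Syndrome"})
```

In this toy case the gene wins on context score alone, and the type bonus moves the disease to the top.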
  • 15. Step 3: Candidate Ranking
    Goal: decide on the best candidate as target entity
    In a nutshell
    ● find expressive vector representations of mention-candidate pairs
    ● plug vectors into some function to rank them
      ○ Ranking SVM, specific loss functions in NN, …
    ● the information in the vector is more important than the algorithm!
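
As a rough illustration of "plug vectors into some function", the sketch below scores mention-candidate feature vectors with a linear function and computes the pairwise hinge loss that underlies Ranking SVM style training. The feature layout (context similarity, type similarity), the weights and all numbers are made up; no training loop is shown.

```python
from typing import Dict, List, Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear scoring function over a mention-candidate feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def pairwise_hinge_loss(
    weights: Sequence[float],
    gold: Sequence[float],
    negatives: List[Sequence[float]],
    margin: float = 1.0,
) -> float:
    """Penalize every wrong candidate that is not at least `margin` below the gold one."""
    gold_score = score(weights, gold)
    return sum(
        max(0.0, margin - (gold_score - score(weights, negative)))
        for negative in negatives
    )


def best_candidate(weights: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Pick the candidate whose feature vector gets the highest score."""
    return max(candidates, key=lambda entity_id: score(weights, candidates[entity_id]))


# Features per candidate: (context similarity, type similarity), hypothetical values.
weights = [1.0, 0.5]
candidates = {"gold": [0.8, 1.0], "neg1": [0.9, 0.0], "neg2": [0.1, 1.0]}
loss = pairwise_hinge_loss(
    weights, candidates["gold"], [candidates["neg1"], candidates["neg2"]]
)
winner = best_candidate(weights, candidates)
```

The scoring function here is deliberately trivial. In line with the slide's last point, the outcome depends on what the two features capture far more than on the choice of ranker.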
  • 16. Challenges with German data
    ● Data is scarce, nothing comparable to MIMIC-III or MedMentions exists
    ● Ontologies like UMLS are only available in English
    ● NLP for German is a tad harder
      ○ Common nouns look like named entities (upper case)
    ● … the notorious compound words
      ○ sensory sensation disorder: Schallempfindungsstörung
      ○ occlusion of the central retinal artery: Netzhautarterienverschluss
    Ideas? Let’s discuss!
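
One possible idea for the compound words, offered only as a discussion starter, is dictionary-based decompounding before lookup. The sketch below recursively splits a word into known parts and allows a linking "s" (Fugen-s) between them. The function name, the tiny lexicon and the restriction to a single linking element are assumptions; real German compounds need a much larger lexicon and more linking elements.

```python
from typing import List, Optional, Sequence, Set


def split_compound(
    word: str,
    lexicon: Set[str],
    linkers: Sequence[str] = ("", "s"),
) -> Optional[List[str]]:
    """Split `word` into lexicon entries, or return None if no full split exists.

    After each non-final part, an optional linking element from `linkers`
    may be skipped. Longer first parts are tried before shorter ones.
    """
    word = word.lower()
    if word in lexicon:
        return [word]
    for cut in range(len(word) - 1, 0, -1):
        head, rest = word[:cut], word[cut:]
        if head not in lexicon:
            continue
        for linker in linkers:
            if rest.startswith(linker) and len(rest) > len(linker):
                tail = split_compound(rest[len(linker):], lexicon, linkers)
                if tail is not None:
                    return [head] + tail
    return None


# Hypothetical mini lexicon covering the two examples from the slide.
lexicon = {"schall", "empfindung", "störung", "netzhaut", "arterien", "verschluss"}
parts_a = split_compound("Schallempfindungsstörung", lexicon)
parts_b = split_compound("Netzhautarterienverschluss", lexicon)
```

The resulting parts could then be looked up or embedded individually, at the cost of new ambiguity when a compound admits more than one split.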