UMLS (Unified Medical Language System) facilitates the development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health. It comprises the Metathesaurus (a large collection of controlled vocabularies), the Semantic Network, and the Specialist Lexicon with its lexical tools. MetaMap is a tool for recognizing UMLS concepts in text.
3. Motivation
“... to facilitate the development of computer
systems that behave as if they "understand"
the meaning of the language of biomedicine
and health.”
National Library of Medicine
4. UMLS Components
1. Metathesaurus
Over 1 million biomedical concepts from over 100 vocabularies.
2. Semantic Network
133 categories and 54 relationships.
3. Specialist Lexicon & Lexical Tools
Software programs to aid in NLP.
5. Metathesaurus
Source vocabularies include:
● Patient care controlled terms
● Biomedical vocabularies from different languages
● Clinical/health services research
● Health services billing
● Biomedical literature catalogs
● Public health statistics
Contents:
● 5,000,000 biomedical terms
● 1,000,000 concepts
● Over 100 source vocabularies
● Relational DB tables
6. Metathesaurus
●Concepts are classified into categories:
–Diagnosis
–Procedures & Supplies
–Diseases
–….
●Each concept has a unique identifier.
●Each concept has a preferred term.
●Concepts can be grouped into subsets by applying filters.
10. Unique Identifiers
● Concept Unique Identifier (CUI)
All names in all source vocabularies that mean the same thing are linked to one concept, which is assigned a unique identifier, the CUI.
● Lexical Unique Identifier (LUI)
Strings that are lexical variants of one another, as detected by the Lexical Variant Generator (LVG) program, share one LUI.
● String Unique Identifier (SUI)
Each distinct string has its own SUI; any variation in character set, upper/lower case, or punctuation yields a separate SUI.
● Atom Unique Identifier (AUI)
Every occurrence of a string in each source vocabulary is assigned a unique identifier, the AUI.
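The four identifier levels nest: several atoms can share a string, several strings can share a lexical group, and several lexical groups can share a concept. The following is a minimal sketch of that nesting in plain Java. The class names, the source abbreviations `SRC1`/`SRC2`, and every identifier value are made-up placeholders for illustration, not real UMLS data.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class IdentifierSketch {
    // One atom: a single occurrence of a string in a single source vocabulary.
    static final class Atom {
        final String aui;    // atom identifier (unique per occurrence)
        final String sui;    // string identifier (case and punctuation sensitive)
        final String lui;    // lexical-variant group identifier
        final String cui;    // concept identifier (the shared meaning)
        final String source; // source vocabulary abbreviation (hypothetical)
        final String text;

        Atom(String aui, String sui, String lui, String cui, String source, String text) {
            this.aui = aui;
            this.sui = sui;
            this.lui = lui;
            this.cui = cui;
            this.source = source;
            this.text = text;
        }
    }

    // Placeholder atoms: all identifier values below are invented.
    static List<Atom> sampleAtoms() {
        List<Atom> atoms = new ArrayList<>();
        atoms.add(new Atom("A0000001", "S0000001", "L0000001", "C0000001", "SRC1", "Headache"));
        atoms.add(new Atom("A0000002", "S0000001", "L0000001", "C0000001", "SRC2", "Headache"));
        atoms.add(new Atom("A0000003", "S0000002", "L0000001", "C0000001", "SRC1", "headaches"));
        atoms.add(new Atom("A0000004", "S0000003", "L0000002", "C0000001", "SRC2", "Cephalalgia"));
        return atoms;
    }

    // Group atoms by concept so that all synonymous names sit under one CUI.
    static Map<String, List<Atom>> groupByCui(List<Atom> atoms) {
        Map<String, List<Atom>> byCui = new LinkedHashMap<>();
        for (Atom a : atoms) {
            byCui.computeIfAbsent(a.cui, k -> new ArrayList<>()).add(a);
        }
        return byCui;
    }

    // Count distinct identifiers at one level (AUI, SUI, LUI, or CUI).
    static int countDistinct(List<Atom> atoms, Function<Atom, String> level) {
        Set<String> seen = new HashSet<>();
        for (Atom a : atoms) {
            seen.add(level.apply(a));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<Atom> atoms = sampleAtoms();
        System.out.println("AUIs: " + countDistinct(atoms, a -> a.aui));
        System.out.println("SUIs: " + countDistinct(atoms, a -> a.sui));
        System.out.println("LUIs: " + countDistinct(atoms, a -> a.lui));
        System.out.println("CUIs: " + countDistinct(atoms, a -> a.cui));
    }
}
```

In this toy data, four atoms collapse into three strings, two lexical groups, and one concept.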
12. Semantic Network
● Semantic Types
133 types; each Metathesaurus concept is assigned at least one semantic type.
● Semantic Relationships
54 relationships; is-a is the most important.
13. Semantic Network
Semantic Types Examples:
✔ Organisms
✔ Anatomical structures
✔ Biologic function
✔ Chemicals
✔ Physical objects
The two top-level semantic types are Entity and Event.
Semantic Relationships Examples:
✔ Physically related to
✔ Spatially related to
✔ Temporally related to
✔ Functionally related to
✔ Conceptually related to
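Because is-a links semantic types into a hierarchy, a common operation is checking whether one type falls under another. The sketch below stores a small, hand-picked subset of child-to-parent links and walks upward. The class and method names are hypothetical, and the subset reflects the hierarchy as commonly described; it should be checked against the official Semantic Network files before relying on it.

```java
import java.util.HashMap;
import java.util.Map;

final class SemanticNetworkSketch {
    // child -> parent along the is-a relationship (illustrative subset only).
    static final Map<String, String> IS_A = new HashMap<>();
    static {
        IS_A.put("Physical Object", "Entity");
        IS_A.put("Organism", "Physical Object");
        IS_A.put("Anatomical Structure", "Physical Object");
        IS_A.put("Phenomenon or Process", "Event");
        IS_A.put("Natural Phenomenon or Process", "Phenomenon or Process");
        IS_A.put("Biologic Function", "Natural Phenomenon or Process");
    }

    // True if 'type' equals 'ancestor' or reaches it by following is-a links upward.
    static boolean isA(String type, String ancestor) {
        for (String t = type; t != null; t = IS_A.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isA("Organism", "Entity"));
        System.out.println(isA("Biologic Function", "Event"));
        System.out.println(isA("Biologic Function", "Entity"));
    }
}
```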
14. Lexical Tools
●The Specialist Lexicon
An English lexicon (dictionary) that includes over 200,000 biomedical terms from a variety of sources to aid in NLP.
●Lexical Variant Generator (LVG)
●Norm: normalizer
●Wordind: tokenizer
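To give a feel for what normalization buys, here is a deliberately simplified sketch: it lowercases a term, replaces punctuation with spaces, and sorts the words, so that differently ordered or punctuated names compare equal. This is not the Norm program itself, which applies further steps (such as uninflecting words and dropping stop words) that are omitted here; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

final class NormSketch {
    // Simplified normalization: lowercase, strip punctuation, sort the words.
    static String normalize(String term) {
        String lowered = term.toLowerCase(Locale.ROOT);
        // Replace every run of non-alphanumeric characters with a single space.
        String cleaned = lowered.replaceAll("[^a-z0-9]+", " ").trim();
        if (cleaned.isEmpty()) {
            return "";
        }
        String[] words = cleaned.split(" ");
        Arrays.sort(words);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // Both spellings reduce to the same normalized key.
        System.out.println(normalize("Lung cancer"));  // cancer lung
        System.out.println(normalize("Cancer, Lung")); // cancer lung
    }
}
```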
16. Why Concept Identification?
● Information extraction/Data mining
● Classification/Categorization
● Text summarization
● Question answering
● Literature-based Knowledge Discovery
17. Example
Phrase: “lung cancer.”
Meta Candidates (8):
1000 Lung Cancer {MDR,DXP} (Malignant neoplasm of lung) [Neoplastic Process]
1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
861 Cancer (Malignant Neoplasms) [Neoplastic Process]
861 Lung [Body Part, Organ, or Organ Component]
861 Cancer (Cancer Genus) [Invertebrate]
861 Lung (Entire lung) [Body Part, Organ, or Organ Component]
861 Cancer (Specialty Type - cancer) [Biomedical Occupation or Discipline]
768 Pneumonia [Disease or Syndrome]
Meta Mapping (1000):
1000 Lung Cancer (Carcinoma of lung) [Neoplastic Process]
Meta Mapping (1000):
1000 Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
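Each candidate line in the human-readable output follows the same shape: a score, the matched name (optionally with a source list in braces and a preferred name in parentheses), and a semantic type in square brackets. A minimal sketch of pulling those three parts apart with a regular expression is shown below. The class names and the pattern are assumptions based only on the lines in this example, not a documented MetaMap output grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CandidateLineSketch {
    static final class Candidate {
        final int score;
        final String name;         // matched name, including any {sources} and (preferred name)
        final String semanticType; // text inside the trailing square brackets

        Candidate(int score, String name, String semanticType) {
            this.score = score;
            this.name = name;
            this.semanticType = semanticType;
        }
    }

    // score, then the name part, then [semantic type] at the end of the line.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(\\d+)\\s+(.+?)\\s*\\[([^\\]]+)\\]\\s*$");

    static Candidate parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized candidate line: " + line);
        }
        return new Candidate(Integer.parseInt(m.group(1)), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        Candidate c = parse("861 Lung [Body Part, Organ, or Organ Component]");
        System.out.println(c.score + " | " + c.name + " | " + c.semanticType);
    }
}
```

For programmatic use, structured access through the Java API is generally preferable to parsing the printed text; this sketch is only meant to make the line layout explicit.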
19. MetaMap Options
● Word Sense Disambiguation (-y)
Determines which concept is the best choice using the surrounding context.
● Negation (--negex)
Identifies negated entities.
20. Examples
● WSD Examples
– “Fifteen (6.4%) of 234 colds treated with placebo ..”
● Cold (cold temperature) [npop]
● Cold (Common cold) [dsyn]
● Cold (Cold Sensation) [phsf]
– “.. the drugs were compared in two four-point, double-blind bioassays.”
● Double (Diplopia) [dsyn] vs. Double (Duplicate) [ftcn]
● Blind (Blind Vision) [dsyn] vs. BLIND (Blinded) [resa] vs. Blind (Visually impaired persons) [podg]
● Bioassays (Biological Assay) [lbpr]
21. Examples
● Negation Example
– “There is no focal infiltrate or pleural effusion.”
– --negex output (in addition to normal output):
NEGATIONS:
Negation Type:nega
Negation Trigger: no
Negation PosInfo: 9/2
Negated Concept: C0332448:Infiltrate
Concept PosInfo: 18/10
Negation Type:nega
Negation Trigger: no
Negation PosInfo: 9/2
Negated Concept: C2073625:pleural effusion, C0032227:Pleural Effusion
Concept PosInfo: 32/16
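The PosInfo values in this output read as start/length pairs. Assuming the start is a zero-based character offset into the input sentence (which matches all three values above), a short sketch can recover the trigger and the negated spans. The class and method names are hypothetical.

```java
final class PosInfoSketch {
    // Return the text covered by a "start/length" PosInfo value,
    // treating start as a zero-based character offset (an assumption).
    static String span(String text, String posInfo) {
        String[] parts = posInfo.split("/");
        int start = Integer.parseInt(parts[0].trim());
        int length = Integer.parseInt(parts[1].trim());
        return text.substring(start, start + length);
    }

    public static void main(String[] args) {
        String sentence = "There is no focal infiltrate or pleural effusion.";
        System.out.println(span(sentence, "9/2"));   // no
        System.out.println(span(sentence, "18/10")); // infiltrate
        System.out.println(span(sentence, "32/16")); // pleural effusion
    }
}
```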
22. Other Options
● -@ --WSD <hostname> : which WSD server to use
● -8 --dynamic_variant_generation : dynamic variant generation
● -D --all_derivational_variants : all derivational variants
● -J --restrict_to_sts <semtypelist> : restrict to semantic types
● -K --ignore_stop_phrases : ignore stop phrases
● -R --restrict_to_sources <sourcelist> : restrict to sources
● -V --mm_data_version <name> : version of MetaMap data to use
● -X --truncate_candidates_mappings : truncate candidates mapping
● -Y --prefer_multiple_concepts : prefer multiple concepts
● -Z --mm_data_year <name> : year of MetaMap data to use
● -a --all_acros_abbrs : allow acronym/abbreviation variants
● -b --compute_all_mappings : compute/display all mappings
● -d --no_derivational_variants : no derivational variants
● -e --exclude_sources <sourcelist> : exclude sources
● -g --allow_concept_gaps : allow concept gaps
● -i --ignore_word_order : ignore word order
● -k --exclude_sts <semtypelist> : exclude semantic types
● -o --allow_overmatches : allow overmatches
● -r --threshold <integer> : threshold for displaying candidates
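Several of these flags are typically combined in one invocation. The sketch below assembles such an option string from a few typed settings, using only flags listed in this deck (-y, --negex, -J, -r). The helper class is hypothetical, and the comma-separated form of the semantic type list is an assumption that should be checked against the MetaMap usage documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OptionStringSketch {
    // Build a MetaMap-style option string from a few settings.
    static String buildOptions(boolean wsd, boolean negex,
                               List<String> semanticTypes, int threshold) {
        List<String> parts = new ArrayList<>();
        if (wsd) {
            parts.add("-y");       // word sense disambiguation
        }
        if (negex) {
            parts.add("--negex");  // negation detection
        }
        if (!semanticTypes.isEmpty()) {
            parts.add("-J");       // restrict to semantic types
            // Assumption: the list is passed as comma-separated abbreviations.
            parts.add(String.join(",", semanticTypes));
        }
        if (threshold > 0) {
            parts.add("-r");       // threshold for displaying candidates
            parts.add(Integer.toString(threshold));
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // prints: -y --negex -J dsyn,neop -r 800
        System.out.println(buildOptions(true, true, Arrays.asList("dsyn", "neop"), 800));
    }
}
```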
29. MetaMap: Technical Aspect
●Download
–MetaMap API Underlying Architecture.
–MetaMap Java API.
●Extract and Install
–$ bzip2 -dc public_mm_linux_javaapi_{four-digit-year}.tar.bz2 | tar xvf -
–$ ./bin/install.sh
●Starting MetaMap Server
$ ./bin/skrmedpostctl start # Start SKR server
$ ./bin/wsdserverctl start # Start WSD server (optional)
$ ./bin/mmserver{two-digit-year} # Start MetaMap server
30. MetaMap Java API
Two jar files contain the API:
✔ /src/javaapi/dist/MetaMapApi.jar
✔ /src/javaapi/dist/prologbeans.jar
31. Code Time :)
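import java.util.List;
import gov.nih.nlm.nls.metamap.*;

// Connect to the MetaMap server running on localhost.
MetaMapApi api = new MetaMapApiImpl("localhost");
// Process the citations in the file and take the first result.
List<Result> resultList = api.processCitationsFromFile("Abstract.txt");
Result result = resultList.get(0);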