ResQu: A Framework for Automatic Evaluation of Knowledge-Driven Automatic Summarization

RESQU: A FRAMEWORK FOR AUTOMATIC EVALUATION OF
KNOWLEDGE-DRIVEN AUTOMATIC SUMMARIZATION
MASTERS THESIS DEFENSE
NISHITA JAYKUMAR
MAY 26, 2016
MASTERS COMMITTEE
AMIT P. SHETH (ADVISOR)
THOMAS C. RINDFLESCH (NIH)
DELROY CAMERON (APPLE INC.)
KRISHNAPRASAD THIRUNARAYAN

Main Issue: Indirect Information access
PubMed Search Service

More direct Information access
Semantic MEDLINE
Acetaminophen TREATS Migraine Disorders
Sumatriptan TREATS Migraine Disorders
Topiramate PREVENTS Migraine Disorders

Thesis Motivation
• Automatically evaluate summaries in Semantic MEDLINE.
• Identify features that impact summary quality.
• Improve semantic summaries it generates.

Outline
• Automatic summarization
- Extractive, abstractive
- Summarization in Semantic MEDLINE and ResQu
• Automatic summarization evaluation
- Intrinsic, extrinsic
• Datasets
- UMLS, SemRep, MetaMap
• Approach
- Summary transformation
- Semantic similarity
• Experimental evaluation
• Conclusion

Automatic Summarization
• What is an effective summary?
- Saliency
- Compressed format
• Approaches to Automatic Summarization
- Extractive
- Abstractive
Extractive summary
A randomized, placebo-controlled trial of
acetaminophen for treatment of migraine
headache.
Long-term evaluation of sumatriptan and
naproxen sodium for the acute treatment of
migraine in adolescents.
…………….
Mapping from disease-specific measures to
health-state utility values in individuals with
migraine.
Abstractive summary
Sumatriptan TREATS Migraine Disorders
…………….
Migraine Disorders PROCESS_OF Individuals

Semantic MEDLINE Summarization System Overview
[Figure: pipeline from Source Documents to Conceptual Representation (Semantic Predications) to Conceptual Condensate (Semantic Predications) to Semantic Summary. Stage labels: Interpretation (SemRep), Transformation, Reduction, Generalization.]
Feature application:
• Relevance
• Connectivity
• Novelty
• Saliency
Example predications:
Aspirin TREATS Coronary artery disease
Coronary artery disease COEXISTS_WITH Inflammation
Coronary artery disease ISA Vascular disease
tomography DIAGNOSIS Coronary artery disease

Evaluating Summaries
• Evaluating Summary Quality
Intrinsic evaluation:
- Compared to a human-curated gold standard.
- Using document similarity measures.
Extrinsic evaluation:
- Based on a secondary task.
- Through a discrete scoring system.

Intrinsic Evaluation of Extractive Summarization
• Pyramid Approach [Nenkova et al., 2004]
- Summary Content Units (SCU)
• Louis et al [2009]
- Distribution of terms
- Kullback-Leibler
- Jensen-Shannon
Nenkova, Ani, and Rebecca Passonneau. "Evaluating content selection in summarization: The pyramid method." (2004).
Louis, Annie, and Ani Nenkova. "Automatic summary evaluation without human models." Notebook Papers and Results, Text Analysis Conference (TAC-2008), Gaithersburg, Maryland (USA). 2008.

Intrinsic Evaluation of Abstractive Summarization
• Information Misalignment
- Semantic summary: structured background knowledge.
- Gold standard: textual.
• Proposed Solution
- Summary transformation: predications to text.
- Semantic similarity computation.

Approach: ResQu
Word Co-occurrence
We can use the words that co-occur with the semantic predications in a summary to represent the meaning of the semantic predications based on distributional semantics.
Leave-one-Out
By generating multiple summaries with features held out, we can effectively evaluate the impact of each feature.

Thesis Statement
A semantic summary can be understood and potentially improved by leveraging distributional statistics between the structured knowledge that comprises the semantic summary and the words with which these structured constructs co-occur, across the corpus.

Proposed Solution (ResQu)
Semantic Predications
valproic acid TREATS migraine
Sumatriptan TREATS Migraine Disorders
lamotrigine TREATS Migraine with Aura
Dihydroergotamine TREATS Migraine Disorders
Aspirin TREATS Migraine Disorders
zolmitriptan TREATS Migraine Disorders
eletriptan TREATS Migraine Disorders
Analgesics TREATS Migraine Disorders
ziconotide TREATS Migraine Disorders
…
[Figure: semantic predications are mapped to their co-occurring arguments, which form the semantic summary vector.]

Measuring Similarity
• Similarity between the semantic summary (SS) and the gold standard (GS)
- Cosine similarity, Euclidean distance, Jensen-Shannon divergence
• Root Mean-Squared Error
- For each summary generated with a feature held out
The held-out summary that is least similar to the gold standard identifies the most important feature: removing that feature hurts the summary most.

Resource-Rich Biomedical Knowledge
Assertional Knowledge and Definitional Knowledge (Complementary, Disjoint)
MEDLINE (1865-2015)
- Largest Biomedical Knowledgebase, >25 million abstracts, PubMed, PMC
- 65 Attributes: 62 Provenance Metadata, 3 Semantic Attributes
- Semantic Predications
- MeSH Indexing of documents d1, d2, d3, ..., dn
Medical Subject Headings (MeSH)
- 15 Unique Trees, Max Depth: 15
- ~27,000 Terms
Unified Medical Language System (UMLS)
- Components: SPECIALIST Lexicon, Semantic Network, Metathesaurus
- >300k concepts, >100 Vocabularies, 9 million triples
- 134 Types, 15 Groups, 54 predicates

ResQu System Architecture
[Figure: ResQu architecture diagram. Components shown:]
• User Query Processor
• Document Selector
• Predication Mapper
• Concept Mapper
• Summarizer (Schema Summarizer)
• Vectorizer
• Predication Extractor (SemRep)
• Graph Generator
• Summary Vectors
• Jericho Crawler
• Gold standard creation module
• Gold standard Vectors
• Similarity Computation Module
Data source: MEDLINE

User Query
q = (l, c1, c2, dt, ub)
• l: label of an entity (or concept) in the UMLS
- Migraine Disorders: C0149931
• c1: Humans[MH] and c2: Clinical Trial[PTYP]
• dt: the date range of documents
• ub: the upper bound (default = 5000)

User Query Instance
q = (Migraine Disorders[MH] AND Humans[MH] AND Clinical Trial[PTYP] AND 1860/01:2014/08[DCOM])
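A minimal Python sketch of how the tuple q = (l, c1, c2, dt, ub) might be flattened into a search string like the instance above. The `UserQuery` type, its field names, and `to_search_term` are hypothetical names introduced for illustration; the slides do not show how ResQu builds the string.

```python
from typing import NamedTuple


class UserQuery(NamedTuple):
    """Hypothetical container for q = (l, c1, c2, dt, ub)."""
    label: str               # l: UMLS concept label, e.g. "Migraine Disorders"
    c1: str                  # first constraint, e.g. "Humans[MH]"
    c2: str                  # second constraint, e.g. "Clinical Trial[PTYP]"
    date_range: str          # dt, e.g. "1860/01:2014/08"
    upper_bound: int = 5000  # ub: default value given on the slide


def to_search_term(q: UserQuery) -> str:
    """Join the label, both constraints, and the date range with AND."""
    return f"{q.label}[MH] AND {q.c1} AND {q.c2} AND {q.date_range}[DCOM]"


query = UserQuery("Migraine Disorders", "Humans[MH]",
                  "Clinical Trial[PTYP]", "1860/01:2014/08")
term = to_search_term(query)
```

Keeping the upper bound outside the search string is a design choice of this sketch: it limits how many documents are retrieved rather than which documents match.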

Document Selection
• Query from the User Query Processor.
• Retrieves the set of MEDLINE documents.
• D = {d1, d2, . . . , dn}
• Uses the MEDLINE Entrez Search API.
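As a rough illustration of the retrieval step, the sketch below builds a request URL for the NCBI E-utilities ESearch endpoint without sending it. The choice of `db`, `term`, `retmax`, and `retmode` parameters is an assumption about how such a call could look; the slides do not state which parameters ResQu uses, and `build_esearch_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, upper_bound: int = 5000) -> str:
    """Build (but do not send) an ESearch request for PubMed identifiers."""
    params = {
        "db": "pubmed",         # search the PubMed/MEDLINE database
        "term": term,           # the query string from the User Query Processor
        "retmax": upper_bound,  # assumption: ub caps the number of ids returned
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url("Migraine Disorders[MH] AND Humans[MH]")
query_params = parse_qs(urlparse(url).query)
```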

Semantic Predications Extractor
A randomized, placebo-controlled trial of acetaminophen for treatment of migraine disorders
Acetaminophen TREATS Migraine disorders
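A predication such as the one above can be held as a subject-predicate-object triple. The sketch below is illustrative only: `Predication` and `parse_predication` are hypothetical names, and the parser assumes the predicate is the single all-uppercase token in the string. SemRep's real output is richer and is not reproduced here.

```python
import re
from typing import NamedTuple


class Predication(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Assumption: the predicate is the only all-uppercase token (TREATS, PROCESS_OF, ...).
_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+(?P<predicate>[A-Z][A-Z_]+)\s+(?P<obj>.+)$"
)


def parse_predication(text: str) -> Predication:
    """Split 'subject PREDICATE object' into its three parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a predication: {text!r}")
    return Predication(match["subject"], match["predicate"], match["obj"])


treats = parse_predication("Acetaminophen TREATS Migraine Disorders")
process = parse_predication("Migraine Disorders PROCESS_OF Individuals")
```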

Automatic Summarizer
Inflammation mediated by the immune system is known to be important in carcinogenesis and, specifically, T helper 17 cells have been reported to play a role in tumor
progression by promoting neo-angiogenesis. The aim of this study was to investigate whether inflammatory cytokines and vascular endothelial growth factor (VEGF) levels
in exhaled breath condensate (EBC) and in serum were related to tumor size in patients with non-small cell lung cancer (NSCLC). Il-6, IL-17, TNF-α and VEGF levels were
measured in EBC and serum of 15 patients with stage I-IIA NSCLC and in 30 healthy controls by immunoassay. The tumor size was measured by a CT scan. The
concentrations of IL-6, IL-17 and VEGF were significantly higher in EBC of patients with lung cancer, compared with controls, while only serum IL-6 concentration was
higher in patients compared to controls. A significant correlation (r = 0.78, p = 0.001) was observed between EBC levels of IL-6 and IL-17; IL-17 was also correlated to EBC
levels of the VEGF (r = 0.83, p < 0.001) and TNF-α (r = 0.62, p = 0.014). The tumor diameter was significantly correlated with EBC concentrations of VEGF (r = 0.58, p =
0.039), IL-6 (r = 0.67, p = 0.013) and IL-17 (r = 0.66, p = 0.017). Our results show a significant relationship between inflammatory and angiogenic markers, measured in
EBC by a non-invasive method, and tumor mass. To assess whether polymorphisms of the interleukin-23 receptor (IL23R) gene are associated with bladder transitional cell
carcinoma because chronic inflammation contributes to bladder cancer and the IL23R is known to be critically involved in the carcinogenesis of various malignant tumors.
226 patients with bladder cancer and 270 age-matched controls were involved in the study. Polymerase chain reaction-restriction fragment length polymorphism was used
for genotyping. Genotype distribution and allelic frequencies between patients and controls were compared. In all three single nucleotide polymorphisms of IL23R studied,
the distribution of genotype and allele frequencies of rs10889677 differed significantly between patients and controls. The frequency of allele C of rs10889677 was
significantly increased in cases compared with controls (0.2898 vs. 0.1833, odds ratio 1.818, 95 % confidence interval 1.349-2.449). The result indicates that IL23R may
play an important role in the susceptibility of bladder cancer in Chinese population. For over a century, inactivated or attenuated bacteria have been employed in the clinic
as immunotherapies to treat cancer, starting with the Coley's vaccines in the 19th century and leading to the currently approved bacillus Calmette-Guérin vaccine for
bladder cancer. While effective, the inflammation induced by these therapies is transient and not designed to induce long-lasting tumor-specific cytolytic T lymphocyte
(CTL) responses that have proven so adept at eradicating tumors. Therefore, in order to maintain the benefits of bacteria-induced acute inflammation but gain long-lasting
anti-tumor immunity, many groups have constructed recombinant bacteria expressing tumor-associated antigens (TAAs) for the purpose of activating tumor-specific CTLs.
One bacterium has proven particularly adept at inducing powerful anti-tumor immunity, Listeria monocytogenes (Lm). Lm is a gram-positive bacterium that selectively
infects antigen-presenting cells wherein it is able to efficiently deliver tumor antigens to both the MHC Class I and II antigen presentation pathways for activation of tumor-
targeting CTL-mediated immunity. Lm is a versatile bacterial vector as evidenced by its ability to induce therapeutic immunity against a wide-array of TAAs and specifically
infect and kill tumor cells directly. It is for these reasons, among others, that Lm-based immunotherapies have delivered impressive therapeutic efficacy in preclinical
models of cancer for two decades and are now showing promise clinically.
[Figure: the abstracts above condensed into a graph of semantic predications. Migraine Disorders is linked to Ibuprofen, Topiramate, Headache, Acetaminophen, Vestibule, and Pain by TREATS, PREVENTS, ISA, and LOCATION_OF edges.]

Summary Transformation
Step 1: get all documents for each concept in the semantic summary.
Step 2: create a bag-of-words for each concept (term frequency).
Step 3: aggregate the bag-of-words of every concept in the entire semantic summary.
Step 4: use the idf of each word in the corpus to create the tf-idf vector for the given semantic summary.
$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \log \dfrac{N}{n_t}$
where N is the number of documents in D and n_t is the number of documents that contain t.
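The four steps above can be sketched in plain Python as follows. This is a toy version under stated assumptions: tokens are lowercased and split on whitespace, the natural logarithm is used, and `bag_of_words` and `summary_tfidf` are hypothetical names. The tokenizer and log base used in ResQu are not given on the slides.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Step 2: term frequencies over all documents retrieved for one concept."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def summary_tfidf(concept_docs, corpus):
    """Steps 3-4: aggregate per-concept bags, then weight each term by idf."""
    tf = Counter()
    for docs in concept_docs.values():
        tf.update(bag_of_words(docs))

    # Document frequency n_t: number of corpus documents containing each term.
    df = Counter()
    for doc in corpus:
        df.update(set(doc.lower().split()))

    n_docs = len(corpus)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if df[t] > 0}


# Toy corpus, invented for illustration only.
corpus = ["ibuprofen treats migraine", "migraine is pain", "aspirin treats pain"]
concept_docs = {
    "Ibuprofen": [corpus[0]],                      # Step 1: documents per concept
    "Migraine Disorders": [corpus[0], corpus[1]],
}
vector = summary_tfidf(concept_docs, corpus)
```

A document retrieved for two concepts is counted twice here, which matches a literal reading of Step 3; whether ResQu de-duplicates such documents is not stated.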

Bag-of-words Model
We used hemofiltration to treat a patient with digoxin overdose that was
complicated by refractory hyperkalemia.
bow = [(we,1), (used,1), . . ., (hyperkalemia,1)]
bow_sparse_vector =[(678,1), (2,1), . . ., (999,1)]
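A small sketch of how the example sentence could become a bag-of-words and then a sparse vector of (token id, count) pairs. The ids on the slide (678, 2, 999) come from a dictionary that is not shown, so this sketch assumes ids are assigned in order of first appearance and will produce different ids.

```python
from collections import Counter

sentence = ("We used hemofiltration to treat a patient with digoxin overdose "
            "that was complicated by refractory hyperkalemia.")

# Assumption: lowercase, drop the final period, split on whitespace.
tokens = sentence.lower().rstrip(".").split()
bow = Counter(tokens)

# Assumption: token ids follow order of first appearance in the sentence.
token_ids = {}
for token in tokens:
    token_ids.setdefault(token, len(token_ids))

bow_sparse_vector = sorted((token_ids[t], count) for t, count in bow.items())
```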

Dictionary Creation
Term        Index   Document id
ibuprofen   0       1,3,…,3000
.
.
.
migraine    5       5,6,…,475
Documents
ibuprofen is …. migraine
Ibuprofen is effective in treating Migraine
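The dictionary table can be read as an inverted index: each term gets an integer index and the list of documents in which it occurs. The sketch below assumes whitespace tokenization and first-appearance indexing; `build_dictionary` and the second toy document are hypothetical.

```python
def build_dictionary(documents):
    """Map each term to (index, sorted ids of the documents that contain it)."""
    index = {}
    postings = {}
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index.setdefault(term, len(index))           # first appearance fixes the index
            postings.setdefault(term, set()).add(doc_id)
    return {term: (index[term], sorted(postings[term])) for term in index}


documents = {
    1: "Ibuprofen is effective in treating Migraine",  # sentence from the slide
    2: "Migraine is common",                           # invented second document
}
dictionary = build_dictionary(documents)
```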

Gold Standard Vectorization
Step 1: iterate over each document in the gold standard.
Step 2: tokenize each sentence.
Step 3: create the bag-of-words model.
Step 4: use the idf of each word from the dictionary to create the tf-idf vector for the gold standard.
Problem: data sparsity.

Gold Standard Vectorization Enhancement
Step 1: map the gold standard document to UMLS concepts with MetaMap.
Step 2: create a bag-of-words for each concept (term frequency).
Step 3: aggregate the bag-of-words of every concept into a single bag-of-words for the summary.
Step 4: use the idf of each word from the dictionary to create the tf-idf vector for the gold standard.
Solution: enhance with context clues from the corpus.

Evaluation: Overall Approach
Step 1: select 20 diseases as topics for an information need.
Step 2: use each query to generate a semantic summary.
Step 3: transform each semantic summary into a semantic summary vector.
Step 4: transform each gold standard into a gold standard tf-idf vector.
Step 5: compute the similarity between a semantic summary vector and its associated gold standard vector under different features.
Step 6: determine the features that generate the most informative summary in each scenario.

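Summarization Evaluation Metrics
• Cosine Similarity
$\mathrm{cosine}(\vec{s}, \vec{T}) = \dfrac{\vec{s} \cdot \vec{T}}{\lVert \vec{s} \rVert \, \lVert \vec{T} \rVert}$
• Euclidean distance
$e(\vec{s}, \vec{T}) = \sqrt{\sum_{i=1}^{n} (\omega_i - t_i)^2}$
• Jensen-Shannon Divergence
$\mathrm{JSD}(\vec{s} \,\|\, \vec{T}) = \tfrac{1}{2}\left[\mathrm{KL}(\vec{s} \,\|\, M) + \mathrm{KL}(\vec{T} \,\|\, M)\right]$, where $M = \tfrac{1}{2}(\vec{s} + \vec{T})$
$\mathrm{KL}(\vec{s} \,\|\, \vec{T}) = \sum_{i=1}^{n} P(w_i) \log \dfrac{P(w_i)}{P(t_i)}$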
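The three metrics can be sketched with the standard library alone. Two assumptions are made here that the slides do not spell out: vectors are dense lists of equal length, and for the Jensen-Shannon divergence each vector is first normalized to sum to 1 so that it can be treated as a probability distribution. All function names are illustrative.

```python
import math


def cosine_similarity(s, t):
    """Dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(s, t))
    norm = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in t))
    return dot / norm if norm else 0.0


def euclidean_distance(s, t):
    """Square root of the summed squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))


def _normalize(v):
    """Assumption: scale a non-negative vector into a probability distribution."""
    total = sum(v)
    return [x / total for x in v]


def kl_divergence(p, q):
    """KL(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(s, t):
    """Average KL divergence of each distribution from their midpoint M."""
    p, q = _normalize(s), _normalize(t)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Cosine similarity grows as vectors become more alike, while the Euclidean distance and the Jensen-Shannon divergence shrink, so the three scores cannot be compared in the same direction.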

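Cosine Similarity
[Figure: term-count rows for the gold standard vector and the semantic summary vector over the vocabulary W = {w1, w2, . . . , wn}.]
$\vec{T}$: gold standard vector
$\vec{s}$: semantic summary vector
$\mathrm{cosine}(\vec{s}, \vec{T}) = \dfrac{\vec{s} \cdot \vec{T}}{\lVert \vec{s} \rVert \, \lVert \vec{T} \rVert}$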

Euclidean Distance
[Figure: term-count rows for the gold standard vector and the semantic summary vector over the vocabulary W = {w1, w2, . . . , wn}.]
$\vec{T}$: gold standard vector
$\vec{s}$: semantic summary vector
$e(\vec{s}, \vec{T}) = \sqrt{\sum_{i=1}^{n} (\omega_i - t_i)^2}$
$= \sqrt{(3-5)^2 + (2-1)^2 + (3-0)^2 + (2-0)^2 + (0-3)^2 + (0-2)^2 + (0-5)^2 + \dots + (0-2)^2}$
$= \sqrt{122} \approx 11.05$

Semantic Similarity Comparison

Root Mean-Squared Error
$E = (e_1, e_2, \dots, e_{20})$
$E_S = (S'_1, S'_2, \dots, S'_{20})$
$E_T = (T'_1, T'_2, \dots, T'_{20})$
$\cos(E_S, E_T) = (\cos_1, \cos_2, \dots, \cos_{20})$
$\mathrm{eud}(E_S, E_T) = (\mathrm{eud}_1, \mathrm{eud}_2, \dots, \mathrm{eud}_{20})$
$\mathrm{JS}(E_S, E_T) = (js_1, js_2, \dots, js_{20})$

Root Mean-Squared Error
$SIM = \{sim_1, sim_2, \dots, sim_{20}\}$
$\mathrm{RMSE}(SIM) = \sqrt{\dfrac{\sum_{i=1}^{n} sim_i^2}{n}}$
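The aggregation over the per-query scores is short enough to write directly. The sketch below follows the formula as given, the square root of the mean of the squared scores; the function name `rmse` and the sample values are illustrative.

```python
import math


def rmse(sims):
    """Square root of the mean of the squared similarity scores."""
    return math.sqrt(sum(x * x for x in sims) / len(sims))


# Invented scores for illustration; the thesis uses one score per query (n = 20).
example = rmse([3.0, 4.0])
```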

Evaluation
Method                   Cosine-RMSE   Euclidean-RMSE   JS-RMSE
Leave-out-relevancy      0.263         0.315            0.187
Leave-out-connectivity   0.263         0.335            0.143
Leave-out-novelty        0.254         0.329            0.252
Leave-out-saliency       0.237         0.333            0.281
Saliency is the most important feature.
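One way to read the table under the leave-one-out rule is sketched below: the held-out feature whose summary is least similar to the gold standard is taken as the most important. This reading is an assumption. It treats a low cosine score as "least similar" and a high Euclidean or Jensen-Shannon score as "least similar", and `most_important_feature` is a hypothetical helper. The numbers are copied from the table.

```python
# RMSE values from the evaluation table, keyed by the feature that was held out.
rmse_table = {
    "relevancy":    {"cosine": 0.263, "euclidean": 0.315, "js": 0.187},
    "connectivity": {"cosine": 0.263, "euclidean": 0.335, "js": 0.143},
    "novelty":      {"cosine": 0.254, "euclidean": 0.329, "js": 0.252},
    "saliency":     {"cosine": 0.237, "euclidean": 0.333, "js": 0.281},
}


def most_important_feature(table, metric, higher_is_more_similar):
    """Return the held-out feature whose summary is least similar to the gold standard."""
    pick = min if higher_is_more_similar else max
    return pick(table, key=lambda feature: table[feature][metric])


by_cosine = most_important_feature(rmse_table, "cosine", higher_is_more_similar=True)
by_js = most_important_feature(rmse_table, "js", higher_is_more_similar=False)
by_euclidean = most_important_feature(rmse_table, "euclidean", higher_is_more_similar=False)
```

Under this reading, cosine and Jensen-Shannon both select saliency, while Euclidean selects connectivity by a narrow margin (0.335 against 0.333 for saliency).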

Contributions
• We propose a method for intrinsic evaluation of abstractive summarization.
• We transform semantic summaries into an equivalent textual representation.
• We adopt a leave-one-out strategy to identify and evaluate the features that impact automatically generated semantic summaries.
• We evaluate the impact of these features using several similarity metrics.

Limitations and Future Work
Limitations
1. Query diversity
- 20 disease treatments
2. Concept-based bag-of-words
3. Gold standard impurities
- Diluted quality based on co-occurrence
Future work
• Use machine learning and a larger query set
• Involve more domain experts and consider other gold standard creation techniques
• Use facts instead of concepts

Acknowledgements
Prof. Amit P. Sheth (Advisor)
Prof. Krishnaprasad Thirunarayan
Thomas C. Rindflesch
Delroy Cameron
THANK YOU!
