Karam Abdulahhad
GESIS - Cologne
karam.abdulahhad@gesis.org
karam.abdulahhad@gmail.com
Beyond Classical Information Retrieval (IR)
Conceptual IR
Linguistic phenomena & IR problems
20-12-2018GESIS - K.Abdulahhad2
How have “fiddles” changed over time
Violins
Like most technological breakthroughs, today's
violin is an evolutionary product. So far as we
know, there were no violins in 1500. A century
later, there were several types and probably
thousands of specimens north and south of the
Alps, and from England to Poland. A marvel of
craftsmanship and acoustical engineering, the
violin produced more sound than any stringed
instrument to date. Almost immediately,
composers, players and collectors liked what
they heard and saw. Italian and non-Italian
makers proliferated.
……….
Linguistic phenomena & IR problems
Historical information about “sugar
river bank”
History and Mission Statement
…………
The Bank continues to grow at a healthy pace.
We have continued to do well and be a leader
in our industry. Our main branch was expanded
in 1982 and we now have branches in Sunapee,
New London, Warner, Grantham and Concord.
We at Sugar River Bank are proud of our
history and growth. It is the responsibility of
each and every member of our Bank's family to
insure continued growth in the future.
…………
www.sugarriverbank.com
Linguistic phenomena & IR problems
Historical information about “sugar
river bank”
The Life-Sustaining Sugar River
…………
The west branch of the Sugar River historically
supported a native trout population, but had
suffered from sedimentation, overgrazing of its
banks and warming water. “Restoration efforts
in the Dane County portion of the watershed
reduced nonpoint source pollution, installed
riverbank vegetative filter strips, improved in-
stream habitat, restricted cattle access to
streams, and improved management of animal
waste from barnyards,” says Hansis.
…………
northwestquarterly.com
Linguistic phenomena & IR problems
Part-Whole: Hand / Body
Homonyms: Bank(com) / Bank(geo)
Hyponym / Hypernym: B-cell / Lymphocyte
Synonyms: Violin / Fiddle
Co-hyponyms: Cat / Dog
Observations
1. Inadequacy of the term-independence assumption,
which leads to the term-mismatch problem
2. Retrieval process has an inferential nature, where the
classical word-based document-query comparison
paradigm is insufficient
Conceptual approach
 Concepts are categories encompassing all synonymous terms
 Using concept IDs instead of terms
UMLS
C0004238: Atrial fibrillation, Auricular fibrillation
C0004238: Skin cancer, Melanoma, Malignant neoplasm of skin
WordNet
S04563183: Ticker, Watch
S14263400: Cancer, Malignant neoplastic disease
S01729333: Snake, Serpent, Ophidian
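The idea of indexing concept IDs instead of terms can be sketched in a few lines. This is only an illustration: `TERM_TO_CONCEPT` is a hypothetical hand-built lookup table standing in for a real mapper such as MetaMap, filled with some of the IDs shown on the slide, and `to_concepts` / `overlap` are names invented here.

```python
# Hypothetical lookup table (an assumption for illustration); the IDs are
# the ones shown on the slide, not the output of a real mapping tool.
TERM_TO_CONCEPT = {
    "ticker": "S04563183",
    "watch": "S04563183",
    "snake": "S01729333",
    "serpent": "S01729333",
    "ophidian": "S01729333",
    "atrial fibrillation": "C0004238",
    "auricular fibrillation": "C0004238",
}


def to_concepts(terms):
    """Replace each known term by its concept ID; keep unknown terms as-is."""
    return [TERM_TO_CONCEPT.get(term.lower(), term) for term in terms]


def overlap(query_terms, doc_terms):
    """Concept-level overlap between a query and a document."""
    return set(to_concepts(query_terms)) & set(to_concepts(doc_terms))
```

With this mapping, a query containing "serpent" matches a document containing "snake" even though the two share no word.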
Part I: Relative Concept Frequency
[1] K. Abdulahhad et al., Revisiting the Term Frequency in Concept-Based IR Models. DEXA 2013
[2] K. Abdulahhad et al., MRIM at ImageCLEF2012. From Words to Concepts: A New Counting Approach. CLEF 2012
Relative Concept Frequency (problem)
 Text to concepts mapping
 Using MetaMap & UMLS concepts
[Figure labels: Precision, Recall]
Relative Concept Frequency (problem)
 Document length
Word-space: 𝑑 = ‘lobar pneumonia x-ray’, |𝑑| = 3
Concept-space: |𝑑| = ?
Relative Concept Frequency (idea)
 Use all concepts while maintaining the word-based document length
 Structure-based redistribution of the word-based document length over the concepts
Relative Concept Frequency (how)
 Computing relative frequency
 Hypothesis 1: concepts of a longer phrase receive a larger count (more specific meaning)
 Hypothesis 2: the larger the set of concepts for a phrase, the smaller the count each of its concepts receives (ambiguity)
 Hypothesis 3: maintaining the word-based |𝑑|
Computing Relative Concept Frequency (Step 1)
 Step 1: map text to concepts (via e.g. MetaMap)
‘lobar pneumonia x-ray’ → MetaMap →
Sub-phrases                Concepts
𝑇1: ‘lobar pneumonia’      𝐶1 = {C0032300, C0155862}
𝑇2: ‘pneumonia x-ray’      𝐶2 = {C0581647}
𝑇3: ‘lobar’                𝐶3 = {C1511010, C1428707, C0796494}
𝑇4: ‘pneumonia’            𝐶4 = {C0024109, C1278908, C0032285, C2707265, C2709248}
𝑇5: ‘x-ray’                𝐶5 = {C0034571, C0043299, C0043309, C1306645, C1714805, C1962945}
Computing Relative Concept Frequency (Step 2)
 Step 2: build hierarchy
(𝑇𝑖, 𝐶𝑖) < (𝑇𝑗, 𝐶𝑗) ⇔ 𝑇𝑖 ⊂ 𝑇𝑗
[Figure: hierarchy under a virtual node R. Children of R: (𝑇1, 𝐶1), (𝑇2, 𝐶2). Children of (𝑇1, 𝐶1): (𝑇3, 𝐶3), (𝑇4, 𝐶4). Children of (𝑇2, 𝐶2): (𝑇4, 𝐶4), (𝑇5, 𝐶5).]
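A minimal sketch of Step 2, under the assumption that phrase containment 𝑇𝑖 ⊂ 𝑇𝑗 can be tested as proper inclusion of word sets. The function name `build_hierarchy`, the label `"R"` for the virtual node, and the `PHRASES` dictionary are illustrative choices, not part of the original slides.

```python
from collections import defaultdict

# The five sub-phrases of 'lobar pneumonia x-ray' from Step 1.
PHRASES = {
    "T1": ("lobar", "pneumonia"),
    "T2": ("pneumonia", "x-ray"),
    "T3": ("lobar",),
    "T4": ("pneumonia",),
    "T5": ("x-ray",),
}


def build_hierarchy(phrases, root="R"):
    """Return a dict parent -> list of children.

    Node i is placed under node j when the words of T_i are a proper subset
    of the words of T_j and no other phrase lies strictly between them.
    Nodes without any containing phrase hang off the virtual root.
    """
    words = {name: frozenset(p) for name, p in phrases.items()}
    children = defaultdict(list)
    for i in phrases:
        supersets = [j for j in phrases if words[i] < words[j]]
        # Keep only the direct (minimal) supersets of T_i.
        direct = [j for j in supersets
                  if not any(words[k] < words[j] for k in supersets)]
        for parent in direct or [root]:
            children[parent].append(i)
    return dict(children)
```

Note that ‘pneumonia’ (T4) ends up with two parents, T1 and T2, so the structure is a directed acyclic graph rather than a strict tree.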
Computing Relative Concept Frequency (Step 3)
 Step 3: compute relative frequency 𝑟𝑓𝑖
 Breadth-first search
 The relative frequency 𝑟𝑓𝑖 of 𝑐 ∈ 𝐶𝑖 must be proportional to |𝑇𝑖| (Hypothesis 1), and inversely proportional to |𝐶𝑖| (Hypothesis 2)
 Maintaining |𝑑| by distributing it over the concepts of 𝑑 (Hypothesis 3).
Sub-phrases                |𝑇𝑖|   |𝐶𝑖|
𝑇1: ‘lobar pneumonia’      2      2
𝑇2: ‘pneumonia x-ray’      2      1
𝑇3: ‘lobar’                1      3
𝑇4: ‘pneumonia’            1      5
𝑇5: ‘x-ray’                1      6
Computing Relative Concept Frequency (Step 3)
 We distribute the |𝑑| = 3 of the phrase ‘lobar pneumonia x-ray’ over its concepts
[Figure: the hierarchy, with the amount 3 entering at the virtual node R]
Computing Relative Concept Frequency (Step 3)
 Step 3: computing relative weight
 For each node (𝑇𝑖, 𝐶𝑖) we compute three values
 𝛼𝑖: the amount that should be distributed over the concepts of the current node (𝑇𝑖, 𝐶𝑖) and its children
$\alpha_i = \sum_{p \in \mathrm{parents}(i)} \delta_p \times |T_i|$
 𝛿𝑖: the portion of the input amount 𝛼𝑖 that goes to one single word
$\delta_i = \frac{\alpha_i}{|T_i| + \sum_{ch \in \mathrm{children}(i)} |T_{ch}|}$
 𝛽𝑖, or equivalently 𝑟𝑓𝑖: the relative frequency of each concept 𝑐 ∈ 𝐶𝑖
$\beta_i = \frac{\delta_i \times |T_i|}{|C_i|}$
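The three values can be computed in one pass over the hierarchy. The sketch below is one possible reading of the formulas, not the author's implementation: it visits nodes with a queue and releases a node only once all of its parents have been processed (a topological variant of breadth-first search, an assumption made here so that a node with two parents sees both 𝛿 values). `SIZES`, `CHILDREN` and `relative_frequencies` are illustrative names; exact fractions are used to keep the arithmetic readable.

```python
from collections import defaultdict, deque
from fractions import Fraction

# node -> (|T_i|, |C_i|); "R" is the virtual root with no words or concepts.
SIZES = {"R": (0, 0), "T1": (2, 2), "T2": (2, 1),
         "T3": (1, 3), "T4": (1, 5), "T5": (1, 6)}
CHILDREN = {"R": ["T1", "T2"], "T1": ["T3", "T4"], "T2": ["T4", "T5"]}


def relative_frequencies(sizes, children, doc_len, root="R"):
    """Return node -> beta_i, the count given to each concept of the node."""
    parents = defaultdict(list)
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].append(parent)
    pending = {node: len(parents[node]) for node in sizes}  # unprocessed parents

    delta, beta = {}, {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        t_size, c_size = sizes[node]
        if node == root:
            alpha = Fraction(doc_len)            # the whole |d| enters at the root
        else:
            alpha = sum(delta[p] for p in parents[node]) * t_size
        kids = children.get(node, [])
        delta[node] = alpha / (t_size + sum(sizes[k][0] for k in kids))
        if c_size:                               # the virtual root has no concepts
            beta[node] = delta[node] * t_size / c_size
        for kid in kids:
            pending[kid] -= 1
            if pending[kid] == 0:
                queue.append(kid)
    return beta
```

Calling `relative_frequencies(SIZES, CHILDREN, 3)` on the ‘lobar pneumonia x-ray’ example gives one value per node; multiplying each by |𝐶𝑖| and summing gives back the document length 3.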
Computing Relative Concept Frequency (Step 3)
Virtual node R: |𝑇𝑅| = 0, |𝐶𝑅| = 0
$\alpha_R = 3$
$\delta_R = \frac{\alpha_R}{|T_R| + |T_1| + |T_2|} = \frac{3}{4}$
Node (𝑇1, 𝐶1): |𝑇1| = 2, |𝐶1| = 2
$\alpha_1 = \delta_R \times |T_1| = \frac{3}{2}$
$\delta_1 = \frac{\alpha_1}{|T_1| + |T_3| + |T_4|} = \frac{3}{8}$
$\beta_1 = \frac{\delta_1 \times |T_1|}{|C_1|} = \frac{3}{8}$
Node (𝑇2, 𝐶2): |𝑇2| = 2, |𝐶2| = 1
$\alpha_2 = \delta_R \times |T_2| = \frac{3}{2}$
$\delta_2 = \frac{\alpha_2}{|T_2| + |T_4| + |T_5|} = \frac{3}{8}$
$\beta_2 = \frac{\delta_2 \times |T_2|}{|C_2|} = \frac{3}{4}$
Node (𝑇3, 𝐶3): |𝑇3| = 1, |𝐶3| = 3
$\alpha_3 = \delta_1 \times |T_3| = \frac{3}{8}$
$\delta_3 = \frac{\alpha_3}{|T_3|} = \frac{3}{8}$
$\beta_3 = \frac{\delta_3 \times |T_3|}{|C_3|} = \frac{1}{8}$
Node (𝑇4, 𝐶4): |𝑇4| = 1, |𝐶4| = 5
$\alpha_4 = \delta_1 \times |T_4| + \delta_2 \times |T_4| = \frac{3}{4}$
$\delta_4 = \frac{\alpha_4}{|T_4|} = \frac{3}{4}$
$\beta_4 = \frac{\delta_4 \times |T_4|}{|C_4|} = \frac{3}{20}$
Node (𝑇5, 𝐶5): |𝑇5| = 1, |𝐶5| = 6
$\alpha_5 = \delta_2 \times |T_5| = \frac{3}{8}$
$\delta_5 = \frac{\alpha_5}{|T_5|} = \frac{3}{8}$
$\beta_5 = \frac{\delta_5 \times |T_5|}{|C_5|} = \frac{1}{16}$
Computing Relative Concept Frequency (Step 3)
Sub-phrases                |𝑇𝑖|   |𝐶𝑖|   𝑟𝑓𝑖 (per concept)   Concepts
𝑇1: ‘lobar pneumonia’      2      2      3/8                 C0032300, C0155862
𝑇2: ‘pneumonia x-ray’      2      1      3/4                 C0581647
𝑇3: ‘lobar’                1      3      1/8                 C1511010, C1428707, C0796494
𝑇4: ‘pneumonia’            1      5      3/20                C0024109, C1278908, C0032285, C2707265, C2709248
𝑇5: ‘x-ray’                1      6      1/16                C0034571, C0043299, C0043309, C1306645, C1714805, C1962945
 From this table, we can see that the concepts of the least ambiguous and longest phrase have the highest frequency
 Concepts of the most ambiguous and shortest phrase have the lowest frequency
 Summed over all concepts: $\sum rf_i = 3$
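The total can be checked by hand from the per-concept values in the table: each 𝑟𝑓𝑖 is counted once for every concept of its sub-phrase, so the sum runs over |𝐶𝑖| copies of each value. The arithmetic below is a worked check added for illustration, not a formula taken from the slides.

```latex
\sum_{i=1}^{5} |C_i| \cdot rf_i
  = 2\cdot\tfrac{3}{8} + 1\cdot\tfrac{3}{4} + 3\cdot\tfrac{1}{8}
    + 5\cdot\tfrac{3}{20} + 6\cdot\tfrac{1}{16}
  = \tfrac{3}{4} + \tfrac{3}{4} + \tfrac{3}{8} + \tfrac{3}{4} + \tfrac{3}{8}
  = 3 = |d|
```

The word-based document length is therefore preserved, as Hypothesis 3 requires.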
Relative Concept Frequency (results)
 Corpora
[Table: corpora statistics]
Relative Concept Frequency (results)
[Table: retrieval results, TF]
(*) indicates a statistically significant (𝛼 < 0.05) improvement w.r.t. the classical concept frequency
Relative Concept Frequency (conclusion)
 Dealing with the document length deformation
 Encouraging results
 Increase recall
 Maintain or even increase the precision
 Can be used with classical IR models
 Change the (TF) component
Part II: Concept Embedding
[3] K. Abdulahhad, Concept embedding for information retrieval. ECIR 2018
Concept embedding (problem)
fiddle, violin → S04544161
skin cancer, melanoma → C0004238
B-cell → C0004561, lymphocyte → C0024264 (C0004561 is-a C0024264)
 Relation-based concept similarity is problematic
 Relations have different semantics & properties
fiddle / violin: synonymous
B-cell / lymphocyte: is-a
hand / body: part-of
Concept embedding (idea)
 Concepts as vectors
 Still using concepts to reduce mismatch effect
 Avoiding the complexities of relation-based inter-concept similarity
Goal: check the adaptability of concept-embedding-based similarity to IR
Concept embedding (approaches)
 Flat embedding
$c = F(w_1, \dots, w_n)$
 Hierarchical embedding
$s_i = F(w_1^i, \dots, w_n^i)$
$t_j = F(s_1^j, \dots, s_m^j)$
$c = F(t_1, \dots, t_k)$
 Weighted embedding
$c = F(\alpha_1 w_1, \dots, \alpha_n w_n)$
Concept embedding (experiments)
 Experiments consist of two parts
 Generating concept embedding vectors
 Testing a vector-based concept similarity for ad-hoc IR
Concept embedding (experiments)
1. Generating concept embedding vectors
 Word embedding
 PubMed Central collection (vocabulary of 1,177,879 words)
 Word2Vec
 Vector size 500
 Continuous bag of words
 Window size 8
 Negative sampling 25
 Concept embedding
 UMLS2017 concepts (only English content)
 For each concept, we build the corresponding set of words
 Flat embedding: replace F by avg
 Hierarchical embedding: replace F by avg
 Weighted embedding: replace F by weighted-avg
 The weight 𝛼𝑤 of a word w is: $\alpha_w = \ln\frac{N+1}{n}$
 N is the number of documents in PubMed Central
 n is the document frequency of w in PubMed Central
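The three ways of instantiating F can be sketched with plain Python lists. Several details here are assumptions rather than facts from the slides: the weighted average is normalised by the sum of the weights, the hierarchical variant is shown with two averaging levels only, and all function names (`avg`, `idf_weight`, `flat_embedding`, `weighted_embedding`, `hierarchical_embedding`) are hypothetical.

```python
import math


def avg(vectors, weights=None):
    """(Weighted) average of equal-length vectors; plays the role of F."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) / total
            for k in range(dim)]


def idf_weight(n_docs, doc_freq):
    """alpha_w = ln((N + 1) / n), as given on the slide."""
    return math.log((n_docs + 1) / doc_freq)


def flat_embedding(words, word_vec):
    """Concept vector as the plain average of its word vectors."""
    return avg([word_vec[w] for w in words])


def weighted_embedding(words, word_vec, n_docs, doc_freq):
    """Concept vector as the idf-weighted average of its word vectors."""
    weights = [idf_weight(n_docs, doc_freq[w]) for w in words]
    return avg([word_vec[w] for w in words], weights)


def hierarchical_embedding(groups, word_vec):
    """Average within each group of words first, then across the groups.

    Two levels only (a simplification); what a group stands for is left open.
    """
    return avg([flat_embedding(group, word_vec) for group in groups])
```

In the weighted variant a rare word (small n) pulls the concept vector towards itself more strongly than a frequent one.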
 Concept embedding (missing words)
 Fixed random vectors
 Several experiments for weighting missing words
 The word is too popular: n = N (poor idf)
 The word is too rare: n = 1 (high idf)
 Or in between: n = N/2
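A small sketch of how missing words might be handled, under assumptions not stated on the slides: the fixed random vector is drawn uniformly from [-1, 1] with a generator seeded by the word itself, and the three weighting options are evaluated with the idf formula ln((N+1)/n). The names `missing_word_vector` and `missing_word_weights` are hypothetical.

```python
import math
import random


def missing_word_vector(word, dim, seed=0):
    """Fixed random vector: the same word always receives the same vector.

    The uniform range [-1, 1] is an illustrative assumption.
    """
    rng = random.Random(f"{seed}:{word}")
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def missing_word_weights(n_docs):
    """idf weight ln((N + 1) / n) under the three assumptions about n."""
    return {
        "popular": math.log((n_docs + 1) / n_docs),        # n = N
        "between": math.log((n_docs + 1) / (n_docs / 2)),  # n = N/2
        "rare": math.log(n_docs + 1),                      # n = 1
    }
```

The three options span the range from an almost-zero weight (n = N) to the largest possible weight (n = 1).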
Concept embedding (experiments)
2. Testing a vector-based concept similarity for ad-hoc IR
 Corpora
 clef11 & clef12
 Text to concepts mapping
 MetaMap
 UMLS concepts
 IR model and concept similarity
 Weight(c): BM25 and Pivoted Normalization
 Concept similarity
 For comparison (Leacock)
$RSV(d, q) = \sum_{c \in q} weight_q(c) \times sim(c, c^*) \times weight_d(c^*)$
$sim(c_i, c_j) = \begin{cases} 0 & \cos\theta \le 0 \\ \beta \times (\cos\theta)^2 & \text{otherwise} \end{cases}$
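The scoring formula can be sketched as follows. Three points are assumptions made for this illustration: the trailing "2" in the similarity is read as a square on the cosine, 𝑐∗ is taken to be the document concept most similar to the query concept 𝑐, and the concept weights (BM25 or Pivoted Normalization in the slides) are passed in precomputed rather than implemented. The names `cosine`, `sim` and `rsv` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sim(u, v, beta=1.0):
    """0 for non-positive cosine, otherwise beta * cos^2 (assumed reading)."""
    cos = cosine(u, v)
    return 0.0 if cos <= 0 else beta * cos ** 2


def rsv(query_w, doc_w, vec, beta=1.0):
    """Retrieval status value of a document for a query.

    query_w / doc_w map concept IDs to precomputed weights; vec maps concept
    IDs to embedding vectors. c* is chosen as the most similar document
    concept (an assumption).
    """
    score = 0.0
    if not doc_w:
        return score
    for c, w_q in query_w.items():
        c_star = max(doc_w, key=lambda x: sim(vec[c], vec[x], beta))
        score += w_q * sim(vec[c], vec[c_star], beta) * doc_w[c_star]
    return score
```

With an exact concept match the cosine is 1, so the term reduces to the classical product of query and document weights scaled by 𝛽.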
 Results
[Table: retrieval results]
(*) indicates a statistically significant (𝛼 < 0.05) improvement w.r.t. “NoEmb-NoSim”
(†) indicates a statistically significant (𝛼 < 0.05) improvement w.r.t. “NoEmb-Leacock”
Concept embedding (conclusion)
 Three approaches to build concept vectors based on word embedding
 Promising results for using vector-based concept representation and similarity
 Concepts and words are represented in the same vector space
 They are comparable
 Improve approaches like MetaMap
Conclusion
 Dealing with the two observations
 Inadequacy of the term independence assumption
 Retrieval process has an inferential nature
 Conceptual IR
 Document length deformation
 Inter-concept relations quantification
Thank you …

100-Concepts-of-AI by Anupama Kate .pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

Beyond Classical Information Retrieval (IR): Conceptual IR

  • 1. Karam Abdulahhad GESIS - Cologne karam.abdulahhad@gesis.org karam.abdulahhad@gmail.com Beyond Classical Information Retrieval (IR) Conceptual IR
  • 2. Linguistic phenomena & IR problems (20-12-2018, GESIS - K. Abdulahhad)
    Query: How have “fiddles” changed over time
    Document, “Violins”: Like most technological breakthroughs, today's violin is an evolutionary product. So far as we know, there were no violins in 1500. A century later, there were several types and probably thousands of specimens north and south of the Alps, and from England to Poland. A marvel of craftsmanship and acoustical engineering, the violin produced more sound than any stringed instrument to date. Almost immediately, composers, players and collectors liked what they heard and saw. Italian and non-Italian makers proliferated. …
  • 3. Linguistic phenomena & IR problems
    Query: Historical information about “sugar river bank”
    Document, “History and Mission Statement” (www.sugarriverbank.com): … The Bank continues to grow at a healthy pace. We have continued to do well and be a leader in our industry. Our main branch was expanded in 1982 and we now have branches in Sunapee, New London, Warner, Grantham and Concord. We at Sugar River Bank are proud of our history and growth. It is the responsibility of each and every member of our Bank's family to insure continued growth in the future. …
  • 4. Linguistic phenomena & IR problems
    Query: Historical information about “sugar river bank”
    Document, “The Life-Sustaining Sugar River” (northwestquarterly.com): … The west branch of the Sugar River historically supported a native trout population, but had suffered from sedimentation, overgrazing of its banks and warming water. “Restoration efforts in the Dane County portion of the watershed reduced nonpoint source pollution, installed riverbank vegetative filter strips, improved in-stream habitat, restricted cattle access to streams, and improved management of animal waste from barnyards,” says Hansis. …
  • 5. Linguistic phenomena & IR problems
    - Part-Whole: Hand / Body
    - Heteronyms: Bank(com) / Bank(geo)
    - Hyponym / Hypernym: B-cell / Lymphocyte
    - Synonyms: Violin / Fiddle
    - Co-hyponym: Cat / Dog
  • 6-7. Observations
    1. Inadequacy of the term-independence assumption, which leads to the term-mismatch problem
    2. The retrieval process has an inferential nature, for which the classical word-based document-query comparison paradigm is insufficient
  • 9-11. Conceptual approach
    - Concepts are categories encompassing all synonymous terms
    - Use concept IDs instead of terms
    - UMLS: Atrial fibrillation, Auricular fibrillation → C0004238; Skin cancer, Melanoma, Malignant neoplasm of skin → C0004238
    - WordNet: Ticker, Watch → S04563183; Cancer, Malignant neoplastic disease → S14263400; Snake, Serpent, Ophidian → S01729333
  • 12. Part I: Relative Concept Frequency
    [1] K. Abdulahhad et al., Revisiting the Term Frequency in Concept-Based IR Models. DEXA 2013
    [2] K. Abdulahhad et al., MRIM at ImageCLEF2012. From Words to Concepts: A New Counting Approach. CLEF 2012
  • 13-15. Relative Concept Frequency (problem)
    - Text-to-concepts mapping
    - Using MetaMap & UMLS concepts
    - Precision / Recall
  • 16. Relative Concept Frequency (problem): document length
    - Word-space: $d$ = ‘lobar pneumonia x-ray’, $|d| = 3$
    - Concept-space: $|d| = ?$
  • 17. Relative Concept Frequency (idea)
    - Use all concepts while maintaining the word-based document length
    - Structure-based redistribution of the word-based document length over the concepts
  • 18-21. Relative Concept Frequency (how)
    - Computing relative frequency
    - Hypothesis 1: concepts of a longer phrase receive a larger count (more specific meaning)
    - Hypothesis 2: the larger the set of concepts for a phrase, the smaller the count each of its concepts receives (ambiguity)
    - Hypothesis 3: maintain the word-based document length $|d|$
  • 22. Computing Relative Concept Frequency (Step 1)
    Step 1: map text to concepts (via e.g. MetaMap). Input phrase: ‘lobar pneumonia x-ray’. Sub-phrases and their concepts:
    - $T_1$: ‘lobar pneumonia’ → $C_1$ = {C0032300, C0155862}
    - $T_2$: ‘pneumonia x-ray’ → $C_2$ = {C0581647}
    - $T_3$: ‘lobar’ → $C_3$ = {C1511010, C1428707, C0796494}
    - $T_4$: ‘pneumonia’ → $C_4$ = {C0024109, C1278908, C0032285, C2707265, C2709248}
    - $T_5$: ‘x-ray’ → $C_5$ = {C0034571, C0043299, C0043309, C1306645, C1714805, C1962945}
  • 23. Computing Relative Concept Frequency (Step 2)
    Step 2: build hierarchy, with $(T_i, C_i) < (T_j, C_j) \Leftrightarrow T_i \subset T_j$ (sub-phrases and concepts as in Step 1):
    - $R$ (virtual node) → $(T_1, C_1)$, $(T_2, C_2)$
    - $(T_1, C_1)$ → $(T_3, C_3)$, $(T_4, C_4)$
    - $(T_2, C_2)$ → $(T_4, C_4)$, $(T_5, C_5)$
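The hierarchy of Step 2 can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes that $T_i \subset T_j$ means "$T_i$ is a strictly shorter contiguous run of words inside $T_j$", and the names `SUB_PHRASES`, `is_sub_phrase` and `build_hierarchy` are hypothetical.

```python
# Sub-phrases of 'lobar pneumonia x-ray' as word tuples (from the Step 1 table).
SUB_PHRASES = {
    "T1": ("lobar", "pneumonia"),
    "T2": ("pneumonia", "x-ray"),
    "T3": ("lobar",),
    "T4": ("pneumonia",),
    "T5": ("x-ray",),
}


def is_sub_phrase(inner, outer):
    """True if `inner` is a strictly shorter contiguous run of words in `outer`.

    Assumption: this is how the inclusion T_i ⊂ T_j is interpreted here.
    """
    n = len(inner)
    if n >= len(outer):
        return False
    return any(outer[k:k + n] == inner for k in range(len(outer) - n + 1))


def build_hierarchy(phrases):
    """Return {node: [direct parents]}; top-level nodes hang under the virtual root 'R'."""
    parents = {}
    for i, t_i in phrases.items():
        ancestors = [j for j, t_j in phrases.items() if is_sub_phrase(t_i, t_j)]
        # Keep only direct parents: drop any ancestor that contains another ancestor.
        direct = [
            j for j in ancestors
            if not any(is_sub_phrase(phrases[k], phrases[j]) for k in ancestors)
        ]
        parents[i] = direct or ["R"]
    return parents


HIERARCHY = build_hierarchy(SUB_PHRASES)
```

Under these assumptions ‘pneumonia’ ends up with two parents, which is why it later receives a share from both two-word phrases.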
  • 24. Computing Relative Concept Frequency (Step 3)
    Step 3: compute the relative frequency $rf_i$
    - Breadth-first search over the hierarchy
    - The relative frequency $rf_i$ of $c \in C_i$ must be proportional to $|T_i|$ (Hypothesis 1) and inversely proportional to $|C_i|$ (Hypothesis 2)
    - Maintain $|d|$ by distributing it over the concepts of $d$ (Hypothesis 3)
    Sub-phrase sizes:
    - $T_1$: ‘lobar pneumonia’, $|T_1| = 2$, $|C_1| = 2$: C0032300, C0155862
    - $T_2$: ‘pneumonia x-ray’, $|T_2| = 2$, $|C_2| = 1$: C0581647
    - $T_3$: ‘lobar’, $|T_3| = 1$, $|C_3| = 3$: C1511010, C1428707, C0796494
    - $T_4$: ‘pneumonia’, $|T_4| = 1$, $|C_4| = 5$: C0024109, C1278908, C0032285, C2707265, C2709248
    - $T_5$: ‘x-ray’, $|T_5| = 1$, $|C_5| = 6$: C0034571, C0043299, C0043309, C1306645, C1714805, C1962945
  • 25. Computing Relative Concept Frequency (Step 3)
    We distribute the $|d| = 3$ of the phrase ‘lobar pneumonia x-ray’ over its concepts, starting with the full amount 3 at the virtual root $R$.
  • 26. Computing Relative Concept Frequency (Step 3)
    Step 3: computing the relative weight. For each node $(T_i, C_i)$ we compute three values:
    - $\alpha_i$, the amount that should be distributed over the concepts of the current node $(T_i, C_i)$ and its children: $\alpha_i = \sum_{p \in parents(i)} \delta_p \times |T_i|$
    - $\delta_i$, the portion of the input amount $\alpha_i$ that goes to one single word: $\delta_i = \frac{\alpha_i}{|T_i| + \sum_{ch \in children(i)} |T_{ch}|}$
    - $\beta_i$, or equivalently $rf_i$, the relative frequency of each concept $c \in C_i$: $\beta_i = \frac{\delta_i \times |T_i|}{|C_i|}$
  • 27-33. Computing Relative Concept Frequency (Step 3)
    We distribute the $|d| = 3$ of the phrase ‘lobar pneumonia x-ray’ over its concepts, one node per slide. Sizes: $|T_R| = 0$, $|C_R| = 0$; $|T_1| = 2$, $|C_1| = 2$; $|T_2| = 2$, $|C_2| = 1$; $|T_3| = 1$, $|C_3| = 3$; $|T_4| = 1$, $|C_4| = 5$; $|T_5| = 1$, $|C_5| = 6$.
    - Root $R$ (slide 27): $\alpha_R = 3$, $\delta_R = \frac{\alpha_R}{|T_R| + |T_1| + |T_2|} = \frac{3}{4}$
    - Node 1 (slide 28): $\alpha_1 = \delta_R \times |T_1| = \frac{3}{2}$, $\delta_1 = \frac{\alpha_1}{|T_1| + |T_3| + |T_4|} = \frac{3}{8}$, $\beta_1 = \frac{\delta_1 \times |T_1|}{|C_1|} = \frac{3}{8}$
    - Node 2 (slide 29): $\alpha_2 = \delta_R \times |T_2| = \frac{3}{2}$, $\delta_2 = \frac{\alpha_2}{|T_2| + |T_4| + |T_5|} = \frac{3}{8}$, $\beta_2 = \frac{\delta_2 \times |T_2|}{|C_2|} = \frac{3}{4}$
    - Node 3 (slide 30): $\alpha_3 = \delta_1 \times |T_3| = \frac{3}{8}$, $\delta_3 = \frac{\alpha_3}{|T_3|} = \frac{3}{8}$, $\beta_3 = \frac{\delta_3 \times |T_3|}{|C_3|} = \frac{1}{8}$
    - Node 4 (slide 31): $\alpha_4 = \delta_1 \times |T_4| + \delta_2 \times |T_4| = \frac{3}{4}$, $\delta_4 = \frac{\alpha_4}{|T_4|} = \frac{3}{4}$, $\beta_4 = \frac{\delta_4 \times |T_4|}{|C_4|} = \frac{3}{20}$
    - Node 5 (slide 32): $\alpha_5 = \delta_2 \times |T_5| = \frac{3}{8}$, $\delta_5 = \frac{\alpha_5}{|T_5|} = \frac{3}{8}$, $\beta_5 = \frac{\delta_5 \times |T_5|}{|C_5|} = \frac{1}{16}$
    - Slide 33 collects the resulting $rf_i$ values in a table.
  • 34-37. Computing Relative Concept Frequency (Step 3)
    Resulting relative frequencies (each concept of a sub-phrase receives the same $rf_i$):
    - $T_1$: ‘lobar pneumonia’, $|T_1| = 2$, $|C_1| = 2$, $rf_1 = \frac{3}{8}$: C0032300, C0155862
    - $T_2$: ‘pneumonia x-ray’, $|T_2| = 2$, $|C_2| = 1$, $rf_2 = \frac{3}{4}$: C0581647
    - $T_3$: ‘lobar’, $|T_3| = 1$, $|C_3| = 3$, $rf_3 = \frac{1}{8}$: C1511010, C1428707, C0796494
    - $T_4$: ‘pneumonia’, $|T_4| = 1$, $|C_4| = 5$, $rf_4 = \frac{3}{20}$: C0024109, C1278908, C0032285, C2707265, C2709248
    - $T_5$: ‘x-ray’, $|T_5| = 1$, $|C_5| = 6$, $rf_5 = \frac{1}{16}$: C0034571, C0043299, C0043309, C1306645, C1714805, C1962945
    Observations:
    - The concepts of the least ambiguous and longest phrase ($T_2$) have the highest frequency
    - The concepts of the most ambiguous and shortest phrase ($T_5$) have the lowest frequency
    - Summed over all concepts, $\sum rf_i = 3$, so the word-based $|d|$ is maintained
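The worked example above can be reproduced with exact fractions. The following is a minimal sketch under stated assumptions, not the authors' code: the node sizes and the parent/child structure are copied from the slides, while the names (`NODES`, `CHILDREN`, `relative_frequencies`) are hypothetical, and the traversal assumes that every parent sits on a shallower level than its children so that breadth-first order visits parents first.

```python
from collections import deque
from fractions import Fraction

# (|T_i| = number of words, |C_i| = number of concepts) per node, from the slides.
NODES = {"T1": (2, 2), "T2": (2, 1), "T3": (1, 3), "T4": (1, 5), "T5": (1, 6)}
# Children of each node; "R" is the virtual root with |T_R| = 0.
CHILDREN = {
    "R": ["T1", "T2"],
    "T1": ["T3", "T4"],
    "T2": ["T4", "T5"],
    "T3": [], "T4": [], "T5": [],
}


def relative_frequencies(doc_length):
    """Distribute the word-based length over the concepts, breadth first."""
    words = {"R": 0, **{k: t for k, (t, _) in NODES.items()}}
    parents = {k: [p for p, ch in CHILDREN.items() if k in ch] for k in NODES}
    delta, beta = {}, {}
    queue, seen = deque(["R"]), {"R"}
    while queue:
        node = queue.popleft()
        if node == "R":
            alpha = Fraction(doc_length)
        else:
            # alpha_i = sum over parents of delta_parent * |T_i|
            alpha = sum(delta[p] for p in parents[node]) * words[node]
        # delta_i = alpha_i / (|T_i| + sum of |T_child|)
        delta[node] = alpha / (words[node] + sum(words[c] for c in CHILDREN[node]))
        if node != "R":
            # beta_i = rf_i = delta_i * |T_i| / |C_i|
            beta[node] = delta[node] * words[node] / NODES[node][1]
        for child in CHILDREN[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return beta


RF = relative_frequencies(3)
# Sum of rf over every individual concept (|C_i| concepts per node).
TOTAL = sum(RF[k] * NODES[k][1] for k in NODES)
# Length obtained by naively counting every mapped concept once.
NAIVE_LENGTH = sum(c for _, c in NODES.values())
```

Here `TOTAL` equals the word-based length 3, whereas counting every mapped concept once would inflate the length of the three-word phrase to `NAIVE_LENGTH`, which is 17.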
  • 38. Relative Concept Frequency (results): Corpora [table not preserved in the transcript; 104.26 is the only surviving value]
  • 39. Relative Concept Frequency (results): (*) indicates a statistically significant ($\alpha < 0.05$) improvement w.r.t. the classical concept frequency TF
  • 40. Relative Concept Frequency (results)
  • 41. Relative Concept Frequency (conclusion)
    - Deals with the document length deformation
    - Encouraging results: increases recall; maintains or even increases precision
    - Can be used with classical IR models by changing the TF component
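To illustrate what "changing the TF component" could look like, here is a hedged sketch that plugs a relative concept frequency into one common BM25 variant. The slides do not say which exact formula or parameters were used, so the idf form, `k1`, `b` and all collection statistics below are assumptions, and `bm25_weight` is a hypothetical name.

```python
import math


def bm25_weight(rf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 weight of one concept, with the relative frequency rf in place of TF.

    doc_len is the word-based document length, which the relative frequency
    is designed to preserve. The idf form and the k1/b defaults are one common
    choice, not values taken from the slides.
    """
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * rf * (k1 + 1.0) / (rf + norm)


# Hypothetical collection statistics, for illustration only.
STATS = dict(doc_len=3, avg_doc_len=10.0, n_docs=1000, doc_freq=50)
W_SPECIFIC = bm25_weight(3 / 4, **STATS)    # rf of the 'pneumonia x-ray' concept
W_AMBIGUOUS = bm25_weight(1 / 16, **STATS)  # rf of each 'x-ray' concept
```

With identical collection statistics, the concept of the specific two-word phrase gets a larger weight than each concept of the ambiguous single word.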
  • 42. Part II: Concept Embedding
    [3] K. Abdulahhad, Concept embedding for information retrieval. ECIR 2018
  • 43-47. Concept embedding (problem)
    - fiddle, violin → S04544161; skin cancer, melanoma → C0004238
    - B-cell → C0004561 and lymphocyte → C0024264, linked by an is-a relation
    - Relation-based concept similarity is problematic
    - Relations have different semantics & properties: fiddle / violin (synonymous), B-cell / lymphocyte (is-a), hand / body (part-of)
  • 48-49. Concept embedding (idea)
    - Concepts as vectors
    - Still using concepts to reduce the mismatch effect
    - Avoiding the complexities of relation-based inter-concept similarity
    - Goal: check the adaptability of concept-embedding-based similarity to IR
  • 50. Concept embedding (approaches): flat embedding, $c = F(w_1, \dots, w_n)$
  • 51. Concept embedding (approaches): hierarchical embedding, $s_i = F(w_1^i, \dots, w_n^i)$, $t_j = F(s_1^j, \dots, s_m^j)$, $c = F(t_1, \dots, t_k)$
  • 52. Concept embedding (approaches): weighted embedding, $c = F(\alpha_1 w_1, \dots, \alpha_n w_n)$
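The three compositions can be sketched with plain Python lists. The example assumes $F$ is the element-wise average (weighted average for the weighted variant), which is how the experiments instantiate it; the toy 2-dimensional vectors and all function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise average of equal-length vectors (the assumed F)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def flat_embedding(word_vectors):
    """c = F(w_1, ..., w_n): average all word vectors of the concept at once."""
    return mean_vector(word_vectors)


def hierarchical_embedding(tree):
    """c = F(t_1, ..., t_k), t_j = F(s_1, ..., s_m), s_i = F(w_1, ..., w_n).

    `tree` is a list of t-level groups; each group is a list of s-level
    groups; each s-level group is a list of word vectors.
    """
    t_vectors = []
    for t_group in tree:
        s_vectors = [mean_vector(words) for words in t_group]
        t_vectors.append(mean_vector(s_vectors))
    return mean_vector(t_vectors)


def weighted_embedding(word_vectors, weights):
    """c = F(a_1 w_1, ..., a_n w_n) with F as a weighted average."""
    total = sum(weights)
    dim = len(word_vectors[0])
    return [
        sum(a * vec[k] for a, vec in zip(weights, word_vectors)) / total
        for k in range(dim)
    ]


# Toy data: the same four word vectors, arranged flat and as a hierarchy.
WORDS = [[2.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.0, 0.0]]
TREE = [[[WORDS[0], WORDS[1]], [WORDS[2]]], [[WORDS[3]]]]
FLAT = flat_embedding(WORDS)
HIER = hierarchical_embedding(TREE)
WEIGHTED = weighted_embedding([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

On this toy input the flat and hierarchical averages differ, because the hierarchy gives every group the same vote regardless of how many words it contains.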
  • 53. Concept embedding (experiments)
    Experiments consist of two parts:
    1. Generating concept embedding vectors
    2. Testing a vector-based concept similarity for ad-hoc IR
  • 54. Concept embedding (experiments) 1. Generating concept embedding vectors
    Word embedding:
    - PubMed Central collection (vocabulary of 1,177,879 entries)
    - Word2Vec
    - Vector size 500
    - Continuous bag of words
    - Window size 8
    - Negative sampling 25
  • 55. Concept embedding (experiments) 1. Generating concept embedding vectors
    Concept embedding:
    - UMLS2017 concepts (only English content)
    - For each concept, we build the corresponding set of words
    - Flat embedding: replace $F$ by avg
    - Hierarchical embedding: replace $F$ by avg
    - Weighted embedding: replace $F$ by weighted-avg
    - The weight $\alpha_w$ of a word $w$ is $\alpha_w = \ln\frac{N+1}{n}$, where $N$ is the number of documents in PubMed Central and $n$ is the document frequency of $w$ in PubMed Central
  • 56. Concept embedding (experiments) 1. Generating concept embedding vectors
    Concept embedding (missing words):
    - Fixed random vectors
    - Several experiments for weighting missing words:
      - The word is too popular: $n = N$ (poor idf)
      - The word is too rare: $n = 1$ (high idf)
      - Or in between: $n = N/2$
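A small sketch of the word weight $\alpha_w = \ln\frac{N+1}{n}$ and of the three fallback settings for missing words. The slides do not give $N$ or say how the fixed random vectors were drawn, so the collection size, the uniform range and the per-word seeding below are assumptions, and all names are hypothetical.

```python
import math
import random


def idf_weight(n_docs, doc_freq):
    """alpha_w = ln((N + 1) / n), as defined on the slide."""
    return math.log((n_docs + 1) / doc_freq)


def missing_word_weights(n_docs):
    """Weights under the three assumptions tried for out-of-vocabulary words."""
    return {
        "too_popular": idf_weight(n_docs, n_docs),    # n = N, poor idf
        "too_rare": idf_weight(n_docs, 1),            # n = 1, high idf
        "in_between": idf_weight(n_docs, n_docs / 2)  # n = N / 2
    }


def fixed_random_vector(word, dim):
    """A random vector that is always the same for the same word.

    Assumption: seeding a private generator with the word itself and drawing
    uniformly from [-1, 1] is one simple way to keep the vector fixed.
    """
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


# Hypothetical collection size, for illustration only.
WEIGHTS = missing_word_weights(1_000_000)
```

The "too popular" setting drives the weight towards zero, the "too rare" setting gives the largest possible weight, and $n = N/2$ lands close to $\ln 2$.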
  • 57. Concept embedding (experiments) 2. Testing a vector-based concept similarity for ad-hoc IR
    - Corpora: clef11 & clef12
    - Text-to-concepts mapping: MetaMap, UMLS concepts
  • 58-61. Concept embedding (experiments) 2. Testing a vector-based concept similarity for ad-hoc IR
    IR model and concept similarity:
    - Retrieval status value: $RSV(d, q) = \sum_{c \in q} weight_q(c) \times sim(c, c^*) \times weight_d(c^*)$
    - Weight(c): BM25 and Pivoted Normalization
    - Concept similarity: $sim(c_i, c_j) = \begin{cases} 0 & \cos\theta \le 0 \\ \beta \times (\cos\theta)^2 & \text{otherwise} \end{cases}$
    - For comparison: Leacock
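The scoring scheme can be sketched as follows. Several points are assumptions rather than statements from the slides: $c^*$ is taken to be the document concept most similar to the query concept $c$, the garbled similarity is read as $\beta \times (\cos\theta)^2$, and the concept weights are passed in as precomputed numbers instead of being derived from BM25 or Pivoted Normalization. All names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sim(u, v, beta=1.0):
    """0 when cos(theta) <= 0, otherwise beta * cos(theta)^2 (assumed reading)."""
    cos_theta = cosine(u, v)
    return 0.0 if cos_theta <= 0 else beta * cos_theta ** 2


def rsv(query_weights, doc_weights, vectors, beta=1.0):
    """Sum over query concepts of weight_q(c) * sim(c, c*) * weight_d(c*).

    Assumption: c* is the document concept with the highest similarity to c.
    """
    score = 0.0
    if not doc_weights:
        return score
    for c, w_q in query_weights.items():
        c_star = max(doc_weights, key=lambda dc: sim(vectors[c], vectors[dc], beta))
        score += w_q * sim(vectors[c], vectors[c_star], beta) * doc_weights[c_star]
    return score


# Toy concept vectors and weights, for illustration only.
VECTORS = {"q1": [1.0, 0.0], "d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [-1.0, 0.0]}
SCORE = rsv({"q1": 2.0}, {"d1": 0.5, "d2": 3.0}, VECTORS)
```

In the toy example the query concept matches `d1` exactly, so the score is the product of the two weights; orthogonal or opposite concepts contribute nothing.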
  • 62. Concept embedding (experiments) 2. Testing a vector-based concept similarity for ad-hoc IR
    Results:
    - (*) indicates a statistically significant ($\alpha < 0.05$) improvement w.r.t. “NoEmb-NoSim”
    - (†) indicates a statistically significant ($\alpha < 0.05$) improvement w.r.t. “NoEmb-Leacock”
  • 63. Concept embedding (conclusion)
    - Three approaches to build concept vectors based on word embedding
    - Promising results for using vector-based concept representation and similarity
    - Concepts and words are represented in the same vector space, so they are comparable
    - Could improve approaches like MetaMap
  • 65. Conclusion
    - Dealing with the two observations:
      - Inadequacy of the term-independence assumption
      - The retrieval process has an inferential nature
    - Conceptual IR:
      - Document length deformation
      - Inter-concept relations quantification