ENTITIES FOR AUGMENTED
INTELLIGENCE
Krisztian Balog
University of Stavanger

@krisztianbalog
Keynote given at the 23rd International Conference on Theory and Practice of Digital Libraries (TPDL '19) | Oslo, Norway, September 2019
ENTITIES ARE UBIQUITOUS
WHAT IS AN ENTITY?
An entity is a uniquely identifiable object or thing,
characterized by its name(s), type(s), attributes, and
relationships to other entities.
AN ENTITY
<dbr:Karen_Spark_Jones>
<rdf:type> <dbo:Scientist>, <dbo:Person>, <dbo:Agent>, <owl:Thing>
<foaf:name> "Karen Spärck Jones"
<dbo:birthDate> "1935-08-26"
<dbo:abstract> "Karen Spärck Jones FBA (26 August 1935 – 4 April 2007) was a British computer scientist."
<dbo:spouse> <dbr:Roger_Needham>
<dbp:almaMater> <University_of_Cambridge>
<dbo:knownFor> <dbr:Natural_language_processing>
<dct:subject> <dbc:Information_retrieval_researchers>, <dbc:British_women_computer_scientists>, <dbc:British_computer_scientists>, <dbc:British_women_scientists>
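As a minimal sketch of the definition above, an entity can be held as a set of subject-predicate-object triples and its properties grouped by predicate. The in-memory list and the `describe` helper are illustrative assumptions, not part of the talk; the identifiers are copied from the slide.

```python
from collections import defaultdict

# Hypothetical in-memory triple list mirroring (part of) the slide's example.
TRIPLES = [
    ("dbr:Karen_Spark_Jones", "rdf:type", "dbo:Scientist"),
    ("dbr:Karen_Spark_Jones", "rdf:type", "dbo:Person"),
    ("dbr:Karen_Spark_Jones", "foaf:name", "Karen Spärck Jones"),
    ("dbr:Karen_Spark_Jones", "dbo:birthDate", "1935-08-26"),
    ("dbr:Karen_Spark_Jones", "dbo:spouse", "dbr:Roger_Needham"),
    ("dbr:Roger_Needham", "rdf:type", "dbo:Person"),
]


def describe(entity_id, triples):
    """Group the predicate-object pairs of one entity by predicate."""
    properties = defaultdict(list)
    for subject, predicate, obj in triples:
        if subject == entity_id:
            properties[predicate].append(obj)
    return dict(properties)


entity = describe("dbr:Karen_Spark_Jones", TRIPLES)
```

Here `entity["rdf:type"]` holds the types, `entity["foaf:name"]` the name, and `entity["dbo:spouse"]` a typed relationship to another entity.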
REPRESENTING ENTITIES AND THEIR PROPERTIES
• Entity catalog: entity ID*, name(s)*
• Knowledge repository: type(s)*, descriptions, relationships (non-typed links); meant for human consumption
• Knowledge base (KB) / knowledge graph (KG): attributes, relationships (typed links); meant for machine consumption
WHY CARE ABOUT ENTITIES?
• From a user perspective,
entities ...
• are natural units for organizing
information
• enable a richer and more effective
user experience
• From a machine perspective,
entities ...
• allow for a better understanding of
queries, document content, and of
users
• help to bridge the gap between
unstructured and structured data
• enable search engines to be more
intelligent
TWO CORE COMPONENTS:
ENTITY RETRIEVAL & ENTITY LINKING
Part I
ENTITY RETRIEVAL
• Task: Answer an information need (expressed, e.g., as a free text
query) with a ranked list of entities from some catalog of entities
[Figure: an information need is answered with a ranked list of entities e1, e2, …, en.]
NUMEROUS APPLICATIONS
movie recommendation, playlist completion, e-commerce search
APPROACHES
• Term-based entity representations can be effectively ranked
using document-based retrieval models
• Semantically informed retrieval models utilize entity-specific
properties (attributes, types, and relationships)
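The first approach can be sketched as follows: each entity gets a term-based representation (a bag of words), which is then ranked with a document retrieval model, here BM25. The entity descriptions, the function name, and the parameter defaults are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical term-based entity representations (toy data).
ENTITY_DOCS = {
    "Michael_Schumacher": "german racing driver formula one world champion ferrari benetton",
    "Scuderia_Ferrari": "italian formula one constructor racing team ferrari",
    "Karen_Sparck_Jones": "british computer scientist information retrieval",
}


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Rank entities by the BM25 score of their term-based representation."""
    tokenized = {entity: text.split() for entity, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(terms) for terms in tokenized.values()) / n_docs
    # Document frequency: in how many entity representations a term occurs.
    df = Counter(term for terms in tokenized.values() for term in set(terms))
    scores = {}
    for entity, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[entity] = score
    return sorted(scores, key=scores.get, reverse=True)


ranking = bm25_rank("formula one constructor", ENTITY_DOCS)
```

Semantically informed models would extend such a score with evidence from types, attributes, and relationships; that part is not covered by this sketch.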
ENTITY LINKING
• Task: Recognize entity mentions in text and link them to the
corresponding entries in a knowledge repository
Michael Schumacher (born 3 January 1969) is a German retired racing driver. He
is a seven-time Formula One World Champion and is widely regarded as one of
the greatest Formula One drivers of all time. He won two titles with Benetton in
1994 and 1995 before moving to Ferrari where he drove for eleven years. His
time with Ferrari yielded five consecutive titles between 2000 and 2004.
Michael Schumacher (Racing driver)
Scuderia Ferrari (Formula One constructor)
Benetton Formula (Formula One constructor)
Formula One (Auto racing series)
APPROACH
Document → (1) Mention detection → (2) Candidate selection → (3) Disambiguation → Entity annotations
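The three steps of the entity linking pipeline can be sketched as below. The surface-form dictionary, the context word sets, and the overlap-based disambiguation are toy assumptions chosen to keep the example short; real systems use much richer evidence.

```python
# Hypothetical surface-form dictionary: mention text -> candidate entities.
SURFACE_FORMS = {
    "ferrari": ["Scuderia_Ferrari", "Ferrari_S.p.A."],
    "benetton": ["Benetton_Formula", "Benetton_Group"],
    "formula one": ["Formula_One"],
}

# Hypothetical context words associated with each entity.
ENTITY_CONTEXT = {
    "Scuderia_Ferrari": {"formula", "one", "racing", "team", "titles"},
    "Ferrari_S.p.A.": {"car", "manufacturer", "sports", "cars"},
    "Benetton_Formula": {"formula", "one", "constructor", "titles"},
    "Benetton_Group": {"fashion", "clothing", "brand"},
    "Formula_One": {"racing", "series", "championship"},
}


def detect_mentions(text):
    """Step 1: find known surface forms in the text."""
    lowered = text.lower()
    return [form for form in SURFACE_FORMS if form in lowered]


def select_candidates(mention):
    """Step 2: look up the entities a mention may refer to."""
    return SURFACE_FORMS[mention]


def disambiguate(candidates, text):
    """Step 3: pick the candidate whose context overlaps most with the text."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    return max(candidates, key=lambda e: len(ENTITY_CONTEXT[e] & words))


def link(text):
    return {m: disambiguate(select_candidates(m), text) for m in detect_mentions(text)}


text = "He won two titles with Benetton before moving to Ferrari, a Formula One team."
annotations = link(text)
```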
SUMMARY OF PART I
• Established entity retrieval and entity linking techniques
provide a solid starting point
• Open issues
• Most work on entity retrieval has focused on keyword queries; there are
numerous other ways of expressing information needs
• Different types of input call for different entity linking techniques
• Noisy short texts (e.g., tweets, queries), structured data (e.g., tables), OCR'ed text, ...
• Long tail entities (with sparse representation)
SEARCH IS OFTEN PART OF A
LARGER WORK TASK
EXAMPLE INFORMATION NEEDS
• Planning a road trip in California
• Creating a curriculum for a course (including recommended
literature and invited speakers)
• Finding out which anti-aircraft guns were used in ships during war
periods, what countries produced them, and if any working models
may be found (and where)
Answering complex information needs involves retrieving,
extracting, filtering, and aggregating information from
multiple sources
SMART ASSISTANCE FOR TABLES
Part II
TABLES ARE EVERYWHERE
THE ANATOMY OF A RELATIONAL (ENTITY-FOCUSED) TABLE
Table caption: Formula 1 constructors’ statistics 2016
Heading column labels (table schema): Constructor, Engine, Country, Base
Table entities (core/subject column): Ferrari, Force India, Haas, Manor, …
Table data:
Constructor | Engine | Country | Base
Ferrari | Ferrari | Italy | Italy
Force India | Mercedes | India | UK
Haas | Ferrari | US | US & UK
Manor | Mercedes | UK | UK
… | … | |
WHAT KIND OF ASSISTANCE CAN WE
PROVIDE FOR PEOPLE WORKING
WITH (RELATIONAL) TABLES?
SMART ASSISTANCE
Remember me?
SMART ASSISTANCE
Sometimes I just pop up
for no particular reason
ASSISTANCE #1
Row population
Suggesting entities to be added to the subject column of the table
Example: for the table "Formula 1 constructors’ statistics 2016" (columns: Constructor, Engine, Country, Base; rows: Ferrari, Force India, Haas, Manor), "Add entity" suggests:
1. McLaren
2. Mercedes
3. Red Bull
ASSISTANCE #2
Column population
Suggesting column labels to be added to the table heading
Example: for the table "Formula 1 constructors’ statistics 2016" (columns: Constructor, Engine, Country, Base; rows: Ferrari, Force India, Haas, Manor), "Add column" suggests:
1. Seasons
2. Races Entered
ASSISTANCE #3
Value finding
Suggesting values for specific table cells with supporting evidence
Value checking
Checking whether there is supporting evidence for existing cell values
Table caption: Oscar Best Actor
Year | Actor | Film | Role(s)
2013 | Matthew McConaughey | Dallas Buyers Club | Ron Woodroof
2014 | Eddie Redmayne | The Theory of Everything | Stephen Hawking
2015 | Leonardo DiCaprio | The Revenant | Hugh Glass
2016 | Casey Affleck | Manchester by the Sea | Lee Chandler
2017 | Gary Oldman | ? |
Suggested values for the Film cell of the 2017 row:
1. Darkest Hour
https://en.wikipedia.org/wiki/Academy_Award_for_Best_Actor (2 additional sources)
2. Tinker Tailor Soldier Spy
https://en.wikipedia.org/wiki/Academy_Award_for_Best_Actor (1 additional source)
3. Nil by Mouth
http://dbpedia.org/page/Gary_Oldman
Evidence for the Role(s) cell of the 2016 row:
1. Lee Chandler
https://en.wikipedia.org/wiki/Academy_Award_for_Best_Actor
https://en.wikipedia.org/wiki/Casey_Affleck
2. Ray Sybert
https://en.wikipedia.org/wiki/Casey_Affleck
ASSISTANCE #4
Table generation
Automatically generating an entire table in response to a keyword query
Query: economy of Singapore
Result 1: Singapore - Wikipedia, Economy Statistics (Recent Years)
https://en.wikipedia.org/wiki/Singapore
Year | GDP Nominal (Billion) | GDP Nominal Per Capita | GDP Real (Billion) | GNI Nominal (Billion) | GNI Nominal Per Capita
2011 | S$346.353 | S$66,816 | S$342.371 | S$338.452 | S$65,292
2012 | S$362.332 | S$68,205 | S$354.061 | S$351.765 | S$66,216
2013 | S$378.200 | S$70,047 | S$324.592 | S$366.618 | S$67,902
(5 rows total)
Result 2: Singapore - Wikipedia, Language used most frequently at home
https://en.wikipedia.org/wiki/Singapore
EXPERIMENTAL SETTING
• Data sources
• Table corpus: 1.6M tables extracted from Wikipedia
• Knowledge base: DBpedia 2015-10 (4.6M entities)
• Evaluation measures
• Standard IR measures (MAP, MRR, NDCG)
#1 ROW POPULATION
• Task: Generate a ranked list of entities to be added to the core
column of a given seed table
S. Zhang and K. Balog. EntiTables: Smart Assistance for Entity-Focused Tables. In: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17)
[Figure: a seed table with table caption c, seed column labels L=(l1,…,lm), and seed entities E=(e1,…,en); the entity en+1 for the next row is to be suggested.]
APPROACH
Seed table → (1) Candidate selection → (2) Entity ranking → Ranked list of suggestions (top-K entities)
APPROACH: CANDIDATE SELECTION
• From knowledge base
• Entities that are of the same type(s) or belong to the same categories
• Ranking is based on the number of shared types/categories
• From table corpus
• Based on caption: indexing the table as a document and using a standard
document retrieval method (BM25)
• Based on entities: indexing only entities, using seed entities as the query
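Candidate selection from the knowledge base can be sketched as counting shared types/categories between each entity and the seed entities. The type assignments and the function name are hypothetical; ties are broken alphabetically only to make the output deterministic.

```python
# Hypothetical type/category assignments in a knowledge base (toy data).
ENTITY_TYPES = {
    "Ferrari": {"F1_constructor", "Sports_car_maker", "Italian_company"},
    "Haas": {"F1_constructor", "American_company"},
    "McLaren": {"F1_constructor", "Sports_car_maker"},
    "Fiat": {"Italian_company", "Car_maker"},
    "Oslo": {"City"},
}


def select_candidates(seed_entities, entity_types, k=10):
    """Return up to k non-seed entities, ranked by the number of
    types/categories they share with the seed entities."""
    seed_types = set().union(*(entity_types[e] for e in seed_entities))
    shared = {}
    for entity, types in entity_types.items():
        if entity in seed_entities:
            continue
        overlap = len(types & seed_types)
        if overlap > 0:
            shared[entity] = overlap
    ranked = sorted(shared, key=lambda e: (-shared[e], e))
    return ranked[:k]


candidates = select_candidates(["Ferrari", "Haas"], ENTITY_TYPES)
```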
APPROACH: ENTITY RANKING
• Based on the similarity between the candidate entity and
various table elements
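$P(e \mid E, L, c) = \cdots \propto P(e \mid E)\,P(L \mid e)\,P(c \mid e)$
where $e$ is the candidate entity, $E$, $L$, and $c$ are the seed entities, column labels, and caption of the seed table, $P(e \mid E)$ is the entity similarity, $P(L \mid e)$ is the column label similarity, and $P(c \mid e)$ is the caption similarity.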
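The product of the three similarity components can be illustrated with a small sketch. The estimators used here (table co-occurrence for entities, label overlap, and Jaccard similarity for captions, with additive smoothing `eps`) are simplifying assumptions for illustration and are not the estimators of the cited paper.

```python
# Toy table corpus (hypothetical data, for illustration only).
CORPUS = [
    {"caption": "formula 1 constructors 2015",
     "entities": {"Ferrari", "McLaren", "Haas"},
     "labels": {"Constructor", "Engine"}},
    {"caption": "italian car makers",
     "entities": {"Ferrari", "Fiat"},
     "labels": {"Company", "Founded"}},
    {"caption": "formula 1 constructors 2014",
     "entities": {"Ferrari", "McLaren"},
     "labels": {"Constructor", "Base"}},
]


def score(candidate, seed_entities, seed_labels, caption, corpus, eps=0.01):
    """Smoothed product of entity, column label, and caption similarity."""
    with_seed = [t for t in corpus if t["entities"] & seed_entities]
    with_cand = [t for t in corpus if candidate in t["entities"]]
    # Entity similarity: share of tables with a seed entity that also hold the candidate.
    p_entity = (sum(candidate in t["entities"] for t in with_seed) / len(with_seed)
                if with_seed else 0.0)
    # Column label similarity: share of seed labels seen in the candidate's tables.
    cand_labels = set().union(*(t["labels"] for t in with_cand))
    p_labels = len(seed_labels & cand_labels) / len(seed_labels) if seed_labels else 0.0
    # Caption similarity: Jaccard overlap between caption terms.
    query_terms = set(caption.lower().split())
    cand_terms = set().union(*(set(t["caption"].split()) for t in with_cand))
    union = query_terms | cand_terms
    p_caption = len(query_terms & cand_terms) / len(union) if union else 0.0
    return (p_entity + eps) * (p_labels + eps) * (p_caption + eps)


def rank(candidates, seed_entities, seed_labels, caption, corpus):
    return sorted(candidates,
                  key=lambda e: score(e, seed_entities, seed_labels, caption, corpus),
                  reverse=True)


ranking = rank(["Fiat", "Haas", "McLaren"], {"Ferrari"},
               {"Constructor", "Engine"}, "formula 1 constructors 2016", CORPUS)
```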
EXPERIMENTAL DESIGN
• Idea: Take existing tables and simulate the user
in an intermediate step during table completion
• Select a set of (1000) tables randomly
• Contain at least 6 rows and at least 3 columns (in
addition to the subject column)
• For any intermediate step (i rows completed)
• First i (1<=i<=5) rows are taken as the seed table
• Entities in the remaining rows are the ground truth
[Figure: in a table with column labels l1, l2, …, lm, the rows e1 … ei form the seed table and the rows ei+1 … en are the ground truth.]
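The simulation described above can be sketched as follows: the first i entities of an existing table's core column become the seed, the remaining ones the ground truth, and a ranked list of suggestions is scored with average precision (MAP is its mean over all test tables). The function names and the example suggestions are assumptions for illustration.

```python
def split_table(core_column, i):
    """Use the first i entities as the seed; the rest are the ground truth."""
    return core_column[:i], set(core_column[i:])


def average_precision(ranked, relevant):
    """Average of the precision values at the ranks of relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


core = ["Ferrari", "Force India", "Haas", "Manor", "McLaren", "Mercedes"]
seed, truth = split_table(core, 2)

# Hypothetical system output for this seed table.
suggestions = ["McLaren", "Red Bull", "Haas", "Manor"]
ap = average_precision(suggestions, truth)
```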
EXPERIMENTAL RESULTS
Entity ranking performance in terms of Mean Average Precision (MAP), by number of seed entities
Method | 1 | 2 | 3 | 4 | 5
Baseline* | 0.307 | 0.327 | 0.340 | 0.342 | 0.340
Entity similarity | 0.490 | 0.542 | 0.561 | 0.566 | 0.560
+ column label similarity | 0.572 | 0.610 | 0.618 | 0.618 | 0.610
+ caption similarity | 0.592 | 0.626 | 0.633 | 0.634 | 0.631
* M. Bron, K. Balog, and M. de Rijke. Example Based Entity Search in the Web of Data. In: 34th European Conference on Information Retrieval (ECIR ’13)
#2 COLUMN POPULATION
• Task: Generate a ranked list of column labels to be added to the
column headings of a given seed table
S. Zhang and K. Balog. EntiTables: Smart Assistance for Entity-Focused Tables. In: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17)
[Figure: a seed table with table caption c, entities e1 … en, and column labels l1 … lm; the label lm+1 for the next column is to be suggested.]
EXPERIMENTAL DESIGN
• Idea: Take existing tables and simulate the user
in an intermediate step during table completion
• Select a set of (1000) tables randomly
• Contain at least 6 rows and at least 4 columns
• For any intermediate step (j columns completed)
• First j (1<=j<=3) columns are taken as the seed table
• Labels in the remaining columns are the ground truth
[Figure: for entities e1 … en, the columns l1 … lj form the seed table and the columns lj+1 … lm are the ground truth.]
#3 CELL VALUE FINDING
• Task: Given an input relational table, find the value of a specific cell
(identified by the entity in the core column and the column heading
label) or (optionally) determine if the cell should be left empty
S. Zhang and K. Balog. Auto-completion for Data Cells in Relational Tables. 

In: 28th ACM International Conference on Information and Knowledge Management (CIKM ’19)
[Figure: an input table with table caption c; the cell identified by entity e and column label l is to be filled.]
APPROACH
Input table → (1) Candidate value finding → (2) Value ranking → Ranked list of suggestions (top-K values)
APPROACH: CANDIDATE VALUE FINDING
• From knowledge base
• Heading-to-predicate matching
• E.g., "location" vs. <dbp:location>, <dbp:city>, <dbp:country>
• From table corpus
• Heading-to-heading matching
• Identify other table columns that have the same meaning
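One possible way to obtain heading-to-predicate matches is to count how often a cell under a given heading agrees with the knowledge base value of a predicate for the row's entity. This counting scheme, the toy knowledge base, and the observed cells below are assumptions for illustration; the slide does not specify how the matching is done.

```python
from collections import Counter

# Hypothetical knowledge base: entity -> {predicate: value} (toy data).
KB = {
    "Oslo_Opera_House": {"dbp:city": "Oslo", "dbp:country": "Norway"},
    "Louvre": {"dbp:location": "Paris", "dbp:country": "France"},
    "Colosseum": {"dbp:location": "Rome"},
}

# Hypothetical cells observed in a table corpus: (row entity, heading, value).
CELLS = [
    ("Oslo_Opera_House", "location", "Oslo"),
    ("Louvre", "location", "Paris"),
    ("Colosseum", "location", "Rome"),
    ("Louvre", "location", "France"),
]


def match_predicates(heading, cells, kb):
    """Estimate how likely each KB predicate corresponds to a heading, based
    on how often a cell value equals the KB value for that predicate."""
    counts = Counter()
    for entity, cell_heading, value in cells:
        if cell_heading != heading:
            continue
        for predicate, kb_value in kb.get(entity, {}).items():
            if kb_value == value:
                counts[predicate] += 1
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


matches = match_predicates("location", CELLS, KB)
```

Heading-to-heading matching over a table corpus could follow the same pattern, comparing the values of two columns for shared entities instead of KB values.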
APPROACH: VALUE RANKING
• Combine evidence in a feature-based approach
• Features I: Degree of support for the given value across the
different evidence sources
• Features II: Empty value prediction
• Features III: Semantic relatedness between the input table and
candidate tables (where the value originates from)
EXPERIMENTAL DESIGN
• Idea: Conceal cell values from existing
tables
• Randomly select an existing table
• Pick a table column
• Remove n cells randomly from this column
• Evaluate using crowdsourcing
• Given the input table, the value, and a
source document, does this appear as the
correct value for the missing cell?
EXPERIMENTAL RESULTS
Value finding performance in terms of NDCG@5
Method | Empty values excluded | Empty values included
Baseline | 0.585 | 0.518
Features I | 0.664 | 0.576
Features I+II | 0.684 | 0.590
Features I+II+III | 0.757 | 0.671
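For reference, NDCG@k can be computed as in the sketch below. It assumes that the list of gains covers all judged candidates, so the ideal ranking is obtained by sorting that same list; the function names are illustrative.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k=5):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Relevance of the top-5 suggested values: only the second one is correct.
example = ndcg_at_k([0, 1, 0, 0, 0])
```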
#4 ON-THE-FLY TABLE GENERATION
• Task: Answer a free text query with a relational table, where
• the core column lists all relevant entities
• columns correspond to attributes of those entities
• cells contain the values of the corresponding entity attributes
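[Figure: for a keyword query q, the generated table consists of core column entities E, column labels (schema) L, and cell values V.]
S. Zhang and K. Balog. On-the-fly Table Generation. In: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '18)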
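APPROACH
Core column entity ranking and schema determination could potentially mutually reinforce each other.
[Figure: given the query (q), core column entity ranking produces the entities E and schema determination produces the schema S, each taking the other's output as additional input; value lookup then fills in the cell values V for E and S.]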
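The mutual reinforcement idea can be sketched as an alternating loop: entities are ranked using the query and the current schema, the schema is determined using the query and the current entities, and values are looked up once both stabilize. The scoring functions, the toy knowledge base, and the stopping rule are all assumptions made for this illustration.

```python
# Hypothetical knowledge base and entity descriptions (toy data).
KB = {
    "Ferrari": {"Engine": "Ferrari", "Base": "Italy"},
    "Haas": {"Engine": "Ferrari", "Base": "US & UK"},
    "Oslo": {"Country": "Norway"},
}
DESCRIPTIONS = {
    "Ferrari": "formula 1 constructor",
    "Haas": "formula 1 constructor",
    "Oslo": "city",
}


def rank_entities(query, schema):
    """Score = query term overlap + number of schema labels the entity has."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(DESCRIPTIONS[e].split()))
               + sum(label in KB[e] for label in schema), e) for e in KB]
    return [e for s, e in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0]


def determine_schema(query, entities):
    """Rank labels by how many of the current entities have a value for them."""
    counts = {}
    for e in entities:
        for label in KB[e]:
            counts[label] = counts.get(label, 0) + 1
    return sorted(counts, key=lambda l: (-counts[l], l))


def generate_table(query, rounds=3, k=5):
    entities, schema = [], []
    for _ in range(rounds):
        new_entities = rank_entities(query, schema)[:k]
        new_schema = determine_schema(query, new_entities)[:k]
        converged = new_entities == entities and new_schema == schema
        entities, schema = new_entities, new_schema
        if converged:
            break
    # Value lookup for every (entity, label) cell.
    values = {(e, l): KB[e].get(l) for e in entities for l in schema}
    return entities, schema, values


entities, schema, values = generate_table("formula 1 constructors")
```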
ALGORITHM
[Figure: the pipeline diagram of the previous slide, repeated.]
MAIN RANKING SIGNALS
Core column entity ranking
• Query-only
• Term-based matching
• Semantic matching
• Query + schema
• Entity-schema matching
• Entity-schema compatibility
Schema determination
• Query-only
• Column population (q)
• Semantic matching
• Query + entities
• Column population (q, E)
• Attribute retrieval
• Entity-schema compatibility
EXPERIMENTAL DESIGN
• QS-1: List queries from the DBpedia-Entity v2 collection [1] (119)
• Relevance judgments obtained via crowdsourcing
• "all cars that are produced in Germany"
• "permanent members of the UN Security Council"
• "Airlines that currently use Boeing 747 planes"
• QS-2: Entity-relationship queries from the RELink Query Collection [2] (600)
• Queries and relevance judgments obtained automatically from Wikipedia lists that contain
relational tables
• "find peaks above 6000m in the mountains of Peru"
• "Which countries and cities have accredited Armenian ambassadors?"
• "Which anti-aircraft guns were used in ships during war periods and what country produced them?"
[1] Hasibi et al. DBpedia-Entity v2: A Test Collection for Entity Search. In: SIGIR ’17.
[2] Saleiro et al. RELink: A Research Framework and Test Collection for Entity-Relationship Retrieval. In: SIGIR ’17.
EXPERIMENTAL RESULTS (QS-1)
[Figure: two panels. Core column entity ranking: without schema information (query only), with ground truth schema, and with automatic schema determination. Schema determination: without entity information (query only), with ground truth entities, and with automatic core column entity ranking.]
SUMMARY OF PART II
• Tables are a universal tool for collecting and manipulating data
• A selection of smart assistance functionalities for relational tables
• Open issues
• Moving from homogeneous Wikipedia tables to heterogeneous Web tables
and to other (non-relational) table types
• Tapping into unstructured data sources
• Additional operations, e.g., filtering ("above 6000m") and sorting ("by
population")
• User-centric evaluation in the context of a larger work task
HIGH-QUALITY DATA IS THE KEY ENABLER
TRENDS IN THE IR LITERATURE
[Figure: line chart over the years 2000 to 2016 (y-axis from 0 to 40) with one series each for "entity OR entities", "Wikipedia", "knowledge base", and "knowledge graph"; a second version of the chart compares "entity OR entities" with Wikipedia OR "knowledge base" OR "knowledge graph".]
Numbers are based on boolean queries on paper titles from SIGIR, ECIR, CIKM, WSDM, and WWW
MAINTAINING AND POPULATING
KNOWLEDGE BASES
Part III
KNOWLEDGE BASES LAG BEHIND
• Many intelligent information access tasks are enabled by
knowledge bases
• Increasingly difficult to keep up with changes and ensure that
knowledge bases are up-to-date and reliable
• Work that needs to be performed by human editors
Can we help human editors to maintain and expand
knowledge bases?
KNOWLEDGE BASE ACCELERATION
[Figure: a content stream (over time) feeds the KBA system, which consists of entity-centric document filtering (producing a ranked list of documents) and entity attribute extraction (producing entity facts); a human editor uses this output to make edits to the entity's KB entry in the knowledge base.]
Task: Analyze a stream of documents and assign a score to each document based on how relevant it is to a given target entity
ENVISAGED TOOL
K. Balog, H. Ramampiaro, and K. Nørvåg. KBAAA: A Web-based Toolkit for the Assessment and Analysis of Knowledge Base Acceleration
Systems. In: 10th Conference on Open Research Areas in Information Retrieval (OAIR ’13)
APPROACH
Document → (1) Mention detection → (2) Document scoring → Relevance score (e.g., 0.86)
K. Balog, N. Takhirov, H. Ramampiaro, and K. Nørvåg. Multi-step Classification Approaches to Cumulative Citation Recommendation. In: 10th Conference on Open Research Areas in Information Retrieval (OAIR ’13)
APPROACH: MENTION DETECTION
• Objectives
• High recall, at the same time keep the false positive rate low
• Efficiency (need to be performed on all documents)
• Based on known surface forms of the entity
• No entity disambiguation performed
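Mention detection based on known surface forms can be sketched as a single compiled pattern that is run over every document, with no disambiguation. The surface forms listed for the target entity and the example documents are made up for illustration.

```python
import re


def build_matcher(surface_forms):
    """Compile one case-insensitive pattern for all known surface forms."""
    # Longer forms first, so that they are preferred over their prefixes.
    alternatives = sorted((re.escape(form) for form in surface_forms),
                          key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def mentions_entity(document, matcher):
    """True if any known surface form of the entity occurs in the document."""
    return matcher.search(document) is not None


# Hypothetical surface forms for one target entity.
matcher = build_matcher({"Aharon Barak", "Justice Barak"})

docs = [
    "Former court president Aharon Barak spoke today.",
    "The barracks were closed.",
    "justice barak issued a ruling.",
]
candidates = [d for d in docs if mentions_entity(d, matcher)]
```

Only documents that pass this cheap filter need to go through the more expensive document scoring step.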
APPROACH: DOCUMENT SCORING
• Document features
• Entity features
• Document-entity features
• E.g., occurrences and spread of entity and related entities in the document
• Temporal features
• E.g., bursts in document stream or in entity profile views in KB
EXPERIMENTAL SETUP
• TREC Knowledge Base Acceleration
track (2012 edition)
• KBA stream corpus
• Oct 2011—Apr 2012
• Three sources: news, social, linking
• Raw data 8.7TB
• Target entities are from Wikipedia
• Precision and recall measured as a
function of cutoff
[Figure: example system output for the target entity Aharon Barak, listing urlname (Aharon_Barak), stream_id, and score (from 1000 down to 263) for each document; a cutoff on the score separates positive from negative documents.]
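Measuring precision and recall as a function of the cutoff can be sketched as follows: documents scoring at or above the cutoff count as returned, and are compared against the set of relevant documents. The document IDs, scores, and relevance labels are invented for the example.

```python
def precision_recall_at_cutoff(scored_docs, relevant, cutoff):
    """Treat documents with score >= cutoff as positive predictions."""
    returned = {doc for doc, score in scored_docs if score >= cutoff}
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical system output (scores in the 0 to 1000 range) and judgments.
scored = [("d1", 1000), ("d2", 500), ("d3", 450), ("d4", 263)]
relevant = {"d1", "d3", "d5"}

curve = {cutoff: precision_recall_at_cutoff(scored, relevant, cutoff)
         for cutoff in (0, 400, 800)}
```

Raising the cutoff trades recall for precision, which is why results are reported over the whole range of cutoffs rather than at a single point.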
RESULTS
• Features that worked well
• #related entities, stream volume, Wikipedia pageviews
• Similarity between the doc and the entity’s Wikipedia page
• #entity mentions and spread in the document body
• Features that didn't work that well
• Temporal features
• Separating 'relevant' and 'vitally relevant' is difficult!
KNOWLEDGE BASE ACCELERATION
[Figure: the KBA pipeline diagram, as above.]
Task: Extract the corresponding values for a pre-defined set of predicates, for a given target entity, from a previously identified set of documents
ENVISAGED TOOL
J. Benetka, K. Balog, and K. Nørvåg. Towards Building a Knowledge Base of Monetary Transactions from a News Collection. In: 17th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’17)
[Screenshot: the editor selects a subject entity (Oracle) and a predicate filter (financial event: acquisition) and clicks "Find events". For each object entity (counterpart: Hyperion Solutions, Siebel Systems, Retek, PeopleSoft), the tool lists the extracted information (event attributes): value, year, confidence, and a supporting snippet with a link to its source (NYT), together with an "Insert" action.]
Example source text: "A Boom in Merger Activity. In December 2004, after a battle for control that grew nasty, Oracle finally acquired PeopleSoft for about $10.3 billion, becoming the second-largest maker of business-management software."
APPROACH
Semantic annotation of sentences
• Monetary value recognition
• Economic event recognition
• Entity recognition
• Date extraction
• Semantic role labeling
Event representation
• Generate all possible event interpretations (quintuples)
Clustering events
• Grouping sentences that discuss the same economic event
Supervised learning
• Assigning confidence score to each interpretation
[Figure: sentences s#1 to s#5 are annotated and grouped into two clusters (s#1, s#2, s#5 about entities A and B; s#3, s#4 about entities C and D); the scored interpretations yield the events e#1: [C] <rel> [D] and e#2: [A] <rel> [B].]
EXPERIMENTAL SETUP
• New York Times Annotated Corpus
• 20 years, 1.8M articles
• Entity repository constructed from three sources
• DBpedia, Freebase, and CrunchBase
• Test set comprises 30 companies
• 132 ground truth events in total
RESULTS
[Figure: bar chart of F1 (y-axis from 0 to 0.4) for Events, Attributes (strict), and Attributes (relaxed), comparing four methods: First reporting, Last reporting, Most frequent, and Supervised learning.]
SUMMARY OF PART III
• Techniques for identifying documents that could potentially
trigger updates to the entry of an entity in a knowledge base
• Domain-specific adaptation of an NLP+ML pipeline for attribute
extraction
• Open issues
• Novel entity discovery
• Attributes of interest
• Facts vs. claims
• Generic vs. domain specific techniques
SUMMARY
• Complex information needs will continue to require human intelligence,
but there is a growing array of tools to assist people in addressing them
• Entity-oriented perspective on information access
• Equipping spreadsheet programs with smart assistance capabilities
• Tool support for knowledge editors for maintaining and expanding knowledge bases
• Open issues
• Pipeline approaches vs. end-to-end learning
• Techniques for long-tail and emerging entities
• Domain-specific adaptations
• User-centric evaluation in an actual task context
JOINT WORK WITH
• Jan Benetka, Faegheh Hasibi, Kjetil Nørvåg, Heri Ramampiaro,
Naimdjon Takhirov, Shuo Zhang
THANK YOU! www.eos-book.org
OPEN ACCESS
@krisztianbalog
krisztianbalog.com

More Related Content

What's hot

Social Network Analysis, Semantic Web and Learning Networks
Social Network Analysis, Semantic Web and Learning NetworksSocial Network Analysis, Semantic Web and Learning Networks
Social Network Analysis, Semantic Web and Learning NetworksRory Sie
 
From the Semantic Web to the Web of Data: ten years of linking up
From the Semantic Web to the Web of Data: ten years of linking upFrom the Semantic Web to the Web of Data: ten years of linking up
From the Semantic Web to the Web of Data: ten years of linking upDavide Palmisano
 
Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Trey Grainger
 
Two graph data models : RDF and Property Graphs
Two graph data models : RDF and Property GraphsTwo graph data models : RDF and Property Graphs
Two graph data models : RDF and Property Graphsandyseaborne
 
An introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataAn introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataFabien Gandon
 
SPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queriesSPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queriesBasil Ell
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerOpenSource Connections
 
Dependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLDependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLFariz Darari
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Trey Grainger
 
Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceTrey Grainger
 

What's hot (12)

Social Network Analysis, Semantic Web and Learning Networks
Social Network Analysis, Semantic Web and Learning NetworksSocial Network Analysis, Semantic Web and Learning Networks
Social Network Analysis, Semantic Web and Learning Networks
 
Ist16-04 An introduction to RDF
Ist16-04 An introduction to RDF Ist16-04 An introduction to RDF
Ist16-04 An introduction to RDF
 
From the Semantic Web to the Web of Data: ten years of linking up
From the Semantic Web to the Web of Data: ten years of linking upFrom the Semantic Web to the Web of Data: ten years of linking up
From the Semantic Web to the Web of Data: ten years of linking up
 
Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)
 
Two graph data models : RDF and Property Graphs
Two graph data models : RDF and Property GraphsTwo graph data models : RDF and Property Graphs
Two graph data models : RDF and Property Graphs
 
An introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked DataAn introduction to Semantic Web and Linked Data
An introduction to Semantic Web and Linked Data
 
SPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queriesSPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queries
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
Ontologies in RDF-S/OWL
Ontologies in RDF-S/OWLOntologies in RDF-S/OWL
Ontologies in RDF-S/OWL
 
Dependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLDependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQL
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)
 
Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative Space
 

Similar to Entities for Augmented Intelligence

Table Retrieval and Generation
Table Retrieval and GenerationTable Retrieval and Generation
Table Retrieval and Generationkrisztianbalog
 
CiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big DataCiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big DataJian Wu
 
Entity Search: The Last Decade and the Next
Entity Search: The Last Decade and the NextEntity Search: The Last Decade and the Next
Entity Search: The Last Decade and the Nextkrisztianbalog
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Searchkrisztianbalog
 
Wi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolWi2015 - Clustering of Linked Open Data - the LODeX tool
Wi2015 - Clustering of Linked Open Data - the LODeX toolLaura Po
 
Slides
SlidesSlides
Slidesbutest
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesTrey Grainger
 
bridging formal semantics and social semantics on the web
Entities for Augmented Intelligence

  • 1. ENTITIES FOR AUGMENTED INTELLIGENCE. Krisztian Balog, University of Stavanger, @krisztianbalog. Keynote given at the 23rd International Conference on Theory and Practice of Digital Libraries (TPDL '19) | Oslo, Norway, September 2019
  • 4. WHAT IS AN ENTITY? An entity is a uniquely identifiable object or thing, characterized by its name(s), type(s), attributes, and relationships to other entities.
  • 5. AN ENTITY
 Subject: <dbr:Karen_Spark_Jones>
 <foaf:name> "Karen Spärck Jones"
 <rdf:type> <dbo:Scientist>, <dbo:Person>, <dbo:Agent>, <owl:Thing>
 <dbo:abstract> "Karen Spärck Jones FBA (26 August 1935 – 4 April 2007) was a British computer scientist."
 <dbo:birthDate> "1935-08-26"
 <dbo:spouse> <dbr:Roger_Needham>
 <dbp:almaMater> <University_of_Cambridge>
 <dbo:knownFor> <dbr:Natural_language_processing>
 <dct:subject> <dbc:Information_retrieval_researchers>, <dbc:British_women_computer_scientists>, <dbc:British_computer_scientists>, <dbc:British_women_scientists>
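The entity shown on the slide above can be held as a small set of subject-predicate-object triples. The following is a minimal sketch, assuming the predicate-object pairing read off the figure; the helper name `describe` is hypothetical and only a few of the triples are included.

```python
from collections import defaultdict

# Identifier copied as written on the slide.
SUBJECT = "dbr:Karen_Spark_Jones"

# A subset of the triples from the slide (pairing of predicates and
# objects is an assumption based on the flattened figure).
TRIPLES = [
    (SUBJECT, "foaf:name", "Karen Spärck Jones"),
    (SUBJECT, "rdf:type", "dbo:Scientist"),
    (SUBJECT, "rdf:type", "dbo:Person"),
    (SUBJECT, "dbo:birthDate", "1935-08-26"),
    (SUBJECT, "dbo:spouse", "dbr:Roger_Needham"),
    (SUBJECT, "dbo:knownFor", "dbr:Natural_language_processing"),
]


def describe(entity_id, triples):
    """Group all objects of an entity by predicate (hypothetical helper)."""
    props = defaultdict(list)
    for subject, predicate, obj in triples:
        if subject == entity_id:
            props[predicate].append(obj)
    return dict(props)


profile = describe(SUBJECT, TRIPLES)
```

Here `profile["rdf:type"]` collects the entity's types, while literal-valued predicates such as `dbo:birthDate` act as attributes and entity-valued ones such as `dbo:spouse` act as typed relationships.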
  • 6. REPRESENTING ENTITIES AND THEIR PROPERTIES • Entity catalog: entity ID*, name(s)*
  • 7. REPRESENTING ENTITIES AND THEIR PROPERTIES • Entity catalog: entity ID*, name(s)* • Knowledge repository: type(s)*, descriptions, relationships (non-typed links)
  • 8. REPRESENTING ENTITIES AND THEIR PROPERTIES • Entity catalog: entity ID*, name(s)* • Knowledge repository: type(s)*, descriptions, relationships (non-typed links) • Knowledge base (KB) / knowledge graph (KG): attributes, relationships (typed links)
  • 9. REPRESENTING ENTITIES AND THEIR PROPERTIES • Entity catalog: entity ID*, name(s)* • Knowledge repository: type(s)*, descriptions, relationships (non-typed links) • Knowledge base (KB) / knowledge graph (KG): attributes, relationships (typed links) • Annotations: meant for human consumption; meant for machine consumption
  • 10. WHY CARE ABOUT ENTITIES? • From a user perspective, entities ... • are natural units for organizing information • enable a richer and more effective user experience • From a machine perspective, entities ... • allow for a better understanding of queries, document content, and of users • help to bridge the gap between unstructured and structured data • enable search engines to be more intelligent
  • 11. TWO CORE COMPONENTS: ENTITY RETRIEVAL & ENTITY LINKING Part I
  • 12. ENTITY RETRIEVAL • Task: Answer an information need (expressed, e.g., as a free text query) with a ranked list of entities (e1, e2, …, en) from some catalog of entities
  • 13. NUMEROUS APPLICATIONS • movie recommendation • playlist completion • e-commerce search
  • 14. APPROACHES • Term-based entity representations can be effectively ranked using document-based retrieval models • Semantically informed retrieval models utilize entity-specific properties (attributes, types, and relationships)
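The first approach above (term-based entity representations ranked with a document retrieval model) can be sketched as follows. This is an illustrative BM25 scorer, assuming each entity is represented by a single text description; the toy descriptions, the function names, and the parameter defaults `k1=1.2` and `b=0.75` are assumptions, not taken from the talk.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def bm25_rank(query, entity_docs, k1=1.2, b=0.75):
    """Rank entities by BM25 score of their textual descriptions."""
    docs = {entity: tokenize(text) for entity, text in entity_docs.items()}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: number of descriptions containing each term.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for entity, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[entity] = score
    return sorted(scores, key=scores.get, reverse=True)


# Toy entity descriptions, invented for illustration only.
entity_docs = {
    "Ferrari": "italian formula one constructor and sports car maker",
    "Haas": "american formula one constructor",
    "Gary Oldman": "english actor and filmmaker",
}
ranked = bm25_rank("formula one constructor", entity_docs)
```

Both constructors match all three query terms; the shorter description ranks first because of BM25 length normalization, and the non-matching entity ranks last with a score of zero.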
  • 15. ENTITY LINKING • Task: Recognize entity mentions in text and link them to the corresponding entries in a knowledge repository • Example text: "Michael Schumacher (born 3 January 1969) is a German retired racing driver. He is a seven-time Formula One World Champion and is widely regarded as one of the greatest Formula One drivers of all time. He won two titles with Benetton in 1994 and 1995 before moving to Ferrari where he drove for eleven years. His time with Ferrari yielded five consecutive titles between 2000 and 2004." • Linked entities (with types): Michael Schumacher (racing driver), Scuderia Ferrari (Formula One constructor), Benetton Formula (Formula One constructor), Formula One (auto racing series)
  • 17. SUMMARY OF PART I • Established entity retrieval and entity linking techniques provide a solid starting point • Open issues • Most work on entity retrieval has focused on keyword queries; there are numerous other ways of expressing information needs • Different types of input call for different entity linking techniques • Noisy short texts (e.g., tweets, queries), structured data (e.g., tables), OCR'ed text, ... • Long tail entities (with sparse representation)
  • 18. SEARCH IS OFTEN PART OF A LARGER WORK TASK
  • 19. EXAMPLE INFORMATION NEEDS • Planning a road trip in California • Creating a curriculum for a course (including recommended literature and invited speakers) • Finding out which anti-aircraft guns were used in ships during war periods, what countries produced them, and if any working models may be found (and where) Answering complex information needs involves retrieving, extracting, filtering, and aggregating information from multiple sources
  • 20. SMART ASSISTANCE FOR TABLES Part II
  • 22. THE ANATOMY OF A RELATIONAL (ENTITY-FOCUSED) TABLE • Table caption: "Formula 1 constructors’ statistics 2016" • Heading column labels (table schema): Constructor, Engine, Country, Base • Table entities (core/subject column): the Constructor column • Table data: the remaining cells
 Constructor | Engine | Country | Base
 Ferrari | Ferrari | Italy | Italy
 Force India | Mercedes | India | UK
 Haas | Ferrari | US | US & UK
 Manor | Mercedes | UK | UK
 … | … | … | …
  • 23. WHAT KIND OF ASSISTANCE CAN WE PROVIDE FOR PEOPLE WORKING WITH (RELATIONAL) TABLES?
  • 25. SMART ASSISTANCE Sometimes I just pop up for no particular reason
  • 27. ASSISTANCE #1: Row population • Suggesting entities to be added to the subject column of the table • Example: for the table "Formula 1 constructors’ statistics 2016" (columns Constructor, Engine, Country, Base; rows Ferrari, Force India, Haas, Manor), the "Add entity" suggestions are 1. McLaren, 2. Mercedes, 3. Red Bull
  • 28. ASSISTANCE #2: Column population • Suggesting column labels to be added as heading columns • Example: for the table "Formula 1 constructors’ statistics 2016" (columns Constructor, Engine, Country, Base; rows Ferrari, Force India, Haas, Manor), the "Add column" suggestions are 1. Seasons, 2. Races Entered
  • 29. ASSISTANCE #3: Value finding and value checking • Example table: "Oscar Best Actor"
 Year | Actor | Film | Role(s)
 2013 | Matthew McConaughey | Dallas Buyers Club | Ron Woodroof
 2014 | Eddie Redmayne | The Theory of Everything | Stephen Hawking
 2015 | Leonardo DiCaprio | The Revenant | Hugh Glass
 2016 | Casey Affleck | Manchester by the Sea | Lee Chandler
 2017 | Gary Oldman | (empty) | (empty)
 • Value finding: suggesting values for specific table cells with supporting evidence. Suggestions for the film of Gary Oldman (2017): 1. Darkest Hour, https://en.wikipedia.org/wiki/Academy_Award_for_Best_Actor (2 additional sources); 2. Tinker Tailor Soldier Spy, https://en.wikipedia.org/wiki/Academy_Award_for_Best_Actor (1 additional source); 3. Nil by Mouth, http://dbpedia.org/page/Gary_Oldman
 • Value checking: checking whether there is supporting evidence for existing cell values. Candidates for the role of Casey Affleck (2016): 1. Lee Chandler, https://en.wikipedia.org/wiki/Academy_Award_for_Best_Actor and https://en.wikipedia.org/wiki/Casey_Affleck; 2. Ray Sybert, https://en.wikipedia.org/wiki/Casey_Affleck
  • 30. ASSISTANCE #4: Table generation • Automatically generating an entire table in response to a keyword query • Example query: "economy of Singapore"
 Result 1: Singapore - Wikipedia, Economy Statistics (Recent Years), https://en.wikipedia.org/wiki/Singapore
 Year | GDP Nominal (Billion) | GDP Nominal Per Capita | GDP Real (Billion) | GNI Nominal (Billion) | GNI Nominal Per Capita
 2011 | S$346.353 | S$66,816 | S$342.371 | S$338.452 | S$65,292
 2012 | S$362.332 | S$68,205 | S$354.061 | S$351.765 | S$66,216
 2013 | S$378.200 | S$70,047 | S$324.592 | S$366.618 | S$67,902
 Show more (5 rows total)
 Result 2: Singapore - Wikipedia, Language used most frequently at home, https://en.wikipedia.org/wiki/Singapore
  • 31. EXPERIMENTAL SETTING • Data sources • Table corpus: 1.6M tables extracted from Wikipedia • Knowledge base: DBpedia 2015-10 (4.6M entities) • Evaluation measures • Standard IR measures (MAP, MRR, NDCG)
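For reference, the per-query quantities behind the measures named above can be sketched as follows: MAP and MRR are the means of average precision and reciprocal rank over all queries, and NDCG is computed here with the common log2 discount. The function names and the toy ranking are assumptions for illustration and are not the evaluation code used in the talk.

```python
import math


def average_precision(ranking, relevant):
    """Average of precision values at the ranks of relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg(ranking, gains, k):
    """NDCG@k with gain / log2(rank + 1) discounting."""
    dcg = sum(gains.get(item, 0) / math.log2(rank + 1)
              for rank, item in enumerate(ranking[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 1)
               for rank, gain in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: items "a" and "c" are relevant.
ranking = ["a", "b", "c", "d"]
ap = average_precision(ranking, {"a", "c"})   # (1/1 + 2/3) / 2
rr = reciprocal_rank(ranking, {"a", "c"})     # first hit at rank 1
score = ndcg(ranking, {"a": 1, "c": 1}, k=4)
```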
  • 32. #1 ROW POPULATION • Task: Generate a ranked list of entities to be added to the core column of a given seed table • Input (seed table): table caption c, seed entities E=(e1,…,en), seed column labels L=(l1,…,lm) • Output: candidates for the next entity en+1 • S. Zhang and K. Balog. EntiTables: Smart Assistance for Entity-Focused Tables. In: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17)
  • 34. APPROACH: CANDIDATE SELECTION • From knowledge base • Entities that are of the same type(s) or belong to the same categories • Ranking is based on the number of shared types/categories • From table corpus • Based on caption: indexing the table as a document and using a standard document retrieval method (BM25) • Based on entities: indexing only entities, using seed entities as the query • Pipeline: seed table → (1) candidate selection → (2) entity ranking → ranked list of suggestions (top-K entities)
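The knowledge-base side of candidate selection (ranking by the number of shared types/categories) can be sketched as follows. This is a minimal illustration, assuming an in-memory mapping from entities to type sets; the function name, the `top_k` cutoff, the tie-breaking by name, and the toy type labels are all hypothetical.

```python
def candidates_by_shared_types(seed_entities, entity_types, top_k=10):
    """Rank non-seed entities by how many types they share with the seeds."""
    seed_types = set()
    for entity in seed_entities:
        seed_types |= entity_types.get(entity, set())
    scored = []
    for entity, types in entity_types.items():
        if entity in seed_entities:
            continue  # never suggest an entity that is already in the table
        shared = len(types & seed_types)
        if shared > 0:
            scored.append((shared, entity))
    # Most shared types first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entity for _, entity in scored[:top_k]]


# Toy knowledge base with invented type labels.
entity_types = {
    "Ferrari": {"F1Constructor", "Company"},
    "Haas": {"F1Constructor"},
    "McLaren": {"F1Constructor", "Company"},
    "Red Bull": {"F1Constructor"},
    "Gary Oldman": {"Actor"},
}
candidates = candidates_by_shared_types(["Ferrari", "Haas"], entity_types)
```

Entities with no type in common with the seeds are dropped, so the candidate set stays small before the more expensive ranking step.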
  • 35. APPROACH: ENTITY RANKING • Based on the similarity between the candidate entity and various table elements • $P(e \mid E, L, c) \propto P(e \mid E)\, P(L \mid e)\, P(c \mid e)$, where e is the candidate entity, P(e|E) is the entity similarity, P(L|e) is the column label similarity, and P(c|e) is the caption similarity • Pipeline: seed table → (1) candidate selection → (2) entity ranking → ranked list of suggestions (top-K entities)
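The product of the three components above can be combined in log space, which avoids numerical underflow when the individual probabilities are small. The sketch below assumes the three component estimates are already available as dictionaries; how they are estimated is not covered here, and the numbers, the function name, and the smoothing constant `eps` are invented for illustration.

```python
import math


def rank_candidates(p_entity, p_labels, p_caption, eps=1e-9):
    """Rank candidates by log P(e|E) + log P(L|e) + log P(c|e)."""
    scores = {}
    for entity in p_entity:
        scores[entity] = (
            math.log(p_entity[entity] + eps)             # entity similarity
            + math.log(p_labels.get(entity, 0.0) + eps)  # column label similarity
            + math.log(p_caption.get(entity, 0.0) + eps) # caption similarity
        )
    return sorted(scores, key=scores.get, reverse=True)


# Made-up component estimates for three candidate entities.
p_entity = {"McLaren": 0.5, "Mercedes": 0.4, "Red Bull": 0.1}
p_labels = {"McLaren": 0.6, "Mercedes": 0.3, "Red Bull": 0.3}
p_caption = {"McLaren": 0.5, "Mercedes": 0.5, "Red Bull": 0.4}
suggestions = rank_candidates(p_entity, p_labels, p_caption)
```

Because the logarithm is monotonic, sorting by the summed logs yields the same order as sorting by the product.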
  • 36. EXPERIMENTAL DESIGN • Idea: Take existing tables and simulate the user in an intermediate step during table completion • Select a set of (1000) tables randomly • Contain at least 6 rows and at least 3 columns (in addition to the subject column) • For any intermediate step (i rows completed) • First i (1<=i<=5) rows are taken as the seed table (entities e1, …, ei) • Entities in the remaining rows are the ground truth (ei+1, …, en)
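The simulation described above can be sketched as a split of an existing table's core column into seed entities and held-out ground truth. This is an illustrative reading of the setup; the function name and the six-row toy column are assumptions.

```python
def simulate_row_population(core_column, max_seed=5):
    """Return (seed_entities, ground_truth) pairs for i = 1..max_seed."""
    cases = []
    for i in range(1, max_seed + 1):
        if i >= len(core_column):
            break  # at least one entity must remain as ground truth
        cases.append((core_column[:i], core_column[i:]))
    return cases


# Toy core column with six rows (the minimum table size in the design).
core_column = ["Ferrari", "Force India", "Haas", "Manor", "McLaren", "Mercedes"]
cases = simulate_row_population(core_column)
```

Each case corresponds to one column of the results reported per number of seed entities: the ranked suggestions for the seed are scored against the held-out entities.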
  • 37. EXPERIMENTAL RESULTS • Entity ranking performance in terms of Mean Average Precision (MAP), by number of seed entities
 Method | 1 | 2 | 3 | 4 | 5
 Baseline* | 0.307 | 0.327 | 0.340 | 0.342 | 0.340
 Entity similarity | 0.490 | 0.542 | 0.561 | 0.566 | 0.560
 + column label similarity | 0.572 | 0.610 | 0.618 | 0.618 | 0.610
 + caption similarity | 0.592 | 0.626 | 0.633 | 0.634 | 0.631
 * M. Bron, K. Balog, and M. de Rijke. Example Based Entity Search in the Web of Data. In: 34th European Conference on Information Retrieval (ECIR ’13)
  • 38. #2 COLUMN POPULATION • Task: Generate a ranked list of column labels to be added to the column headings of a given seed table • Input (seed table): table caption c, seed entities (e1,…,en), seed column labels (l1,…,lm) • Output: candidates for the next column label lm+1 • S. Zhang and K. Balog. EntiTables: Smart Assistance for Entity-Focused Tables. In: 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17)
  • 39. EXPERIMENTAL DESIGN • Idea: Take existing tables and simulate the user in an intermediate step during table completion • Select a set of (1000) tables randomly • Contain at least 6 rows and at least 4 columns • For any intermediate step (j columns completed) • First j (1<=j<=3) columns are taken as the seed table (labels l1, …, lj) • Labels in the remaining columns are the ground truth (lj+1, …, lm)
  • 40. #3 CELL VALUE FINDING • Task: Given an input relational table, find the value of a specific cell (identified by the entity e in the core column and the column heading label l) or (optionally) determine if the cell should be left empty • Input: table caption c, entity e, heading label l • S. Zhang and K. Balog. Auto-completion for Data Cells in Relational Tables. In: 28th ACM International Conference on Information and Knowledge Management (CIKM ’19)
  • 41. APPROACH • Pipeline: input table → (1) candidate value finding → (2) value ranking → ranked list of suggestions (top-K values)
  • 42. APPROACH: CANDIDATE VALUE FINDING (step 1 of the pipeline) • From knowledge base • Heading-to-predicate matching • E.g., "location" vs. <dbp:location>, <dbp:city>, <dbp:country> • From table corpus • Heading-to-heading matching • Identify other table columns that have the same meaning
  • 43. APPROACH: VALUE RANKING (step 2 of the pipeline) • Combine evidence in a feature-based approach • Features I: Degree of support for the given value across the different evidence sources • Features II: Empty value prediction • Features III: Semantic relatedness between the input table and candidate tables (where the value originates from)
  • 44. EXPERIMENTAL DESIGN • Idea: Conceal cell values from existing tables • Randomly select an existing table • Pick a table column • Remove n cells randomly from this column • Evaluate using crowdsourcing • Given the input table, the value, and a source document, does this appear as the correct value for the missing cell?
  • 45. EXPERIMENTAL RESULTS • Value finding performance in terms of NDCG@5
 Method | Empty values excluded | Empty values included
 Baseline | 0.585 | 0.518
 Features I | 0.664 | 0.576
 Features I+II | 0.684 | 0.590
 Features I+II+III | 0.757 | 0.671
  • 46. #4 ON-THE-FLY TABLE GENERATION • Task: Answer a free text query with a relational table, where • the core column lists all relevant entities • columns correspond to attributes of those entities • cells contain the values of the corresponding entity attributes • Input: keyword query q • Output: core column entities E, column labels L, cell values V • S. Zhang and K. Balog. On-the-fly Table Generation. In: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '18)
  • 47. APPROACH • Components: query (q) → core column entity ranking (output E) and schema determination (output S) → value lookup (output V) • Core column entity ranking and schema determination could potentially mutually reinforce each other.
 • 49. MAIN RANKING SIGNALS • Core column entity ranking: • Query-only • Term-based matching • Semantic matching • Query + schema • Entity-schema matching • Entity-schema compatibility • Schema determination: • Query-only • Column population (q) • Semantic matching • Query + entities • Column population (q, E) • Attribute retrieval • Entity-schema compatibility
 • 50. EXPERIMENTAL DESIGN • QS-1: List queries from the DBpedia-Entity v2 collection [1] (119) • Relevance judgments obtained via crowdsourcing • "all cars that are produced in Germany" • "permanent members of the UN Security Council" • "Airlines that currently use Boeing 747 planes" • QS-2: Entity-relationship queries from the RELink Query Collection [2] (600) • Queries and relevance judgments obtained automatically from Wikipedia lists that contain relational tables • "find peaks above 6000m in the mountains of Peru" • "Which countries and cities have accredited Armenian ambassadors?" • "Which anti-aircraft guns were used in ships during war periods and what country produced them?" [1] Hasibi et al. DBpedia-Entity v2: A Test Collection for Entity Search. In: SIGIR ’17. [2] Saleiro et al. RELink: A Research Framework and Test Collection for Entity-Relationship Retrieval. In: SIGIR ’17.
 • 51. EXPERIMENTAL RESULTS (QS-1) [Figure: two panels. Core column entity ranking: without schema information (query only), with ground truth schema, with automatic schema determination. Schema determination: without entity information (query only), with ground truth entities, with automatic core column entity ranking]
  • 52. SUMMARY OF PART II • Tables are a universal tool for collecting and manipulating data • A selection of smart assistance functionalities for relational tables • Open issues • Moving from homogeneous Wikipedia tables to heterogeneous Web tables and to other (non-relational) table types • Tapping into unstructured data sources • Additional operations, e.g., filtering ("above 6000m") and sorting ("by population") • User-centric evaluation in the context of a larger work task
 • 53. HIGH-QUALITY DATA IS THE KEY ENABLER
 • 54. TRENDS IN THE IR LITERATURE [Chart: number of papers per year (y-axis 0 to 40), 2000 to 2016, for four title queries: entity OR entities; Wikipedia; knowledge base; knowledge graph] Numbers are based on boolean queries on paper titles from SIGIR, ECIR, CIKM, WSDM, and WWW
 • 55. TRENDS IN THE IR LITERATURE [Chart: number of papers per year (y-axis 0 to 40), 2000 to 2016, for two title queries: entity OR entities; Wikipedia OR "knowledge base" OR "knowledge graph"] Numbers are based on boolean queries on paper titles from SIGIR, ECIR, CIKM, WSDM, and WWW
  • 57. KNOWLEDGE BASES LAG BEHIND • Many intelligent information access tasks are enabled by knowledge bases • Increasingly difficult to keep up with changes and ensure that knowledge bases are up-to-date and reliable • Work that needs to be performed by human editors Can we help human editors to maintain and expand knowledge bases?
 • 58. KNOWLEDGE BASE ACCELERATION [Diagram labels: Content stream (over time); KBA system with two components, entity-centric document filtering (output: ranked list of documents) and entity attribute extraction (output: entity facts); Human editor; edits; Entity KB entry; Knowledge base]
 • 59. KNOWLEDGE BASE ACCELERATION • Entity-centric document filtering • Task: Analyze a stream of documents and assign a score to each document based on how relevant it is to a given target entity
  • 60. ENVISAGED TOOL K. Balog, H. Ramampiaro, and K. Nørvåg. KBAAA: A Web-based Toolkit for the Assessment and Analysis of Knowledge Base Acceleration Systems. In: 10th Conference on Open Research Areas in Information Retrieval (OAIR ’13)
 • 61. APPROACH [Diagram: Document → 1. Mention detection → 2. Document scoring → Relevance score (e.g., 0.86)] K. Balog, N. Takhirov, H. Ramampiaro, and K. Nørvåg. Multi-step Classification Approaches to Cumulative Citation Recommendation. In: 10th Conference on Open Research Areas in Information Retrieval (OAIR ’13)
 • 62. APPROACH: MENTION DETECTION • Objectives • High recall, while keeping the false positive rate low • Efficiency (needs to be performed on all documents) • Based on known surface forms of the entity • No entity disambiguation performed
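A minimal sketch of surface-form-based mention detection, assuming case-insensitive substring matching against a known set of name variants. The matching rule, the function name `detect_mention`, and the example surface forms and documents are assumptions for illustration; the slides only state that known surface forms are used and that no disambiguation is performed.

```python
def detect_mention(text: str, surface_forms: set[str]) -> bool:
    """Return True if any known surface form of the entity occurs in the text."""
    lowered = text.lower()
    return any(form.lower() in lowered for form in surface_forms)


# Hypothetical surface forms and documents for one target entity.
forms = {"Aharon Barak", "Justice Barak"}
docs = [
    "Former court president Aharon Barak gave a lecture on Tuesday.",
    "The weather in Oslo was mild in September.",
]
candidates = [d for d in docs if detect_mention(d, forms)]
```

A cheap filter of this kind favors recall and speed: every document is checked, and only those that pass are handed on to the more expensive document scoring step.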
 • 63. APPROACH: DOCUMENT SCORING • Document features • Entity features • Document-entity features • E.g., occurrences and spread of entity and related entities in the document • Temporal features • E.g., bursts in document stream or in entity profile views in KB
 • 64. EXPERIMENTAL SETUP • TREC Knowledge Base Acceleration track (2012 edition) • KBA stream corpus • Oct 2011 to Apr 2012 • Three sources: news, social, linking • Raw data 8.7TB • Target entities are from Wikipedia • Precision and recall measured as a function of cutoff [Figure: example system output for target entity Aharon Barak, with columns urlname, stream_id, and score (ranging from 1000 down to 263); a score cutoff splits the ranked documents into positive and negative]
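Measuring precision and recall as a function of the cutoff can be sketched as below. This assumes that a document counts as positive when its score is at or above the cutoff; the document IDs and relevance judgments are hypothetical, and only the score range mirrors the example on the slide.

```python
def precision_recall_at_cutoff(
    scored: list[tuple[str, int]], relevant: set[str], cutoff: int
) -> tuple[float, float]:
    """Treat documents scoring at or above the cutoff as positive, then score them."""
    retrieved = {doc for doc, score in scored if score >= cutoff}
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical scored documents and ground truth for one target entity.
scored = [("d1", 1000), ("d2", 500), ("d3", 430), ("d4", 263)]
relevant = {"d1", "d3"}
results = {c: precision_recall_at_cutoff(scored, relevant, c) for c in (1000, 400, 0)}
```

Lowering the cutoff admits more documents, which can only keep or raise recall while precision typically drops, which is why both are reported across cutoff values.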
  • 65. RESULTS • Features that worked well • #related entities, stream volume, Wikipedia pageviews • Similarity between the doc and the entity’s Wikipedia page • #entity mentions and spread in the document body • Features that didn't work that well • Temporal features • Separating 'relevant' and 'vitally relevant' is difficult!
 • 66. KNOWLEDGE BASE ACCELERATION • Entity attribute extraction • Task: Extract the corresponding values for a pre-defined set of predicates, for a given target entity, from a previously identified set of documents
 • 67. ENVISAGED TOOL J. Benetka, K. Balog, and K. Nørvåg. Towards Building a Knowledge Base of Monetary Transactions from a News Collection. In: 17th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’17) [Screenshot: "Find events" interface with a subject entity (Oracle), a predicate filter (financial event: acquisition), and object entities listed as counterparts (Hyperion Solutions, Siebel Systems, Retek, PeopleSoft). The extracted information shows event attributes (value in USD, year, confidence), supporting snippets from NYT with links (e.g., "…Oracle finally acquired PeopleSoft for…"), and an Insert action. Example source text: "A Boom in Merger Activity. In December 2004, after a battle for control that grew nasty, Oracle finally acquired PeopleSoft for about $10.3 billion, becoming the second-largest maker of business-management software."]
 • 68. APPROACH • Event representation: Generate all possible event interpretations (quintuples) • Semantic annotation of sentences: Monetary value recognition • Economic event recognition • Entity recognition • Date extraction • Semantic role labeling • Clustering events: Grouping sentences that discuss the same economic event • Supervised learning: Assigning confidence score to each interpretation [Diagram: four numbered steps applied to example sentences s#1 to s#5; sentences are grouped into clusters {s#1, s#2, s#5} (entities A, B) and {s#3, s#4} (entities C, D), yielding events e#1 [C] <rel> [D] and e#2 [A] <rel> [B], with a confidence score per interpretation]
  • 69. EXPERIMENTAL SETUP • New York Times Annotated Corpus • 20 years, 1.8M articles • Entity repository constructed from three sources • DBpedia, Freebase, and CrunchBase • Test set comprises 30 companies • 132 ground truth events in total
 • 70. RESULTS [Bar chart: F1 (axis 0 to 0.4) for Events, Attributes (strict), and Attributes (relaxed); methods compared: First reporting, Last reporting, Most frequent, Supervised learning]
  • 71. SUMMARY OF PART III • Techniques for identifying documents that could potentially trigger updates to the entry of an entity in a knowledge base • Domain-specific adaptation of an NLP+ML pipeline for attribute extraction • Open issues • Novel entity discovery • Attributes of interest • Facts vs. claims • Generic vs. domain specific techniques
  • 72. SUMMARY • Complex information needs will continue to require human intelligence, but there is a growing array of tools to assist them • Entity-oriented perspective on information access • Equipping spreadsheet programs with smart assistance capabilities • Tool support for knowledge editors for maintaining and expanding knowledge bases • Open issues • Pipeline approaches vs. end-to-end learning • Techniques for long-tail and emerging entities • Domain-specific adaptations • User-centric evaluation in an actual task context
  • 73. JOINT WORK WITH • Jan Benetka, Faegheh Hasibi, Kjetil Nørvåg, Heri Ramampiaro, Naimdjon Takhirov, Shuo Zhang
  • 74. THANK YOU! www.eos-book.org OPEN ACCESS @krisztianbalog krisztianbalog.com