Most Linked Data applications currently rely on owl:sameAs for linking ontology instances. However, several studies have noticed multiple misuses of this identity link. These misuses, which are mainly caused by the lack of other well-defined linking alternatives, can lead to erroneous statements or inconsistencies. In this paper, we propose a new contextual identity link, identiConTo, that could serve as a replacement for owl:sameAs when linking instances that are identical in a specified context. To detect these contextual links, we have defined an algorithm named DECIDE that has been tested on scientific knowledge bases describing transformation processes.
Detection of Contextual Identity Links in a Knowledge Base
Joe Raad, Nathalie Pernelle, Fatiha Saïs
firstname.lastname@lri.fr
LRI, Paris-Sud University
Orsay, France
2. 15-Dec-17 Detecting Contextual Identity Links 2 / 33
Identity in the Semantic Web
WHY?
[Figure: "Harry Potter and the Chamber of Secrets" (Dataset 1: English, 300 pages, author J.K. Rowling) linked as identical to "Harry Potter et la Chambre des Secrets" (Dataset 2: French, 350 pages, author J.K. Rowling)]
• Integrate information from different sources
• Enrich "Dataset 2"
Linked Open Data
Identity in the Semantic Web
[Figure: the same two books linked with owl:sameAs; each book then inherits the other's properties (300 and 350 pages, English and French)]
• Unwanted inferences
• Possible inconsistencies
HOW?
≈ 558 million owl:sameAs statements (LOD stat 2015)
Identity in the Semantic Web
SOLUTION?
[Figure: the two books linked as the "same art work" instead of with owl:sameAs]
Contextual Identity
Contextual Identity – State of the Art
1. skos:exactMatch: indicates a high degree of confidence that the concepts can be used interchangeably across a wide range of applications (Miles et al., 2009)
• Undefined contexts in which this identity holds
• Can only be used to link SKOS concepts
2. The Similarity Ontology: presents a hierarchy of 13 predicates (8 new), each characterized by its reflexivity, transitivity, and symmetry (Halpin et al., 2010)
• Undefined contexts in which this identity holds
• Difficult to use due to its subjectivity
Contextual Identity – State of the Art
3. Domain-Specific Identity Links:
volume(lem1, a1) ∧ volume(lem2, a1) → same_lemonade(lem1, lem2)
• Requires the intervention of experts
4. Indiscernibility Relation: defines identity relations in a context represented by a set of properties; the contexts are hierarchized in a lattice (Beek et al., 2016)
• A context is a set of properties that does not consider the classes' organization in the RDF dataset
• Identity is locally defined (it does not propagate in the RDF graph through object properties)
Objectives
• Introduce a new Contextual Identity Link
• Represent a context using the ontology vocabulary
• Propose an approach capable of detecting all the
contexts in which two ontology instances are identical
• Benefit from the experts’ knowledge (if available)
Contextual Identity
In which context is "drug1" considered identical to "drug2"?
Contextual Identity
1) In a context where we discard the properties "name" and "hasValue"
Contextual Identity
2) In a context where we consider neither the property "name" nor the weight of the Lactose (hasWeight)
Contextual Identity
What is a (Global) Context?
Given an ontology O = (C, DP, OP, A) with
• C = set of classes
• DP = set of owl:DatatypeProperty
• OP = set of owl:ObjectProperty
• A = set of axioms (e.g. domains and ranges, subsumption)
a Global Context is a sub-ontology GCu = (Cu, DPu, OPu, Au) with
• Cu ⊆ DepC ⊆ C, where DepC is the set of classes selected as the level of abstraction
• DPu ⊆ DP
• OPu ⊆ OP
• Au = domain and range axioms more specific than those described in A
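As an illustration only, the definition can be encoded with plain Python sets. The `Ontology`/`GlobalContext` types, the function `is_valid_global_context`, and the simplification of Au to (property, "domain" | "range", class) triples are assumptions of this sketch, not part of the original formalism. The check covers the inclusion conditions only, not the requirement that Au be more specific than A.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ontology:
    classes: frozenset         # C
    datatype_props: frozenset  # DP (owl:DatatypeProperty)
    object_props: frozenset    # OP (owl:ObjectProperty)

@dataclass(frozen=True)
class GlobalContext:
    classes: frozenset         # Cu
    datatype_props: frozenset  # DPu
    object_props: frozenset    # OPu
    # Au, simplified here to (property, "domain" | "range", class) triples
    axioms: frozenset = frozenset()

def is_valid_global_context(gc: GlobalContext, onto: Ontology, dep_c: frozenset) -> bool:
    """Check Cu ⊆ DepC ⊆ C, DPu ⊆ DP and OPu ⊆ OP (Au is not checked)."""
    return (gc.classes <= dep_c <= onto.classes
            and gc.datatype_props <= onto.datatype_props
            and gc.object_props <= onto.object_props)

# Hypothetical vocabulary built from the drug example
onto = Ontology(
    classes=frozenset({"Drug", "Paracetamol", "Lactose", "Weight"}),
    datatype_props=frozenset({"name", "hasValue"}),
    object_props=frozenset({"isComposedOf", "hasWeight"}),
)
dep_c = frozenset({"Drug", "Paracetamol", "Lactose", "Weight"})
gc1 = GlobalContext(
    classes=frozenset({"Drug", "Lactose", "Paracetamol"}),
    datatype_props=frozenset(),
    object_props=frozenset({"isComposedOf"}),
    axioms=frozenset({("isComposedOf", "domain", "Drug"),
                      ("isComposedOf", "range", "Lactose"),
                      ("isComposedOf", "range", "Paracetamol")}),
)
print(is_valid_global_context(gc1, onto, dep_c))  # True
```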
Contextual Identity
Order Relation between Global Contexts
GCu = (Cu, DPu, OPu, Au) and GCv = (Cv, DPv, OPv, Av)
GCu ≤ GCv if:
• Cv ⊆ Cu
• DPv ⊆ DPu
• OPv ⊆ OPu
• ∀ op ∈ OPv: domainv(op) ⊑ domainu(op) and rangev(op) ⊑ rangeu(op)
• ∀ dp ∈ DPv: domainv(dp) ⊑ domainu(dp) and rangev(dp) = rangeu(dp)
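A minimal sketch of the order relation, assuming that a domain or range is a set of classes read as a union (as when <#isComposedOf> has the two ranges <#Lactose> and <#Paracetamol> in the named-graph example) and approximating ⊑ by set inclusion, i.e. ignoring the subclass hierarchy. The function names `subsumed` and `more_specific_or_equal` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class GlobalContext:
    classes: set
    datatype_props: set
    object_props: set
    # property -> set of classes read as a union; for a datatype property
    # the range is a datatype name (e.g. "xsd:decimal")
    domain: dict = field(default_factory=dict)
    range: dict = field(default_factory=dict)

def subsumed(a, b):
    """a ⊑ b, approximated as inclusion of the union members (assumption)."""
    return a <= b

def more_specific_or_equal(u: GlobalContext, v: GlobalContext) -> bool:
    """GCu ≤ GCv, i.e. GCu is more specific than (or equal to) GCv."""
    if not (v.classes <= u.classes
            and v.datatype_props <= u.datatype_props
            and v.object_props <= u.object_props):
        return False
    for op in v.object_props:
        if not (subsumed(v.domain[op], u.domain[op])
                and subsumed(v.range[op], u.range[op])):
            return False
    for dp in v.datatype_props:
        if not (subsumed(v.domain[dp], u.domain[dp])
                and v.range[dp] == u.range[dp]):
            return False
    return True

# GC1 and GC2 as in the named-graph example (<#GC1>, <#GC2>)
gc1 = GlobalContext({"Drug", "Lactose", "Paracetamol"}, set(), {"isComposedOf"},
                    domain={"isComposedOf": {"Drug"}},
                    range={"isComposedOf": {"Lactose", "Paracetamol"}})
gc2 = GlobalContext({"Drug", "Lactose"}, set(), {"isComposedOf"},
                    domain={"isComposedOf": {"Drug"}},
                    range={"isComposedOf": {"Lactose"}})
print(more_specific_or_equal(gc1, gc2))  # True
print(more_specific_or_equal(gc2, gc1))  # False
```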
Contextual Identity
Order Relation between Global Contexts
GCu ≤ GCv reads "GCu is more specific than GCv"
Contextual Identity
Under which conditions are two individuals contextually identical?
Short answer:
if their contextual descriptions are isomorphic up to a renaming of the instances' URIs.
What is an instance's contextual description?
Contextual Identity
Contextual Instance Description According to a Global Context
Gdrug1 of drug1 in GC1
Contextual Identity
Contextual Instance Description According to a Global Context
Gdrug1 of drug1 in GC2
Identity in a Global Context
Gdrug1 of drug1 in GC1
Gdrug2 of drug2 in GC1
Gdrug1 and Gdrug2 are isomorphic up to a renaming of the instance URIs, hence
identiConTo<GC1>(drug1, drug2)
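The isomorphism test can be sketched by brute force for small descriptions. This is an illustration, not the authors' implementation. It assumes a contextual description is a set of (subject, property, object) triples, that every instance occurs as a subject (at least through its rdf:type triple), and that all other objects (class names, literal values) are constants that must match exactly. The instance names `lac1`, `par1`, etc. are hypothetical.

```python
from itertools import permutations

def isomorphic_up_to_renaming(desc1, root1, desc2, root2):
    """True if a bijection between instance URIs that maps root1 to root2
    turns desc1 exactly into desc2. Instances are the subjects; other
    objects (classes, literals) are constants left unchanged."""
    if len(desc1) != len(desc2):
        return False
    subj1 = {s for s, _, _ in desc1}
    subj2 = {s for s, _, _ in desc2}
    if root1 not in subj1 or root2 not in subj2 or len(subj1) != len(subj2):
        return False
    rest1 = sorted(subj1 - {root1})
    rest2 = sorted(subj2 - {root2})
    # Exponential enumeration of bijections: only suitable for small graphs
    for perm in permutations(rest2):
        mapping = dict(zip(rest1, perm))
        mapping[root1] = root2
        renamed = {(mapping[s], p, mapping.get(o, o)) for s, p, o in desc1}
        if renamed == desc2:
            return True
    return False

g_drug1 = {("drug1", "rdf:type", "Drug"),
           ("drug1", "isComposedOf", "lac1"), ("lac1", "rdf:type", "Lactose"),
           ("drug1", "isComposedOf", "par1"), ("par1", "rdf:type", "Paracetamol")}
g_drug2 = {("drug2", "rdf:type", "Drug"),
           ("drug2", "isComposedOf", "lac2"), ("lac2", "rdf:type", "Lactose"),
           ("drug2", "isComposedOf", "par2"), ("par2", "rdf:type", "Paracetamol")}
print(isomorphic_up_to_renaming(g_drug1, "drug1", g_drug2, "drug2"))  # True
```

A practical implementation would replace the permutation loop with a proper graph-matching algorithm; the brute force only keeps the definition visible.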
Identity in a Global Context
Global Contexts and Identity Relations are represented in Named Graphs
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<#GC1> <#moreSpecificThan> <#GC2> .
<#GC1> {
  <#isComposedOf> rdfs:domain <#Drug> .
  <#isComposedOf> rdfs:range <#Lactose> .
  <#isComposedOf> rdfs:range <#Paracetamol> .
  …
  <#drug1> <#identiConTo> <#drug2> .
}
<#GC2> {
  <#isComposedOf> rdfs:domain <#Drug> .
  <#isComposedOf> rdfs:range <#Lactose> .
}
• <#identiConTo> is only specified for the most specific global context(s)
• More general contexts can be inferred using the order relations between
global contexts
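Because identiConTo is only asserted for the most specific contexts, the links in more general contexts follow from the moreSpecificThan relation. A minimal sketch, assuming contexts are plain string identifiers and that moreSpecificThan is given as a dictionary of direct parents (both assumptions); `infer_identity_links` is a hypothetical name.

```python
def more_general_contexts(gc, more_specific_than):
    """All contexts reachable from gc via moreSpecificThan (transitive closure)."""
    seen, stack = set(), [gc]
    while stack:
        current = stack.pop()
        for parent in more_specific_than.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def infer_identity_links(asserted, more_specific_than):
    """asserted: context -> set of (i1, i2) pairs stated with identiConTo.
    A pair identical in a context is also identical in every more general one."""
    inferred = {gc: set(pairs) for gc, pairs in asserted.items()}
    for gc, pairs in asserted.items():
        for general in more_general_contexts(gc, more_specific_than):
            inferred.setdefault(general, set()).update(pairs)
    return inferred

hierarchy = {"GC1": ["GC2"]}            # <#GC1> <#moreSpecificThan> <#GC2>
asserted = {"GC1": {("drug1", "drug2")}}
print(infer_identity_links(asserted, hierarchy))
# {'GC1': {('drug1', 'drug2')}, 'GC2': {('drug1', 'drug2')}}
```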
Detection of Contextual Identity
How can we automatically detect and add these contextual identity links in a knowledge base?
DECIDE: DEtection of Contextual IDEntity
Inputs: Knowledge Base, Target Class, Unwanted Properties, Necessary Properties, Co-occurring Properties
• Unwanted properties: p has unstructured values (free text) or insignificant variations; up = (ci, p, *)
• Necessary properties: p should exist in every context for that context to be considered relevant; np = (ci, p, *)
• Co-occurring properties: p1 does not have a meaning without p2 (e.g. measure and unit); cp = {(ci, p1, *), (ci, p2, *)}
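One way to read the three kinds of constraints is as a filter on candidate contexts. In this sketch (an assumption, not DECIDE's actual data model) a context is a set of (class, property) pairs, the "*" of (ci, p, *) being left implicit; the property `hasUnit` is hypothetical and stands for the unit in a measure/unit pair.

```python
def satisfies_constraints(context, unwanted, necessary, co_occurring):
    """context, unwanted, necessary: sets of (class, property) pairs.
    co_occurring: list of sets of pairs that must be kept together."""
    if context & unwanted:          # no unwanted property may be used
        return False
    if not necessary <= context:    # every necessary property must be kept
        return False
    for group in co_occurring:      # all of a group, or none of it
        kept = group & context
        if kept and kept != group:
            return False
    return True

unwanted = {("Drug", "name")}
necessary = {("Drug", "isComposedOf")}
co_occurring = [{("Weight", "hasValue"), ("Weight", "hasUnit")}]  # hasUnit: hypothetical

ctx = {("Drug", "isComposedOf"), ("Lactose", "hasWeight"), ("Weight", "hasValue")}
print(satisfies_constraints(ctx, unwanted, necessary, co_occurring))  # False
print(satisfies_constraints(ctx | {("Weight", "hasUnit")}, unwanted, necessary, co_occurring))  # True
```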
Detection of Contextual Identity
DECIDE output: for each pair of individuals (i1, i2) of the target class, the set of the most specific global contexts in which (i1, i2) are identical
Detection of Contextual Identity
• Step 1: Choose the level of abstraction by selecting the set of classes, e.g.
DepC = {Drug, Paracetamol, Lactose, Weight}
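A hedged sketch of what choosing DepC could mean for the data: descriptions are restricted to triples whose subject, and object when it is an instance, belongs to a class of DepC. This restriction rule is an interpretation for illustration; `restrict_to_classes`, the class `Laboratory` and the property `producedBy` are hypothetical.

```python
def restrict_to_classes(triples, types, dep_c):
    """Keep a triple only if its subject is typed by a class of DepC and, when
    its object is an instance (i.e. has a type), that object is too."""
    kept = set()
    for s, p, o in triples:
        if types.get(s) not in dep_c:
            continue
        if o in types and types[o] not in dep_c:
            continue
        kept.add((s, p, o))
    return kept

dep_c = {"Drug", "Paracetamol", "Lactose", "Weight"}
types = {"drug1": "Drug", "lac1": "Lactose", "w1": "Weight", "lab1": "Laboratory"}
triples = {
    ("drug1", "isComposedOf", "lac1"),
    ("lac1", "hasWeight", "w1"),
    ("w1", "hasValue", 20),
    ("drug1", "producedBy", "lab1"),   # hypothetical; dropped since Laboratory ∉ DepC
}
print(sorted(restrict_to_classes(triples, types, dep_c), key=str))
```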
Detection of Contextual Identity
• Step 2: For each pair of instances of the target class, construct the identity graph(s)
• Depth-first construction
• Several identity graphs in case of multi-valued properties (different pair mappings)
• Each node is a pair of instances and contains their two local contexts
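The construction can be sketched as a depth-first expansion over pairs of instances, branching whenever a multi-valued object property admits several pair mappings. Simplifying assumptions of this sketch: only object properties are traversed, a mapping must be one-to-one and type-preserving, and a property with no such mapping is simply dropped from the graph. All function and instance names are hypothetical.

```python
from itertools import permutations, product

def candidate_mappings(va, vb, types):
    """All type-preserving one-to-one mappings between the values of a
    (possibly multi-valued) object property for the two instances."""
    if len(va) != len(vb):
        return []
    return [list(zip(va, perm)) for perm in permutations(vb)
            if all(types.get(x) == types.get(y) for x, y in zip(va, perm))]

def identity_graphs(i1, i2, obj_props, types):
    """Depth-first construction of identity graphs rooted at (i1, i2).
    obj_props: instance -> {object property: [objects]}; types: instance -> class.
    Each graph is a frozenset of edges ((a, b), p, (a2, b2)) between pair nodes."""
    def expand(stack, visited, edges):
        if not stack:
            yield frozenset(edges)
            return
        a, b = stack[-1]                  # depth first: last pushed pair
        rest = stack[:-1]
        shared = sorted(obj_props.get(a, {}).keys() & obj_props.get(b, {}).keys())
        options = [[(p, m) for m in candidate_mappings(obj_props[a][p], obj_props[b][p], types)]
                   for p in shared]
        options = [o for o in options if o]   # no valid mapping: property dropped
        for choice in product(*options):       # one branch per combination of mappings
            new_edges, new_stack, new_visited = set(edges), list(rest), set(visited)
            for p, mapping in choice:
                for pair in mapping:
                    new_edges.add(((a, b), p, pair))
                    if pair not in new_visited:
                        new_visited.add(pair)
                        new_stack.append(pair)
            yield from expand(new_stack, new_visited, new_edges)

    yield from expand([(i1, i2)], {(i1, i2)}, set())

types = {"drug1": "Drug", "drug2": "Drug",
         "lacA": "Lactose", "lacB": "Lactose", "lacC": "Lactose", "lacD": "Lactose"}
obj_props = {"drug1": {"isComposedOf": ["lacA", "lacB"]},
             "drug2": {"isComposedOf": ["lacC", "lacD"]}}
graphs = list(identity_graphs("drug1", "drug2", obj_props, types))
print(len(graphs))  # 2: {lacA-lacC, lacB-lacD} and {lacA-lacD, lacB-lacC}
```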
Detection of Contextual Identity
• Step 3: For each constructed identity graph, generate the most specific GC, e.g.:
1) a context where we discard the properties "name" and "hasValue"
2) a context where we consider neither the property "name" nor the weight of the Lactose
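A simplified sketch of Step 3, assuming an identity graph is given as a set of edges between pair nodes: the classes are those of the paired instances, object-property domains and ranges come from the edges, and a datatype property is kept for a class only if every pair of that class has equal values for it. The values used below ("drug A", 10, 12) are hypothetical, chosen to reproduce context 1), where "name" and "hasValue" are discarded.

```python
from collections import defaultdict

def most_specific_context(root, edges, types, data_values):
    """Derive a global context from one identity graph.
    root: the (i1, i2) pair; edges: {((a, b), p, (a2, b2))};
    types: instance -> class; data_values: instance -> {datatype property: value}."""
    nodes = {root} | {src for src, _, _ in edges} | {dst for _, _, dst in edges}
    classes = {types[a] for a, _ in nodes}
    op_domain, op_range = defaultdict(set), defaultdict(set)
    for (a, _), p, (a2, _) in edges:
        op_domain[p].add(types[a])
        op_range[p].add(types[a2])
    # A datatype property is kept for a class only if every pair of that class agrees
    agree, disagree = defaultdict(set), defaultdict(set)
    for a, b in nodes:
        va, vb = data_values.get(a, {}), data_values.get(b, {})
        for dp in va.keys() | vb.keys():
            same = dp in va and dp in vb and va[dp] == vb[dp]
            (agree if same else disagree)[dp].add(types[a])
    dp_domain = {dp: cs - disagree[dp] for dp, cs in agree.items() if cs - disagree[dp]}
    return {"classes": classes, "op_domain": dict(op_domain),
            "op_range": dict(op_range), "dp_domain": dp_domain}

root = ("drug1", "drug2")
edges = {(root, "isComposedOf", ("lac1", "lac2")),
         (root, "isComposedOf", ("par1", "par2")),
         (("lac1", "lac2"), "hasWeight", ("w1", "w2"))}
types = {"drug1": "Drug", "drug2": "Drug", "lac1": "Lactose", "lac2": "Lactose",
         "par1": "Paracetamol", "par2": "Paracetamol", "w1": "Weight", "w2": "Weight"}
data_values = {"drug1": {"name": "drug A"}, "drug2": {"name": "drug B"},
               "w1": {"hasValue": 10}, "w2": {"hasValue": 12}}
gc = most_specific_context(root, edges, types, data_values)
print(gc["dp_domain"])  # {}: "name" and "hasValue" are discarded
```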
Experiments
Use of Contextual Identity Links for Prediction
We want to detect, for each context GCi, the measures mi for which
identiConTo<GCi>(i1, i2) ∧ observes(i1, m1) → observes(i2, m2), with m1 ≃ m2
Rule form: identiConTo<GCi>(i1, i2) → same(mi)
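To make the rule concrete, here is a sketch that evaluates it for one measure over the pairs identical in a given context. The definitions are assumptions of this sketch, not taken from the experiments: support counts the pairs where both instances observe the measure, m1 ≃ m2 is read as `math.isclose` with a relative tolerance, and the error rate is the share of supported pairs violating it. The measure name "moisture" and the values are hypothetical.

```python
import math

def evaluate_rule(identical_pairs, observations, measure, rel_tol=0.1):
    """Support and error rate of the rule
    identiConTo<GC>(i1, i2) ∧ observes(i1, m1) → observes(i2, m2), m1 ≃ m2
    over the pairs declared identical in one global context GC."""
    support = errors = 0
    for i1, i2 in identical_pairs:
        m1 = observations.get(i1, {}).get(measure)
        m2 = observations.get(i2, {}).get(measure)
        if m1 is None or m2 is None:
            continue                    # no pair of observations to compare
        support += 1
        if not math.isclose(m1, m2, rel_tol=rel_tol):   # m1 ≃ m2 (assumption)
            errors += 1
    return support, (errors / support if support else None)

pairs = [("p1", "p2"), ("p3", "p4"), ("p5", "p6")]
observations = {"p1": {"moisture": 0.50}, "p2": {"moisture": 0.52},
                "p3": {"moisture": 0.40}, "p4": {"moisture": 0.60},
                "p5": {"moisture": 0.30}}
print(evaluate_rule(pairs, observations, "moisture"))  # (2, 0.5)
```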
Experiments
Use of Contextual Identity Links for Prediction
≈ 3700 rules were generated
Domain experts have evaluated the plausibility of the 20 best rules
(in terms of error rate and support combined)
Plausibility (Strongly Disagree | Disagree | Not sure | Agree | Strongly Agree):
1 6 4 90
The error rate of a rule decreases by 22% in CellExtraDry and by 33.5% in
Carredas when a global context is replaced by a more specific global context
Conclusion
• The use of genuine identity links is rarely required in scientific datasets
• Asking domain experts to specify the contexts in which two objects are considered identical is not intuitive; asking for constraints is easier
• Proposition of a new contextual identity link (identiConTo)
  - Transitive, Symmetric, Reflexive
  - Based on the notion of global contexts
• Proposition of an algorithm for detecting the most specific global context(s)
in which a pair of instances of a target class are identical (DECIDE)
• Contextual Identity Links can be used for prediction tasks
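Since identiConTo is reflexive, symmetric and transitive within a given global context, the links of one context partition the instances into equivalence classes. A minimal union-find sketch, assuming the links of a single context are given as (i1, i2) pairs; `equivalence_classes` is a hypothetical helper, not part of DECIDE.

```python
def equivalence_classes(links):
    """Group instances linked by identiConTo in a single global context.
    Relies on the link being reflexive, symmetric and transitive."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())

links = [("drug1", "drug2"), ("drug2", "drug3"), ("drug4", "drug5")]
print(equivalence_classes(links))  # [['drug1', 'drug2', 'drug3'], ['drug4', 'drug5']]
```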