1. Learning Commonalities in
SPARQL
Sara El Hassad François Goasdoué Hélène Jaudoin
IRISA, Univ. Rennes 1, Lannion, France
ISWC 2017 - 21 - 26 October 2017
1/31
2. Introduction
Least general generalization (lgg)
Machine Learning in the early 70’s by Gordon Plotkin
Knowledge representation domain in the early 90’s
Recently in semantic web
2/31
3. Introduction
Least general generalization (lgg)
Machine Learning in the early 70’s by Gordon Plotkin
Knowledge representation domain in the early 90’s
Recently in semantic web
Applications of lgg
Query optimization: identify candidate views, or potiential query
sharing
Query approximation: a set of queries by a single query
Social context: recommending users asking for enough relates things
2/31
4. Introduction
Least general generalization (lgg)
Machine Learning in the early 70’s by Gordon Plotkin
Knowledge representation domain in the early 90’s
Recently in semantic web
Applications of lgg
Query optimization: identify candidate views, or potiential query
sharing
Query approximation: a set of queries by a single query
Social context: recommending users asking for enough relates things
Goal
To study the problem in the entire conjunctive fragment of SPARQL
setting.
2/31
6. RDF graphs
Specification of RDF graphs with triples:
(s, p, o) ∈ (U ∪ B) × U × (U ∪ L ∪ B) s op
Built-in property URIs to state RDF statements
RDF statement Triple
Class assertion (s, rdf:type, o)
Property assertion (s, p, o) with
p = rdf:type
b "LGG in RDF"
ConfPaper b1
hasTitle
τ hasContactAuthor
4/31
7. Adding ontological knowledge to RDF graphs
Built-in property URIs to state RDF Schema statements, i.e.,
ontological constraints.
RDFS statement Triple
Subclass (s, sc, o)
Subproperty (s, sp, o)
Domain typing (s, ←d , o)
Range typing (s, →r , o)
b "LGG in RDF"
ConfPaper hasContactAuthor b1
Publication hasAuthor Researcher
hasTitle
τ
sc sp
→r←d
hasContactAuthor
5/31
8. Deriving the implicit triples
b "LGG in RDF"
ConfPaper hasContactAuthor b1
Publication hasAuthor Researcher
hasTitle
τ
sc sp
→r←d
hasContactAuthorτ
hasAuthor
τ→r←d
Figure: RDF graph G
How to derive implicit triples of an RDF graph ?
6/31
10. Semantics of RDF graphs
b "LGG in RDF"
ConfPaper hasContactAuthor b1
Publication hasAuthor Researcher
hasTitle
τ
sc sp
→r←d
hasContactAuthorτ
hasAuthor
τ→r←d
Figure: Saturated RDF graph G∞
8/31
11. Basic graph pattern queries (BGPQ)
BGPQ : conjunctive fragment of SPARQL queries, is the counterpart
of the select-project-join queries for databases
(s, p, o) ∈ (V ∪ U) × (V ∪ U) × (V ∪ U ∪ L)
9/31
12. Basic graph pattern queries (BGPQ)
BGPQ : conjunctive fragment of SPARQL queries, is the counterpart
of the select-project-join queries for databases
(s, p, o) ∈ (V ∪ U) × (V ∪ U) × (V ∪ U ∪ L)
x1 ConfPaper
y1
τ
hasContactAuthor
body(q1)
Figure: Sample BGPQ q1(x1)
9/31
13. Entailing and answering queries
Query entailment
G |=R q ⇐⇒ G∞
|=R q
x1
x2
τ
b "LGG in RDF"
ConfPaper hasContactAuthor b1
Publication hasAuthor
G∞q(x1, x2)
Researcher
hasTitle
τ
sc sp
→r←d
hasContactAuthorτ
hasAuthor
τ→r←d
14. Entailing and answering queries
Query entailment
G |=R q ⇐⇒ G∞
|=R q
x1
x2
τ
b "LGG in RDF"
ConfPaper hasContactAuthor b1
Publication hasAuthor
G∞q(x1, x2)
Researcher
hasTitle
τ
sc sp
→r←d
hasContactAuthorτ
hasAuthor
τ→r←d
b
Publication
τ
10/31
15. Entailing and answering queries
Query answering
x1
x2
τ
b "LGG in RDF"
ConfPaper hasContactAuthor b1
Publication hasAuthor
G∞q(x1, x2)
Researcher
hasTitle
τ
sc sp
→r←d
hasContactAuthorτ
hasAuthor
τ→r←d
16. Entailing and answering queries
Query answering
x1
x2
τ
b "LGG in RDF"
ConfPaper hasContactAuthor b1
Publication hasAuthor
G∞q(x1, x2)
Researcher
hasTitle
τ
sc sp
→r←d
hasContactAuthorτ
hasAuthor
τ→r←d
b
Publication
τ
ConfPaper
τ
Researcher
τ
b1
11/31
17. Entailing between BGPQs
q |=R q ⇐⇒ q∞
|= q
x1 Publication
y1
τ
hasAuthor
SA
z1
hasContactAuthor
title
x2 Publication
y2
τ
hasAuthor
q∞
(x1) q (x2)
20. Towards defining lgg in SPARQL conjunctive fragment
A least general generalization (lgg) of n descriptions d1, . . . , dn is a most
specific description d generalizing every d1≤i≤n for some
generalization/specialization relation between descriptions (G.Plotkin).
lgg in our SPARQL setting
descriptions are BGP Queries
relation generalization/specialization is entailment between queries
14/31
21. Defining the lgg of queries
lgg of BGPQs
Let q1, . . . , qn be BGPQs with the same arity and R a set of RDF
entailment rules.
A generalization of q1, . . . , qn is a BGPQ qg such that qi |=R qg for
1 ≤ i ≤ n.
A least general generalization of q1, . . . , qn is a generalization qlgg of
q1, . . . , qn such that for any other generalization qg of q1, . . . , qn:
qlgg |=R qg .
15/31
22. Defining the lgg of queries
lgg of BGPQs
Let q1, . . . , qn be BGPQs with the same arity and R a set of RDF
entailment rules.
A generalization of q1, . . . , qn is a BGPQ qg such that qi |=R qg for
1 ≤ i ≤ n.
A least general generalization of q1, . . . , qn is a generalization qlgg of
q1, . . . , qn such that for any other generalization qg of q1, . . . , qn:
qlgg |=R qg .
x1 ConfPaper
y1
τ
hasContactAuthor
x2 JourPaper
y2
τ
hasAuthor
bx1x2
bCPJP
τ
q1(x1) q2(x2) qlgg (bx1x2
)
23. Defining the lgg of queries
lgg of BGPQs
Let q1, . . . , qn be BGPQs with the same arity and R a set of RDF
entailment rules.
A generalization of q1, . . . , qn is a BGPQ qg such that qi |=R qg for
1 ≤ i ≤ n.
A least general generalization of q1, . . . , qn is a generalization qlgg of
q1, . . . , qn such that for any other generalization qg of q1, . . . , qn:
qlgg |=R qg .
x1 ConfPaper
y1
τ
hasContactAuthor
x2 JourPaper
y2
τ
hasAuthor
bx1x2
bCPJP
τ
bx1x2
Publication
by1y2
Researcher
τ
hasAuthor
τ
q1(x1) q2(x2) qlgg (bx1x2
) qlggO(bx1x2
)
15/31
24. Entailment relation between BGPQs w.r.t. background
knowledge
Entailment between BGPQs w.r.t. R, O
Given a set R of RDF entailment rules, a set O of RDFS statements, and
two BGPQs q1 and q2 with the same arity, q1 entails q2 w.r.t. O,
denoted q1 |=R,O q2, iff q1
∞
O |= q2 holds.
Well-founded relation : q1 |=R,O q2
Query entailment: if G |=R q1 holds then G |=R q2 holds,
Query answering: q1(G) ⊆ q2(G) holds.
16/31
26. Defining the lgg of queries w.r.t. background knowledge
Definition (lgg of BGPQs w.r.t. RDFS constraints)
Let R be a set of RDF entailment rules, O a set of RDFS statements,
and q1, . . . , qn n BGPQs with the same arity.
A generalization of q1, . . . , qn w.r.t. O is a BGPQ qg such that
qi |=R,Oqg for 1 ≤ i ≤ n.
A least general generalization of q1, . . . , qn w.r.t. O is a
generalization qlgg of q1, . . . , qn w.r.t. O such that for any other
generalization qg of q1, . . . , qn w.r.t. O: qlgg|=R,Oqg .
Theorem
An lgg of BGPQs w.r.t. RDFS statements may not exist for some set of
RDF entailment rules; when it exists, it is unique up to entailment
(|=R,O).
18/31
27. Defining the lgg of queries w.r.t. background knowledge
Definition (lgg of BGPQs w.r.t. RDFS constraints)
Let R be a set of RDF entailment rules, O a set of RDFS statements,
and q1, . . . , qn n BGPQs with the same arity.
A generalization of q1, . . . , qn w.r.t. O is a BGPQ qg such that
qi |=R,Oqg for 1 ≤ i ≤ n.
A least general generalization of q1, . . . , qn w.r.t. O is a
generalization qlgg of q1, . . . , qn w.r.t. O such that for any other
generalization qg of q1, . . . , qn w.r.t. O: qlgg|=R,Oqg .
Result : lgg of n BGPQ queries vs lgg of two BGPQ queries
3(q1, q2, q3) ≡R,O 2( 2(q1, q2), q3)
· · · · · ·
n(q1, . . . , qn) ≡R,O 2( n−1(q1, . . . , qn−1), qn)
≡R,O 2( 2(· · · 2( 2(q1, q2), q3) · · · , qn−1), qn)
19/31
28. Defining the lgg of queries w.r.t. background knowledge
Definition (lgg of BGPQs w.r.t. RDFS constraints)
Let R be a set of RDF entailment rules, O a set of RDFS statements,
and q1, . . . , qn n BGPQs with the same arity.
A generalization of q1, . . . , qn w.r.t. O is a BGPQ qg such that
qi |=R,Oqg for 1 ≤ i ≤ n.
A least general generalization of q1, . . . , qn w.r.t. O is a
generalization qlgg of q1, . . . , qn w.r.t. O such that for any other
generalization qg of q1, . . . , qn w.r.t. O: qlgg|=R,Oqg .
Result : lgg of n BGPQ queries vs lgg of two BGPQ queries
3(q1, q2, q3) ≡R,O 2( 2(q1, q2), q3)
· · · · · ·
n(q1, . . . , qn) ≡R,O 2( n−1(q1, . . . , qn−1), qn)
≡R,O 2( 2(· · · 2( 2(q1, q2), q3) · · · , qn−1), qn)
We focus on computing lgg of two BGPQ queries
19/31
29. Defining the lgg of queries
x1 ConfPaper
y1
τ
hasContactAuthor
x2 JourPaper
y2
τ
hasAuthor
Publication hasAuthor Researcher
ConfPaper JourPaper
hasContactAuthor
←d →r
sp←d →r
scsc
q1(x1) q2(x2) O
31. Defining the lgg of queries
x1 ConfPaper
y1
τ
hasContactAuthor
x2 JourPaper
y2
τ
hasAuthor
Publication hasAuthor Researcher
ConfPaper JourPaper
hasContactAuthor
←d →r
sp←d →r
scsc
q1(x1) q2(x2) O
bx1x2
Publication
by1y2
Researcher
τ
hasAuthor
τ
qlggO
How to compute this query ?
20/31
32. The cover of SPARQL queries
Definition (Cover query)
Let q1, q2 be two BGPQs with the same arity n.
If there exists the BGPQ q such that
head(q1) = q(x1
1 , . . . , xn
1 ) and head(q2) = q(x1
2 , . . . , xn
2 ) iff
head(q) = q(vx1
1 x1
2
, . . . , vxn
1 xn
2
)
(t1, t2, t3) ∈ body(q1) and (t4, t5, t6) ∈ body(q2) iff
(t7, t8, t9) ∈ body(q) with, for 1 ≤ i ≤ 3, ti+6 = ti if ti = ti+3 and
ti ∈ U ∪ L, otherwise ti+6 is the variable vti ti+3
then q is the cover query of q1, q2.
21/31
37. Cover graph vs lgg
Theorem
Given a set R of RDF entailment rules, a set O of RDFS statements and
two BGPQs q1, q2 with the same arity,
1. the cover query q of q1
∞
O , q2
∞
O exists iff an lgg of q1, q2 w.r.t. O
exists;
2. the cover query q of q1
∞
O , q2
∞
O is an lgg of q1, q2 w.r.t. O.
Corollary
A cover query-based lgg of two BGPQs q1 and q2 is computed in
O(|body(q1
∞
O )| × |body(q2
∞
O )|) and its size is
|body(q1
∞
O )| × |body(q2
∞
O )|.
23/31
41. Related work
Structural approaches
RDF
Rooted graphs, ignore RDF entailment :
- [Colucci et al., 2016].
SPARQL : tree queries
- [Lehmann and Bühmann, 2011].
Description Logics
- [Zarrieß and Turhan, 2013].
- [Baader et al., 1999].
Approaches independent of the structure
RDF
- [Hassad et al., 2017].
- [Petrova et al., 2017].
Conceptual Graphs
- [Chein and Mugnier, 2009].
First Order Clauses
- [Nienhuys-Cheng and de Wolf, 1996].
- [Plotkin, 1970].
27/31
42. Conclusion
We revisited the problem of computing a least general generalization
of general BGPQs w.r.t. background knowledge.
We defined new entailment relationship between BGPQs
w.r.t. background knowledge.
We studied the added-value of considering background knowledge
when learning lggs.
Perspective:
Heuristics in order to compute lgg without redundants triples.
28/31
44. References I
[Baader et al., 1999] Baader, F., Kiisters, R., and Molitor, R. (1999).
Computing least common subsumers in description logics with existential restrictions.
In IJCAI.
[Chein and Mugnier, 2009] Chein, M. and Mugnier, M. (2009).
Graph-based Knowledge Representation - Computational Foundations of Conceptual Graphs.
Springer.
[Colucci et al., 2016] Colucci, S., Donini, F., Giannini, S., and Sciascio, E. D. (2016).
Defining and computing least common subsumers in RDF.
J. Web Semantics, 39(0).
[Hassad et al., 2017] Hassad, S. E., Goasdoué, F., and Jaudoin, H. (2017).
Learning commonalities in RDF.
In The 14th Extended Semantic Web Conference, ESWC 2017, Portorož, Slovenia, May 28 - June 1, 2017,
Proceedings, Part I, pages 502–517.
[Lehmann and Bühmann, 2011] Lehmann, J. and Bühmann, L. (2011).
Autosparql: Let users query your knowledge base.
In ESWC.
[Nienhuys-Cheng and de Wolf, 1996] Nienhuys-Cheng, S. and de Wolf, R. (1996).
Least generalizations and greatest specializations of sets of clauses.
J. Artif. Intell. Res.
[Petrova et al., 2017] Petrova, A., Sherkhonov, E., Grau, B. C., and Horrocks, I. (2017).
Entity comparison in RDF graphs.
In International Semantic Web Conference (ISWC). Springer.
[Plotkin, 1970] Plotkin, G. D. (1970).
A note on inductive generalization.
Machine Intelligence, 5.
[W3C-RDFS, 2014] W3C-RDFS (2014).
RDF 1.1 semantics.
https://www.w3.org/TR/rdf11-mt/.
30/31
45. References II
[Zarrieß and Turhan, 2013] Zarrieß, B. and Turhan, A. (2013).
Most specific generalizations w.r.t. general EL-TBoxes.
In IJCAI.
31/31