This document discusses relational databases, RDF graphs, and constraints. It covers:
- Relational databases and their use of constraints like primary keys
- RDF graphs and their lack of explicit schema/constraints
- Mappings from relational databases to RDF graphs using direct mapping and R2RML
- Approaches for rewriting database constraints into SHACL constraints to validate the mapped RDF graph
- Opportunities to optimize SPARQL queries using constraints inferred from the SHACL shapes
1. Relational Databases, RDF Graphs and Constraints
Ratan Bahadur Thapa
PhD candidate at IFI/SIRIUS
University of Oslo
April 2, 2023
2. Plan
▶ Relational Databases
▶ RDF graph
▶ Relational-to-RDF Mappings
▶ Direct Mapping and R2RML
▶ Constraint Rewriting for Direct Mapping
▶ Constraint Rewriting for R2RML
▶ SPARQL Query Optimization With SHACL
▶ Open Questions?
3. Relational Database
▶ E. F. Codd, "A Relational Model of Data for Large Shared
Data Banks", IBM, 1970.
▶ First commercial implementation of SQL: Oracle V2, June
1979 (standardized in 1986 as SQL-86).
▶ Closed-world assumption, i.e., the assumption that what is not
known to be true is false.
▶ E.g., consider the constraint ∀x.PhD(x) → Employee(x) in the
relational model.
▶ In SQL DDL (i.e., schema + constraints):
create table Employee (E_id int not null, primary key (E_id));
create table PhD (P_id int not null, primary key (P_id),
  foreign key (P_id) references Employee (E_id));
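As a minimal sketch, assuming SQLite through Python's standard sqlite3 module (SQLite enforces foreign keys only after PRAGMA foreign_keys = ON), the DDL above can be run to watch the database reject rows that violate the key constraints:

```python
import sqlite3

# In-memory SQLite database; SQLite is an assumption, any SQL engine would do.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled

conn.execute("create table Employee (E_id int not null, primary key (E_id))")
conn.execute(
    "create table PhD (P_id int not null, primary key (P_id), "
    "foreign key (P_id) references Employee (E_id))"
)

def try_insert(sql, params):
    """Run one insert; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "Employee 1": try_insert("insert into Employee values (?)", (1,)),
    "PhD 1 (Employee 1 exists)": try_insert("insert into PhD values (?)", (1,)),
    "PhD 2 (no Employee 2)": try_insert("insert into PhD values (?)", (2,)),
    "Employee 1 again (PK clash)": try_insert("insert into Employee values (?)", (1,)),
}
for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

The PhD row for 2 is rejected because no Employee 2 exists (the foreign key encodes ∀x.PhD(x) → Employee(x)), and the second Employee 1 is rejected by the primary key.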
4. RDF Graph
▶ Composed of triples "(Subject, Predicate, Object)", e.g.,
dbp:Norway dbp-ont:Capital dbp:Oslo .
dbp:Oslo dbp-ont:population "673469"^^xsd:integer .
▶ Syntax doesn't explicitly differentiate between data and schema
▶ Syntax cannot express constraints
▶ SHACL: a language for validating RDF graphs against a set
of conditions.
▶ makes the closed-world assumption
▶ "schema + constraint" language for RDF
▶ W3C Recommendation since 2017
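To make the last two RDF points concrete, here is a hand-rolled sketch (no RDF library assumed) that models a graph as a Python set of triples. A schema-like triple sits next to the data triples with nothing distinguishing it, and a second, conflicting population value is accepted without complaint. The owl:ObjectProperty triple and the value "0" are hypothetical.

```python
# A graph is just a set of (subject, predicate, object) tuples; prefixed names are
# plain strings and typed literals are (lexical form, datatype) pairs.
graph = {
    ("dbp:Norway", "dbp-ont:Capital", "dbp:Oslo"),
    ("dbp:Oslo", "dbp-ont:population", ("673469", "xsd:integer")),
    # Hypothetical schema-level triple, stored exactly like the data triples.
    ("dbp-ont:Capital", "rdf:type", "owl:ObjectProperty"),
}

def objects(g, subject, predicate):
    """All objects o such that (subject, predicate, o) is in g."""
    return [o for (s, p, o) in g if s == subject and p == predicate]

# Hypothetical second population value: nothing in RDF itself rejects it.
graph.add(("dbp:Oslo", "dbp-ont:population", ("0", "xsd:integer")))
print(len(objects(graph, "dbp:Oslo", "dbp-ont:population")))  # 2
```

Detecting that Oslo now has two populations requires an external constraint, which is the gap SHACL fills.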
5. RDF Graph
▶ E.g., a shape := (name, target definition, constraint definition):
:EmployeeNode a sh:NodeShape;
sh:targetClass :Employee;
sh:property [ sh:path :hasAddress;
sh:nodeKind sh:Literal;
sh:maxCount 1; sh:minCount 1;
sh:datatype xsd:string ];
sh:property [ sh:path :hasAddress;
dash:uniqueValueForClass
:Employee ].
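The shape above can be read as a set of checks on every instance of :Employee. The sketch below hand-codes those checks in plain Python; it is not a SHACL engine and covers only the constraints used in :EmployeeNode. It uses the data of the example on slide 17, plus a hypothetical :Ola who reuses Ida's address and so breaks dash:uniqueValueForClass.

```python
from collections import Counter

# Hypothetical in-memory encoding: literals as (lexical form, datatype) pairs,
# IRIs as prefixed-name strings. Data taken from the example on slide 17.
XSD_STRING = "xsd:string"
graph = {
    (":Ida", "rdf:type", ":Employee"),
    (":Ida", ":hasID", ("001", "xsd:int")),
    (":Ida", ":hasAddress", ("Oslo", XSD_STRING)),
    (":Ingrid", "rdf:type", ":Employee"),
    (":Ingrid", ":hasID", ("002", "xsd:int")),
    (":Ingrid", ":hasAddress", ("Bergen", XSD_STRING)),
}

def is_literal(term):
    return isinstance(term, tuple)

def validate_employee_node(g):
    """Check the :EmployeeNode constraints by hand; return violation messages."""
    violations = []
    focus = {s for (s, p, o) in g if p == "rdf:type" and o == ":Employee"}
    owners = Counter()  # address value -> number of employees having it
    for x in sorted(focus):
        values = [o for (s, p, o) in g if s == x and p == ":hasAddress"]
        # sh:minCount 1 and sh:maxCount 1
        if len(values) != 1:
            violations.append(f"{x}: expected exactly 1 :hasAddress, found {len(values)}")
        for v in values:
            # sh:nodeKind sh:Literal and sh:datatype xsd:string
            if not is_literal(v) or v[1] != XSD_STRING:
                violations.append(f"{x}: {v!r} is not an xsd:string literal")
            owners[v] += 1
    # dash:uniqueValueForClass :Employee: no two employees share a value
    for v, n in owners.items():
        if n > 1:
            violations.append(f"{v!r} is used by {n} instances of :Employee")
    return violations

print(validate_employee_node(graph))  # []
bad = graph | {(":Ola", "rdf:type", ":Employee"),
               (":Ola", ":hasAddress", ("Oslo", XSD_STRING))}
print(validate_employee_node(bad))
```

The second call reports a single uniqueness violation for the shared "Oslo" value.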
6. Relational-to-RDF Mapping: Direct Mapping and R2RML
Figure: Relational Database → (W3C Mapping) → RDF; quality assurance and validation?
INPUT: database schema and instance D
▶ Key constraints: PKs and FKs
▶ Other constraints: nullability, uniqueness and data types
OUTPUT: RDF graph
▶ Primary descriptors?
▶ Does not explicitly differentiate between data and schema
▶ Constraint-less, i.e., RDF syntax cannot express constraints
Challenges with the "RDB2RDF graph":
▶ Understandability and usability
▶ Verifying compliance of a dataset w.r.t. certain requirements
or policies
▶ Detecting metadata errors, query optimization, etc.
7. Sequeda et al.'s Direct Mapping M [3]
Integrates the W3C Direct Mapping [1] and the SQL-schema-to-OWL
mapping [2].
[1] Arenas et al., A Direct Mapping of Relational Data to RDF, W3C Recommendation, 2012
[2] Tirmizi et al., Translating SQL Applications to the Semantic Web, DEXA 2008
[3] Sequeda et al., On Directly Mapping Relational Databases to RDF and OWL, WWW 2012
8. Properties of Sequeda et al.'s Direct Mapping M
▶ M is not semantics preserving,
i.e., it is not the case that, for every schema R with a set σ of PKs
and FKs on R and every instance D,
D ⊨ σ ⇐⇒ M(D) ⊨ OWL axioms
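E.g., the Employee instance below repeats the key value E01:
Employee
E_ID | Name     | Position
E01  | Ida      | Post Doc
E01  | Cathrine | PhD
M maps both rows to the single subject :E01:
:E01 rdf:type :baseIRI/Employee .
:E01 :baseIRI/Employee#Name "Ida", "Cathrine" .
:E01 :baseIRI/Employee#Position "Post Doc", "PhD" .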
▶ No monotone M is semantics preserving (Sequeda et al., Thm. 3).
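The example can be replayed with a hypothetical, simplified direct mapping that, like the figure, builds the subject IRI from the key value and maps every other column to one triple (the W3C Direct Mapping also emits the key column and uses a different IRI scheme). The instance violates its key, yet the output is an ordinary RDF graph; and mapping fewer rows only ever yields fewer triples, which is the monotonicity the theorem refers to.

```python
# Hypothetical, simplified direct mapping, shaped like the figure: the subject IRI
# comes from the key value and every non-key column becomes one triple.
rows = [
    {"E_ID": "E01", "Name": "Ida", "Position": "Post Doc"},
    {"E_ID": "E01", "Name": "Cathrine", "Position": "PhD"},
]

def satisfies_pk(rows, key):
    """D satisfies the primary key iff no two rows share a key value."""
    values = [r[key] for r in rows]
    return len(values) == len(set(values))

def direct_map(table, rows, key):
    triples = set()
    for r in rows:
        subject = f":{r[key]}"
        triples.add((subject, "rdf:type", f":baseIRI/{table}"))
        for column, value in r.items():
            if column != key:
                triples.add((subject, f":baseIRI/{table}#{column}", value))
    return triples

graph = direct_map("Employee", rows, "E_ID")
print(satisfies_pk(rows, "E_ID"))  # False: the key E01 is repeated
print(len(graph))                  # 5: one rdf:type triple, two names, two positions
# Monotonicity: mapping a subset of the rows yields a subset of the triples.
print(direct_map("Employee", rows[:1], "E_ID") <= graph)  # True
```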
9. Constraint Rewriting Γ for Direct Mapping M [4]
Extend the direct mapping M with rules Γ that rewrite the SQL schema
and constraints into SHACL.
Figure: Σ consists of key constraints σ and data constraints δ; Ms maps
schema R to V, Mi maps instance D to graph G, and Γ maps (V, Σ) to shapes S.
[4] A Source-to-Target Constraint Rewriting for Direct Mapping, ISWC 2021.
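A minimal sketch of what such rewriting rules might look like, assuming a hypothetical rule set that is not necessarily the Γ of the ISWC 2021 paper: NOT NULL becomes sh:minCount 1, the SQL type becomes sh:datatype, UNIQUE becomes dash:uniqueValueForClass, and every column gets sh:maxCount 1, which is only sound while the primary key holds (compare slide 8). The relative IRIs such as <Employee#E_id> and the Address column are also assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical SQL-to-XSD datatype table; a real rewriting would cover many more types.
SQL_TO_XSD = {"int": "xsd:integer", "varchar": "xsd:string", "date": "xsd:date"}

@dataclass
class Column:
    name: str
    sql_type: str
    not_null: bool = False
    unique: bool = False

@dataclass
class Table:
    name: str
    columns: list = field(default_factory=list)

def rewrite_table(t: Table) -> str:
    """Emit one SHACL node shape (Turtle text) for table t using the assumed rules."""
    cls = f"<{t.name}>"  # hypothetical class IRI, relative to some base
    lines = [f":{t.name}Shape a sh:NodeShape;", f"  sh:targetClass {cls};"]
    for c in t.columns:
        props = [
            f"sh:path <{t.name}#{c.name}>",
            "sh:maxCount 1",  # one value per row; sound only while the PK holds
            f"sh:datatype {SQL_TO_XSD[c.sql_type]}",
        ]
        if c.not_null:
            props.append("sh:minCount 1")
        if c.unique:
            props.append(f"dash:uniqueValueForClass {cls}")
        lines.append("  sh:property [ " + "; ".join(props) + " ];")
    lines[-1] = lines[-1][:-1] + "."  # the last statement ends with '.' not ';'
    return "\n".join(lines)

employee = Table("Employee", [
    Column("E_id", "int", not_null=True, unique=True),
    Column("Address", "varchar", not_null=True),
])
print(rewrite_table(employee))
```

The output has the same form as :EmployeeNode on slide 5, one sh:property block per column.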
10. Properties of Rewriting Γ
▶ Γ is weakly semantics preserving, i.e.,
D ⊨ Σ ⇐⇒ M(D) ⊨ Γ(V, Σ),
for all DB instances D that satisfy their key constraints σ.
11. Question?
Beyond the weak semantic correspondence between SQL constraints
and SHACL, can we define a constraint rewriting Γ such that
D ⊨ Σ =⇒ M(D) ⊨ Γ(V, Σ),
where Γ(V, Σ) is maximal?
12. Constraint Rewriting Γ [5] for simple R2RML [6] M
A simple mapping M is a finite set of assertions of the form
Q → ψ,
where
▶ Q is an SP or SPJ query over a relational source D such that it
▶ filters out nulls,
▶ uses only equality joins along foreign keys;
▶ ψ is a triple pattern.
[5] Mapping Relational Database Constraints to SHACL, ISWC 2022.
[6] Das et al., R2RML: RDB to RDF Mapping Language, W3C Recommendation, 2012.
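To illustrate both kinds of query, here is a sketch that materializes M(D) for a hypothetical simple mapping over an in-memory SQLite database. Each assertion pairs an SQL query Q with a function that instantiates the triple pattern ψ for one answer row; the Address column, the :e1-style IRIs and the :PhD class are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Employee (E_id int not null primary key, Address varchar);
create table PhD (P_id int not null primary key references Employee (E_id));
insert into Employee values (1, 'Oslo'), (2, 'Bergen'), (3, NULL);
insert into PhD values (1);
""")

# Hypothetical simple mapping: each assertion pairs an SQL query Q with a
# function that instantiates the triple pattern psi for one answer row.
mapping = [
    # SP query: the null filter ensures no triple is built from a missing Address.
    ("select E_id, Address from Employee where Address is not null",
     lambda e, a: (f":e{e}", ":hasAddress", a)),
    # SPJ query: equality join along the foreign key PhD.P_id -> Employee.E_id.
    ("select P.P_id from PhD P join Employee E on P.P_id = E.E_id",
     lambda p: (f":e{p}", "rdf:type", ":PhD")),
]

def materialize(conn, mapping):
    """Compute M(D): evaluate every Q and instantiate psi once per answer row."""
    return {psi(*row) for q, psi in mapping for row in conn.execute(q)}

graph = materialize(conn, mapping)
for triple in sorted(graph):
    print(triple)
```

Employee 3 contributes no :hasAddress triple because its Address is NULL.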
13. Constraint Rewriting Γ for simple R2RML M
Figure: database (R, Σ, D) ↦ RDF graph M(D) via mapping M;
constraints Σ ↦ SHACL constraints Γ(M, Σ) via rewriting Γ.
Rewriting steps:
▶ Let Q → ψ be a mapping assertion defined on schema R with Σ.
Then Γ computes
1. Σ|Q, i.e., Σ propagated to att(Q);
2. Σ|Q ⊩ σ_{X→Y}, where X, Y ⊆ att(Q), i.e., the Σ-implied data
dependencies σ [7] on the view projected by Q;
3. SHACL constraints on scheme(ψ) based on Σ|Q ⊩ σ and the
mappings.
[7] Data dependencies that also apply to databases with nulls.
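Step 2 can be illustrated for the special case of functional dependencies with the textbook attribute-closure algorithm: an FD X → Y with X, Y inside the view's attributes holds on the projection iff Y ⊆ X⁺. This sketch ignores nulls (which the dependencies of step 2 must also handle) and inclusion dependencies, and the schema Employee(E_id, Address, Dept) is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implied_on_view(fds, view_attrs, x, y):
    """Does X -> Y follow from fds, with X and Y inside the view's attributes?"""
    if not (x <= view_attrs and y <= view_attrs):
        raise ValueError("X and Y must be attributes of the view")
    return y <= closure(x, fds)

# Hypothetical schema Employee(E_id, Address, Dept), with FDs from its PK and
# from UNIQUE(Address) (assuming Address is also NOT NULL).
pk = ({"E_id"}, {"Address", "Dept"})
unique_address = ({"Address"}, {"E_id", "Dept"})
view = {"E_id", "Address"}  # att(Q) for Q = select E_id, Address from Employee

print(implied_on_view([pk], view, {"E_id"}, {"Address"}))   # True
print(implied_on_view([pk], view, {"Address"}, {"E_id"}))   # False
print(implied_on_view([pk, unique_address], view, {"Address"}, {"E_id"}))  # True
```

Under these assumptions, the first result is the kind of fact that could justify sh:maxCount 1 on the property mapped from Address, and the third a uniqueness constraint such as dash:uniqueValueForClass.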
14. Properties of Γ
▶ Γ is maximal semantics preserving,
i.e., for every set of shapes S with sch(S) ⊆ sch(M) such that Σ ⊨_M S,
meaning that ∀D. (D ⊨ Σ =⇒ M(D) ⊨ S),
it holds that ∀G. (G ⊨ Γ(M, Σ) =⇒ G ⊨ S).
16. SPARQL Query Optimization with SHACL [8]
In short, we aim to find optimal S-equivalent queries Q′ of
the original query Q, where
Q′ ≡_S Q iff ∀G. (G ⊨ S =⇒ Q′(G) = Q(G)).
We propose a set of query rewriting rules that, based on SHACL
guarantees,
1. reduce OPTIONAL patterns to JOIN patterns,
2. remove JOIN patterns,
3. eliminate the DIST operator, etc.
[8] Manuscript, 2023.
17. Example
Consider the following RDF graph G (first listing), which conforms to
the SHACL shape s (second listing), both written in Turtle syntax:
:Ida a :Employee;
:hasID "001"^^xsd:int;
:hasAddress "Oslo".
:Ingrid a :Employee;
:hasID "002"^^xsd:int;
:hasAddress "Bergen".
:EmployeeNode a sh:NodeShape;
sh:targetClass :Employee;
sh:property [ sh:path :hasAddress;
sh:nodeKind sh:Literal;
sh:maxCount 1; sh:minCount 1;
sh:datatype xsd:string ];
sh:property [ sh:path :hasAddress;
dash:uniqueValueForClass
:Employee ].
18. An Example of Query Rewriting
Assume the SPARQL query
Dist(Proj_xy(Opt_⊤(Employee(x), hasAddress(x, y))))
over a graph G that satisfies shape s. Then,
▶ since G ⊨ ∀x.Employee(x) → ∃y.hasAddress(x, y) (resp.,
G ⊨ ∀x∀y∀y′.hasAddress(x, y) ∧ hasAddress(x, y′) → y = y′),
the query can be rewritten to
Dist(Proj_xy(Join(Employee(x), hasAddress(x, y))));
▶ since G ⊨ ∀x.Employee(x) → ∃y.hasAddress(x, y) and
G ⊨ ∀x∀x′∀y.hasAddress(x, y) ∧ hasAddress(x′, y) → x = x′,
it can be further rewritten to
Proj_xy(Join(Employee(x), hasAddress(x, y))).
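The equivalences can be checked on the example graph with a hand-rolled evaluator for just the operators used here (bag semantics, Opt with filter ⊤); it is a sketch, not a SPARQL engine. Adding a hypothetical employee :Ola with no address breaks the sh:minCount 1 guarantee, and the OPTIONAL-to-JOIN rewrite then no longer gives the same answers.

```python
from collections import Counter

# Slide 17's graph G (the :hasID triples are omitted; the query does not use them).
G = {
    (":Ida", "rdf:type", ":Employee"), (":Ida", ":hasAddress", "Oslo"),
    (":Ingrid", "rdf:type", ":Employee"), (":Ingrid", ":hasAddress", "Bergen"),
}

def employee(g):      # solutions of Employee(x)
    return [{"x": s} for (s, p, o) in g if p == "rdf:type" and o == ":Employee"]

def has_address(g):   # solutions of hasAddress(x, y)
    return [{"x": s, "y": o} for (s, p, o) in g if p == ":hasAddress"]

def compatible(m1, m2):
    """Two solution mappings agree on every variable they share."""
    return all(m2.get(k, v) == v for k, v in m1.items())

def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def opt(left, right):
    """Opt with filter ⊤: a left mapping with no compatible partner is kept as is."""
    out = []
    for a in left:
        ext = [{**a, **b} for b in right if compatible(a, b)]
        out.extend(ext if ext else [a])
    return out

def proj(bag, variables):
    return [{v: m[v] for v in variables if v in m} for m in bag]

def dist(bag):
    return [dict(k) for k in dict.fromkeys(frozenset(m.items()) for m in bag)]

def as_bag(bag):
    """Order-insensitive multiset of solution mappings, for comparing results."""
    return Counter(frozenset(m.items()) for m in bag)

def queries(g):
    """The original query and its two rewritings from the slide, evaluated on g."""
    q = dist(proj(opt(employee(g), has_address(g)), ["x", "y"]))
    q1 = dist(proj(join(employee(g), has_address(g)), ["x", "y"]))
    q2 = proj(join(employee(g), has_address(g)), ["x", "y"])
    return as_bag(q), as_bag(q1), as_bag(q2)

r = queries(G)
print(r[0] == r[1] == r[2])  # True: all three agree on G
b = queries(G | {(":Ola", "rdf:type", ":Employee")})
print(b[0] == b[1])          # False: :Ola survives only the OPTIONAL
```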
19. Property of Query Rewriting Rules: Confluent Reduction
A SPARQL query is a graph pattern P defined by the grammar
P := B | Filter_F(P) | Union(P_1, P_2) | Join(P_1, P_2) | Minus(P_1, P_2)
| Diff_F(P_1, P_2) | Opt_F(P_1, P_2) | Proj_L(P) | Dist(P)
Shape target τ_s and constraint ϕ_s are expressions defined by the
grammar
τ_s := sh:targetClass C | sh:targetSubjectOf P | sh:targetObjectOf P
ϕ_s := ≥n α.β | ≤n α.β | ▷τ_s α | α_1 = α_2 | ϕ_s ∧ ϕ_s
β := ⊤ | C | s′ | ¬β
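A minimal sketch of rule-based rewriting over a fragment of this grammar (triple patterns, Join, Opt_⊤, Proj, Dist). Shape guarantees are encoded as hypothetical ("mandatory", C, p) and ("unique_value", C, p) facts standing for sh:minCount 1 and dash:uniqueValueForClass. The two rules mirror the conditions of slide 18 and are applied bottom-up until none fires; the sketch does not itself prove confluence.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Triple:   # B: a basic pattern such as Employee(x) or hasAddress(x, y)
    pred: str
    args: Tuple[str, ...]

@dataclass(frozen=True)
class Join:
    left: "Pattern"
    right: "Pattern"

@dataclass(frozen=True)
class Opt:      # Opt_⊤(P1, P2): OPTIONAL whose filter is trivially true
    left: "Pattern"
    right: "Pattern"

@dataclass(frozen=True)
class Proj:
    variables: Tuple[str, ...]
    sub: "Pattern"

@dataclass(frozen=True)
class Dist:
    sub: "Pattern"

Pattern = Union[Triple, Join, Opt, Proj, Dist]

def class_then_prop(a, b):
    """True for the pair C(x), p(x, y) that both rules below expect."""
    return (isinstance(a, Triple) and isinstance(b, Triple)
            and len(a.args) == 1 and len(b.args) == 2 and a.args[0] == b.args[0])

def step(p, facts):
    """Apply one rule at the root of p; return the rewritten pattern or None."""
    # Rule 1: Opt(C(x), p(x, y)) -> Join(C(x), p(x, y)) if every C has a p-value.
    if isinstance(p, Opt) and class_then_prop(p.left, p.right):
        if ("mandatory", p.left.pred, p.right.pred) in facts:
            return Join(p.left, p.right)
    # Rule 2: Dist(Proj(Join(C(x), p(x, y)))) -> Proj(...) under both guarantees.
    if isinstance(p, Dist) and isinstance(p.sub, Proj) and isinstance(p.sub.sub, Join):
        j = p.sub.sub
        if class_then_prop(j.left, j.right) and {
            ("mandatory", j.left.pred, j.right.pred),
            ("unique_value", j.left.pred, j.right.pred),
        } <= facts:
            return p.sub
    return None

def normalize(p, facts):
    """Rewrite children first, then the root, until no rule applies."""
    if isinstance(p, (Join, Opt)):
        p = type(p)(normalize(p.left, facts), normalize(p.right, facts))
    elif isinstance(p, Proj):
        p = Proj(p.variables, normalize(p.sub, facts))
    elif isinstance(p, Dist):
        p = Dist(normalize(p.sub, facts))
    rewritten = step(p, facts)
    return p if rewritten is None else normalize(rewritten, facts)

facts = {("mandatory", "Employee", "hasAddress"),
         ("unique_value", "Employee", "hasAddress")}
query = Dist(Proj(("x", "y"), Opt(Triple("Employee", ("x",)),
                                  Triple("hasAddress", ("x", "y")))))
print(normalize(query, facts))
```

With both facts the query reduces to the Proj(Join(...)) form of slide 18; with only the "mandatory" fact it stops at Dist(Proj(Join(...))), and with no facts it is left unchanged.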
20. Future Work?
▶ Constraint Rewriting for Expressive Ontology-Based (or
Bootstrap-Based) Mapping Patterns
▶ Optimization of SPARQL Path Queries
▶ Optimization of Ontology-Mediated Query Rewriting