1. On Mapping Relational Databases to RDF and SHACL
Ratan Bahadur Thapa
PhD Candidate - SIRIUS & IFI
ratanbt@ifi.uio.no
2. Outline
Mapping of Relational Data to RDF
Direct Mapping
Sequeda et al.'s Direct Mapping
Constraint rewriting T for Direct Mapping
Properties of Rewriting T
Research Question
R2RML: RDB to RDF Mapping Language
Constraint rewriting Γ for simple RDB to RDF Mapping
Simple RDB to RDF Mapping
Example of Rewriting Γ
Properties of Rewriting Γ
Final Remarks
References
2nd April 2023 1 / 21
3. Mapping M of Relational Data to RDF
Standardized by RDB2RDF working group (W3C)
Direct Mapping, i.e., Default and Automatic
R2RML: RDB to RDF Mapping Language
Available Tools
D2R, Virtuoso, Morph, r2rml4net, db2triples, ultrawrap, Quest
Commercial such as Virtuoso, Oracle SW
Properties of M
M is a data mapping
M translates database instances into RDF triples
M¹ is monotone: if database instances D ⊆ D′, then M(D) ⊆ M(D′)
¹ R2RML with monotonic source queries
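The monotonicity property can be illustrated with a toy mapping. The sketch below is only an illustration under stated assumptions, not the W3C algorithm: `toy_mapping`, the `base/` IRI scheme, the fact layout and the second user "Ola" are all hypothetical.

```python
# Toy illustration of monotonicity: D ⊆ D′ implies M(D) ⊆ M(D′).
# `toy_mapping` and the IRI scheme are hypothetical, chosen only for this sketch.

def toy_mapping(instance):
    """Map a set of (table, key, column, value) facts to a set of RDF-like triples."""
    triples = set()
    for table, key, column, value in instance:
        subject = f"<base/{table}#{key}>"
        triples.add((subject, "rdf:type", f"<base/{table}>"))
        if value is not None:  # a null value produces no literal triple
            triples.add((subject, f"<base/{table}#{column}>", value))
    return triples

D = {("User", "U_ID=E01", "Name", "Ida")}
D_prime = D | {("User", "U_ID=E02", "Name", "Ola")}  # hypothetical extra fact

print(D <= D_prime)                            # True
print(toy_mapping(D) <= toy_mapping(D_prime))  # True
```

Each triple here depends on a single fact, so adding facts can only add triples; a mapping whose source queries use negation or difference would not have this property.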
4. Mapping M of Relational Data to RDF
[Figure: Relational Database −→ (W3C Mapping) −→ RDF. Quality assurance and validation?]
INPUT
Database Schema and Instance D
Key Constraints: PKs and FKs
Other Constraints: Nullability, Uniqueness and Data types
OUTPUT
RDF Graph
Primary descriptors?
Does not explicitly differentiate between data and schema
Constraint-less, i.e., RDF syntax cannot express constraints
Challenges with "RDF without schema and constraint descriptions":
Understandability and usability
Verifying compliance of a dataset w.r.t. certain requirements or policies
Detecting metadata errors, etc.
5. Direct Mapping M Engine
M is a Fixed Set of Mapping Rules
Generates IRI identifiers for table names, columns and foreign keys
Generates identifiers for tuples: IRI if PK exists, otherwise Blank nodes
Produces triples: table (tuples), literal (attributes), reference (FKs)
Table triples: one for every tuple of a table
<baseIRI/User#U_ID=E01> rdf:type <baseIRI/User> .
Literal triples: one for every attribute value of a tuple
<baseIRI/User#U_ID=E01> <baseIRI/User#Name> "Ida" .
Reference triples: one for every FK of a tuple (if any exist)
Example table User:
U_ID | Name | Position
E01 | Ida | Post Doc
IRI for the table: <baseIRI/User>
IRI for columns: <baseIRI/User#Name>
IRI for tuples: <baseIRI/User#U_ID=E01>
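The rules on this slide can be sketched for a single table as follows. This is a minimal sketch, not the W3C reference algorithm: the function name `direct_map_table`, the dict-based row format and the blank-node naming are assumptions, and reference triples are left out because the User example has no foreign key.

```python
# Sketch of table and literal triples for one table, following the slide's IRI
# scheme. `direct_map_table` and the row format are hypothetical.

def direct_map_table(table, pk, rows, base="baseIRI/"):
    """Return a list of (subject, predicate, object) triples for rows given as dicts."""
    triples = []
    for i, row in enumerate(rows):
        if pk and row.get(pk) is not None:
            subject = f"<{base}{table}#{pk}={row[pk]}>"  # IRI built from the PK value
        else:
            subject = f"_:{table}{i}"                    # blank node if there is no PK
        # Table triple: the tuple is an instance of the table's class.
        triples.append((subject, "rdf:type", f"<{base}{table}>"))
        # Literal triples: one per non-null attribute value.
        for column, value in row.items():
            if value is not None:
                triples.append((subject, f"<{base}{table}#{column}>", f'"{value}"'))
    return triples

rows = [{"U_ID": "E01", "Name": "Ida", "Position": "Post Doc"}]
for triple in direct_map_table("User", "U_ID", rows):
    print(" ".join(triple), ".")
```

For the single User row this yields one table triple and three literal triples, including the two triples shown on the slide.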
6. Sequeda et al.'s² Direct Mapping M
Extends the W3C Direct Mapping M with a binary-table rule and OWL axioms
Contains "OWL rules" that translate the vocabulary V identified by the direct mapping rules M_s into OWL axioms
Mapping rules M_i translate the database instance into RDF
[Figure: pipeline relating schema R with key constraints Σ, instance D, mapping rules M_s and M_i, vocabulary V, OWL rules, graph G and OWL axioms]
² On Directly Mapping Relational Databases to RDF and OWL, WWW 2012
7. Properties of Sequeda et al.'s Direct Mapping M
Information preserving, query preserving, and monotone
M is not semantics preserving,
i.e., it does not hold for every R and every set σ of PKs and FKs on R that
D ⊨ σ ⇐⇒ M(D) ⊨ OWL axioms
The non-monotone M_extended is semantics preserving; it relies on DB instances and artificial RDF triples to trigger unsatisfiability of the OWL axioms
No monotone M is semantics preserving (Sequeda et al., Thm. 3).
8. Constraint Rewriting⁴ T for Direct Mapping M
Extends the monotone Direct Mapping M with SHACL³ constraints
T translates the vocabulary V identified by the mapping rules M_s and the SQL constraints Σ (keys, not null and uniqueness) into sets of SHACL shapes
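[Figure: pipeline relating schema R with constraints Σ (key constraints σ and data constraints δ), instance D, mapping rules M_s and M_i, vocabulary V, rewriting Γ, shapes S and graph G]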
³ Shapes Constraint Language for describing RDF, W3C Recommendation since 2017.
⁴ A Source-to-Target Constraint Rewriting for Direct Mapping, ISWC 2021.
9. Properties of Rewriting T
T is constraint preserving, i.e., there exists a mapping N s.t.
N(T(V, Σ)) = (V, Σ).
T is not semantics preserving,
i.e., it does not hold for every R and every set Σ of SQL constraints on R that
D ⊨ Σ ⇐⇒ M(D) ⊨ T(V, Σ)
10. Properties of Rewriting T
T is constraint preserving
T is not semantics preserving,
i.e., it does not hold for every R and every set Σ of SQL constraints on R that
D ⊨ Σ ⇐⇒ M(D) ⊨ T(V, Σ)
Example:
Table User:
U_ID
U01
U01
Null
RDF triples:
:User/U_ID=U01 rdf:type :User .
:User/U_ID=U01 :User/U_ID "U01" .
SHACL shape (:User is a class, :User/U_ID is a datatype property):
:User a sh:NodeShape, rdfs:Class;
    sh:property [ sh:path :User/U_ID;
        sh:nodeKind sh:Literal;
        sh:maxCount 1; sh:minCount 1;
        sh:datatype xsd:integer ];
    un:uniqueValuesForClass [ un:unqProp :User/U_ID;
        un:unqForClass :User ].
11. Properties of Rewriting T
T is constraint preserving
T is not semantics preserving,
i.e., it does not hold for every R and every set Σ of SQL constraints on R that
D ⊨ Σ ⇐⇒ M(D) ⊨ T(V, Σ)
This is because M:
Generates RDF terms from the active domain of the database, i.e., ignores nulls
Generates tuple IRIs from PK values by an injective function of those values, so tuples with duplicate PK values are mapped to a single IRI
T is weakly semantics preserving, i.e.,
D ⊨ Σ ⇐⇒ M(D) ⊨ T(V, Σ),
for all DB instances D that satisfy their key constraints σ.
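The User example from the previous slide can be replayed in a few lines. This is a hedged sketch rather than a SHACL processor: all function names are hypothetical, and only the cardinality (minCount/maxCount) and uniqueness parts of the shape are checked, not `sh:datatype`.

```python
# Sketch of the counterexample: D violates PRIMARY KEY(U_ID), yet the mapped
# graph passes the cardinality and uniqueness checks. All names are hypothetical.

def satisfies_pk(rows, pk):
    """A primary key must be non-null and unique."""
    values = [r[pk] for r in rows]
    return None not in values and len(values) == len(set(values))

def direct_map(rows, pk, table="User"):
    triples = set()
    for r in rows:
        if r[pk] is None:             # terms come from the active domain: nulls are skipped
            continue
        s = f":{table}/{pk}={r[pk]}"  # tuples with the same PK value share one IRI
        triples.add((s, "rdf:type", f":{table}"))
        triples.add((s, f":{table}/{pk}", r[pk]))
    return triples

def graph_satisfies_shape(triples, pk, table="User"):
    prop = f":{table}/{pk}"
    subjects = {s for s, p, o in triples if p == "rdf:type" and o == f":{table}"}
    values = [o for s, p, o in triples if p == prop]
    exactly_one = all(
        sum(1 for s2, p, _ in triples if s2 == s and p == prop) == 1 for s in subjects
    )
    return exactly_one and len(values) == len(set(values))

D = [{"U_ID": "U01"}, {"U_ID": "U01"}, {"U_ID": None}]
G = direct_map(D, "U_ID")
print(satisfies_pk(D, "U_ID"))           # False
print(graph_satisfies_shape(G, "U_ID"))  # True
```

The duplicate rows collapse into one subject and the null row produces no triple, so the violation is invisible in the graph. Restricting attention to instances that already satisfy their keys removes exactly these cases, which is the weak form of preservation stated above.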
12. Research Question
The constraint rewriting T for the monotone Direct Mapping M is not semantics preserving if:
Relational data violating key constraints are considered
Besides the weak semantics translation between SQL constraints and SHACL for Direct Mapping:
Does any other strong⁵ semantic correspondence exist?
D ⊨ Σ =⇒ M(D) ⊨ T(V, Σ), where T(V, Σ) is maximal?
"Maximal" means that any other SHACL constraint is either not implied by the source constraints Σ w.r.t. the mapping M, or subsumed by the maximally implied set T(V, Σ) of SHACL shapes.
What would be the definition of such a constraint rewriting T?
⁵ One-to-one semantics translation between SQL constraints and SHACL
13. R2RML: RDB to RDF Mapping Language
An RDB to RDF mapping M is a finite set of assertions of the form
"Query −→ Triple Pattern"
Example:
Select S_id from student −→ ⟨iri1(S_id), rdf:type, Student⟩.
Select C_id from course −→ ⟨iri2(C_id), rdf:type, Course⟩.
Select S_id, C_id from student, course where student.Code = course.C_id −→ ⟨iri1(S_id), enrolledFor, iri2(C_id)⟩.
create table course (C_id varchar primary key, Title varchar unique);
create table student (S_id integer primary key, Name varchar, Code varchar not null foreign key references course(C_id));
Table student (Code is an FK to course.C_id):
S_id | Name | Code
011 | Ida | CS40
012 | | CS20
Table course:
C_id | Title
CS40 | Logic
CS20 | Database
CS50 | Data Eng
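The three assertions can be evaluated over the example tables with an in-memory SQLite database. This is a sketch under assumptions, not an R2RML processor: the IRI templates behind `iri1` and `iri2` are hypothetical, the missing Name of student 012 is assumed to be NULL, the inline foreign key is written in SQLite syntax, and S_id values lose their leading zero because the column is declared as an integer.

```python
import sqlite3

# Sketch: evaluate "Query −→ Triple Pattern" assertions over the example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table course (C_id varchar primary key, Title varchar unique);
    create table student (S_id integer primary key, Name varchar,
                          Code varchar not null references course(C_id));
    insert into course values ('CS40', 'Logic'), ('CS20', 'Database'), ('CS50', 'Data Eng');
    insert into student values (11, 'Ida', 'CS40'), (12, NULL, 'CS20');
""")

def iri1(s_id):
    return f"<student/{s_id}>"   # hypothetical IRI template for students

def iri2(c_id):
    return f"<course/{c_id}>"    # hypothetical IRI template for courses

# Each assertion pairs a source query with a function building the triple pattern.
assertions = [
    ("select S_id from student",
     lambda r: (iri1(r[0]), "rdf:type", "Student")),
    ("select C_id from course",
     lambda r: (iri2(r[0]), "rdf:type", "Course")),
    ("select S_id, C_id from student, course where student.Code = course.C_id",
     lambda r: (iri1(r[0]), "enrolledFor", iri2(r[1]))),
]

graph = {pattern(row) for query, pattern in assertions for row in conn.execute(query)}
for triple in sorted(graph):
    print(triple)
```

The resulting graph contains two Student triples, three Course triples and two enrolledFor triples; course CS50 has no incoming enrolledFor edge because no student references it.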
14. Constraint Rewriting⁶ T for Simple RDB to RDF Mapping M⁷
A Maximal Semantics Preserving Rewriting T for Simple Mapping M
T : Q −→ P(S),
where
Q is the set of all pairs (M, Σ) s.t.
M is a simple RDB-to-RDF mapping
Σ is a set of SQL constraints, i.e., keys and others
S is the set of all SHACL shapes
P(S) is the set of maximal sets of SHACL shapes
⁶ Mapping Relational Database Constraints to SHACL, ISWC 2022.
⁷ Simplifying M further yields Direct Mapping; therefore, the results for T also apply to Direct Mapping
15. Simple RDB to RDF Mapping M
A simple mapping M is a finite set of assertions of the form Q −→ ψ,
where
Q is an SP or SPJ query over a relational source D, called source
query, s.t.,
Selections considered are those that filter out nulls
Joins considered are equality joins along foreign keys.
ψ is a graph triple pattern
Example:
π_{S_id} σ_{¬isNull(S_id)}(student) −→ ⟨iri1(S_id), rdf:type, Student⟩.
π_{C_id} σ_{¬isNull(C_id)}(course) −→ ⟨iri2(C_id), rdf:type, Course⟩.
π_{S_id,C_id} σ_{¬isNull(S_id)∧¬isNull(C_id)}(Q1 ⋈_{Code=C_id} Q2) −→ ⟨iri1(S_id), enrolledFor, iri2(C_id)⟩,
where Q1 = σ_{¬isNull(S_id)∧¬isNull(Code)}(student) and Q2 = σ_{¬isNull(C_id)}(course).
16. The Rewriting Γ
Rewriting steps:
Let Q −→ ψ be a mapping defined on schema R with Σ.
Then, Γ computes,
1. Σ|Q, i.e., Σ propagated to att(Q)
2. Σ|Q ⊩ σ_{X→Y}, where X, Y ⊆ att(Q), i.e., the Σ-implied data dependencies σ⁸ on the view projected by Q
3. SHACL constraints on scheme(ψ), based on Σ|Q ⊩ σ and the mappings
⁸ Data dependencies that also apply to databases with nulls
[Figure: Rewriting Γ. From the database (R, Σ, D), the mapping M takes the instance D to the RDF graph M(D), and Γ takes the mapping M and the constraints Σ to the SHACL constraints Γ(M, Σ)]
17. Example of Γ: Inputs R, Σ and M
Schema R with Σ Defn.
create table course (C_id varchar primary key, Title varchar unique);
create table student (S_id integer primary key, Name varchar, Code varchar not null foreign key references course(C_id));
Table student (Code is an FK to course.C_id):
S_id | Name | Code
011 | Ida | CS40
012 | | CS20
Table course:
C_id | Title
CS40 | Logic
CS20 | Database
CS50 | Data Eng
Simple Mapping M Defn.
π_{S_id} σ_{¬isNull(S_id)}(student) −→ ⟨iri1(S_id), rdf:type, Student⟩.
π_{C_id} σ_{¬isNull(C_id)}(course) −→ ⟨iri2(C_id), rdf:type, Course⟩.
π_{S_id,C_id} σ_{¬isNull(S_id)∧¬isNull(C_id)}(Q1 ⋈_{Code=C_id} Q2) −→ ⟨iri1(S_id), enrolledFor, iri2(C_id)⟩,
where Q1 = σ_{¬isNull(S_id)∧¬isNull(Code)}(student) and Q2 = σ_{¬isNull(C_id)}(course)
18. Example of Γ: Computing att(Q), Σ|Q and Σ|Q ⊩ σ
π_{S_id,C_id} σ_{¬isNull(S_id)∧¬isNull(C_id)}(Q1 ⋈_{Code=C_id} Q2) −→ ⟨iri1(S_id), enrolledFor, iri2(C_id)⟩,
where Q1 = σ_{¬isNull(S_id)∧¬isNull(Code)}(student) and Q2 = σ_{¬isNull(C_id)}(course)
Steps 1-2 of Γ:
att(Q1) = {S_id, Code} and {UNQ(S_id), NN(S_id), NN(Code)} ⊆ Σ|Q1, Σ|Q1 ⊩ FD_{S_id→Code}
att(Q2) = {C_id} and {UNQ(C_id), NN(C_id)} ⊆ Σ|Q2, Σ|Q2 ⊩ UFD_{C_id→C_id}
att(Q) = {S_id, C_id} and FK(Code, student, C_id, course) ∈ Σ|Q1 ∩ Σ|Q2, Σ|Q ⊩ FD_{S_id→C_id},
since Σ|Q1 ⊩ FD_{S_id→Code}, and Σ ⊩ UFD_{C_id→C_id} implies Σ ⊩ FD_{C_id→C_id}.
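The last step, deriving S_id → C_id on the joined view, can be approximated with the textbook attribute-closure algorithm. This is a stand-in under assumptions, not the inference system of the paper, which also accounts for nulls; treating the equality join Code = C_id as two functional dependencies is a modelling choice made for this sketch, and `closure` is a hypothetical helper name.

```python
# Sketch of the FD derivation via attribute closure (classical algorithm, no nulls).

def closure(attrs, fds):
    """Attributes functionally determined by `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [
    ({"S_id"}, {"Code"}),  # FD S_id → Code, from UNQ(S_id) on Q1
    ({"Code"}, {"C_id"}),  # equality join Code = C_id
    ({"C_id"}, {"Code"}),  # equality join, other direction
]

print("C_id" in closure({"S_id"}, fds))  # True
print("S_id" in closure({"C_id"}, fds))  # False
```

The closure of {S_id} contains C_id, matching FD S_id → C_id on the view, while C_id does not determine S_id, so several students may enrol for the same course.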
19. Example of Γ: Computing Γ(M, Σ)
π_{S_id} σ_{¬isNull(S_id)}(student) −→ ⟨iri1(S_id), rdf:type, Student⟩.
π_{C_id} σ_{¬isNull(C_id)}(course) −→ ⟨iri2(C_id), rdf:type, Course⟩.
π_{S_id,C_id} σ_{¬isNull(S_id)∧¬isNull(C_id)}(Q1 ⋈_{Code=C_id} Q2) −→ ⟨iri1(S_id), enrolledFor, iri2(C_id)⟩.
Steps 1-2 of Γ: Defn. Σ|Q ⊩ σ
Σ|Q1 ⊩ FD_{S_id→Code}, Σ|Q2 ⊩ UFD_{C_id→C_id}, Σ|Q ⊩ FD_{S_id→C_id}
Step 3 of Γ: Defn. shapes Γ(M, Σ) with implicit class-based target
⟨Student, τ_Student, φ_Student⟩, since m_Student ∈ M, s.t.
(≥0 enrolledFor.Course) ∈ φ_Student, since M, ι(m, M) = A
(≤0 enrolledFor.¬Course) ∈ φ_Student, since M is simple
(=1 enrolledFor.Course) ∈ φ_Student, since Σ|Q ⊩ FD_{S_id→C_id}
⟨Course, τ_Course, φ_Course⟩, since m_Course ∈ M, s.t.
(≥0 enrolledFor⁻.Student) ∈ φ_Course, since M, ι(m, M) = A
(≤0 enrolledFor⁻.¬Student) ∈ φ_Course, since M is simple
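To see what the Student shape demands of a graph, the two non-trivial constraints can be checked by hand on the mapped example data. The following is a rough sketch, not a SHACL validator: `check_student_shape` and the plain-string IRIs are hypothetical, and the graph is written out manually as the expected result of applying M to the example tables.

```python
# Sketch: check (≤0 enrolledFor.¬Course) and (=1 enrolledFor.Course) for every
# node of class Student. The graph is a set of (subject, predicate, object) triples.

def check_student_shape(graph):
    types = {(s, o) for s, p, o in graph if p == "rdf:type"}
    students = {s for s, c in types if c == "Student"}
    for s in students:
        targets = [o for s2, p, o in graph if s2 == s and p == "enrolledFor"]
        courses = [o for o in targets if (o, "Course") in types]
        if len(courses) != len(targets):  # some target is not a Course
            return False
        if len(courses) != 1:             # not exactly one Course
            return False
    return True

graph = {
    ("student/011", "rdf:type", "Student"),
    ("student/012", "rdf:type", "Student"),
    ("course/CS40", "rdf:type", "Course"),
    ("course/CS20", "rdf:type", "Course"),
    ("course/CS50", "rdf:type", "Course"),
    ("student/011", "enrolledFor", "course/CS40"),
    ("student/012", "enrolledFor", "course/CS20"),
}
print(check_student_shape(graph))  # True
```

Adding a second enrolledFor edge to a student, or removing a student's only edge, makes the check fail; this is the behaviour that the functional dependency S_id → C_id together with NN(Code) rules out on the source side.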
20. Properties of Γ
Γ is semantics preserving,
i.e.,
for a mapping set M defined over a relational schema R with source constraints Σ, and an arbitrary instance D of R:
D ⊨ Σ =⇒ M(D) ⊨ Γ(M, Σ).
Γ is maximal semantics preserving,
i.e.,
for every S with sch(S) ⊆ sch(M) and Σ ⊨_M S, meaning that
∀D. (D ⊨ Σ =⇒ M(D) ⊨ S),
we have ∀G. (G ⊨ Γ(M, Σ) =⇒ G ⊨ S).
Γ is monotone,
i.e.,
for every two mapping sets M1 ⊆ M2 defined on a schema R with Σ,
∀G. (G ⊨ Γ(M2, Σ) =⇒ G ⊨ Γ(M1, Σ))
21. Final Remarks
The constraint rewriting Γ extends an RDB to RDF mapping M with SHACL, where M is a simple RDB to RDF mapping.
Γ is maximal semantics preserving and monotone.
Work in progress and future goals:
Extension of Γ beyond simple R2RML, i.e., expressive R2RML with monotonic source queries
Extension of Γ in an OBDA platform
SPARQL query simplification and optimization with Γ
Semantics preserving SQL-to-SPARQL translation with Γ
22. References
Ratan Bahadur Thapa and Martin Giese
A source-to-target constraint rewriting for direct mapping.
International Semantic Web Conference, 21–38, 2021, Springer.
Ratan Bahadur Thapa and Martin Giese
Mapping Relational Database Constraints to SHACL.
International Semantic Web Conference, 214–230, 2022, Springer.
Juan F. Sequeda, Marcelo Arenas and Daniel P. Miranker
On directly mapping relational databases to RDF and OWL.
Proc. 21st Intl. Conf. on World Wide Web, 649–658, 2012, ACM.
Marcelo Arenas, Alexandre Bertails, Eric Prud’hommeaux and Juan F. Sequeda
A Direct Mapping of Relational Data to RDF.
W3C Recommendation, 2012
Souripriya Das, Seema Sundara and Richard Cyganiak
R2RML: RDB to RDF Mapping Language.
W3C Recommendation, 2012
Holger Knublauch and Dimitris Kontokostas
Shapes Constraint Language (SHACL).
W3C Recommendation, 2017