ABSTRACT: Knowledge Graphs (KGs) are an emerging, highly flexible and Web-friendly technology for integrating, representing, and querying semi-structured data in a semantically rich model formalized by an Ontology. KGs may be built using specialized data management software (e.g., triplestores) or, by leveraging suitable mappings and query rewriting techniques, as "Virtual Knowledge Graph" (VKG) views over some legacy data source, such as a relational database. In this talk, we provide background information on VKGs and their underlying technologies, with particular emphasis on the open-source Ontop VKG engine, and we discuss ongoing research and development efforts towards their extension to Web APIs as a non-relational data source of practical relevance. This extension, supported by the HIVE and OntoCRM projects, would also enable transparent access to both static relational data and dynamically-computed Web API data as part of a regular VKG query.
BIO: Francesco Corcoglioniti is a researcher at the Free University of Bozen-Bolzano, Italy, where he contributes to research, development, and project collaborations related to Virtual Knowledge Graphs (VKG), their extensions, and their implementation in the open-source Ontop system.
Towards Virtual Knowledge Graphs over Web APIs
1. Towards Virtual Knowledge Graphs over Web APIs
Francesco Corcoglioniti
2022-11-09
postdoc @ KRDB, Free University of Bozen-Bolzano
supported by the HIVE Fusion Grant project (2021-2022), the OntoCRM project (2022-2024), and Ontopic s.r.l.
slides available online at https://bit.ly/3WOoldB
2. 1. Introduction
2. The VKG Framework
3. The Ontop VKG System
4. VKGs over Web APIs
5. Conclusions
3. Big Data Context
Towards Virtual Knowledge Graphs over Web APIs – slides available at https://bit.ly/3WOoldB 1/34
4. Variety Drives Data Management Initiatives
Relative importance: Variety 69%, Volume 25%, Velocity 6%
Source: http://sloanreview.mit.edu/article/variety-not-volume-is-driving-big-data-initiatives/ (2016)
Data model heterogeneity: relational data, graph data, XML, JSON, CSV, text files, ...
System heterogeneity: even when systems adopt the same data model, they are not always fully compatible
Schema heterogeneity: different people see things differently, and design schemas differently
Data-level heterogeneity: e.g., ‘IBM’ vs. ‘Int. Business Machines’ vs. ‘International Business Machines’
5. Querying Data Takes Time and IT Expertise (besides Domain Knowledge)
Query from Statoil (now Equinor) use case
EU FP7 Optique project
Natural language: In a given area, return
pressure data tagged with stratigraphy and
quality control attributes
SQL: huge query joining 9 tables, the main one
with 38 columns with cryptic names
Query from Sloan Digital Sky Survey use case
EU H2020 INODE project
Natural language: Get all white dwarf stars
SQL: unintelligible query defining ‘white dwarf’
SELECT objID
FROM skyserverv3_correct.star
WHERE u - g < .4 AND g - r < .7 AND
r - i > .4 AND i - z > .4
6. Virtual Knowledge Graphs (VKG) – a Data Access / Integration Solution
Three key ideas:
1. use a global (or integrated) schema and map the data sources to the global schema
2. adopt a very flexible data model for the global schema
→ Knowledge Graph (KG) whose vocabulary is expressed in an ontology.
3. exploit virtualization, i.e., the KG is not materialized, but kept virtual
This gives rise to the Virtual Knowledge Graph (VKG) approach to data access / integration, also
called Ontology-Based Data Access / Integration (OBDA)
7. Virtual Knowledge Graphs (VKG) – Core Components
Ontology: conceptualizes a domain of interest in terms of classes and (binary) properties, overall defining the terminological knowledge (TBox) of the VKG
Data sources: provide the data forming the RDF triples, i.e., the assertional knowledge (ABox), of the VKG
Mapping: defines how to generate the RDF triples from the raw data (e.g., relational), via mapping assertions that populate each class/property of the ontology
Queries: formulated against the VKG (which is virtual) and rewritten into native queries evaluated over the sources
[Figure: VKG architecture. A query is posed over the ontology O, which is linked by the mapping M to the data sources D; results flow back to the user.]
8. 1. Introduction
2. The VKG Framework
3. The Ontop VKG System
4. VKGs over Web APIs
5. Conclusions
9. VKG Framework – Which Languages to Use?
Need to balance
• expressive power of adopted languages for O, M, q
• query answering efficiency with respect to data size
[Figure: VKG architecture with query q, ontology O, mapping M, data sources D, and results]
W3C has standardized languages that are suitable for VKGs:
• Knowledge graph: expressed in RDF (W3C Rec. 2014)
• Ontology O: expressed in OWL 2 QL (W3C Rec. 2012)
• Mapping M: expressed in R2RML (W3C Rec. 2012)
• Query q: expressed in SPARQL (W3C Rec. 2013)
10. RDF – Data Represented as a Graph
The graph consists of a set of ⟨subject, predicate, object⟩ triples, over IRI, literal and blank nodes
• IRI nodes (formerly URI):
<http://example.org/M-25>,
<M-25>, ex:M-25 or :M-25
• Literal nodes:
"2008-02-12", "The Matrix"@en,
"511"^^xsd:integer
• class membership triples:
<A-1> rdf:type :Actor .
• object property triples:
<A-1> :playsIn <M-25> .
• data property triples:
<M-25> :releaseDate "2008-02-12" .
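The three kinds of triples listed above can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: IRIs are plain strings, the `Literal` type and the `kind` helper are hypothetical names introduced here, and blank nodes are not modeled.

```python
from typing import NamedTuple, Optional

class Literal(NamedTuple):
    """Hypothetical helper: a literal value with optional datatype or language tag."""
    value: str
    datatype: Optional[str] = None  # e.g. "xsd:integer"
    lang: Optional[str] = None      # e.g. "en"

RDF_TYPE = "rdf:type"

# An RDF graph is a set of (subject, predicate, object) triples.
graph = {
    ("A-1", RDF_TYPE, ":Actor"),
    ("A-1", ":playsIn", "M-25"),
    ("M-25", ":releaseDate", Literal("2008-02-12")),
}

def kind(triple):
    """Classify a triple as class membership, object property, or data property."""
    _, predicate, obj = triple
    if predicate == RDF_TYPE:
        return "class membership"
    return "data property" if isinstance(obj, Literal) else "object property"

# Map each predicate in the sample graph to the kind of triple it occurs in.
kinds = {t[1]: kind(t) for t in graph}
```

The only distinction the sketch relies on is whether the object is a literal or an IRI, plus the special role of rdf:type.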
11. OWL 2 QL – Lightweight Ontology Language for Accessing Large Amounts of Data
Standard sub-language of OWL 2 [W3C Rec. 2012]
Its assertions encode a logical theory in the
DL-Lite fragment of description logics that
enables reasoning by query rewriting
Close correspondence with UML class diagrams
and ER schemas used in conceptual modeling
:actsIn rdfs:range :Movie
:actsIn rdfs:subPropertyOf :playsIn
. . . owl:someValuesFrom . . .
[Figure: UML class diagram with classes Actor (name: String), SeriesActor, MovieActor, Play (title: String), and Movie; associations actsIn (multiplicity 1..⋆) and playsIn; a {disjoint} constraint]
Assertion type          | DL syntax              | OWL syntax
Subclass assertion      | MovieActor ⊑ Actor     | :MovieActor rdfs:subClassOf :Actor .
Class disjointness      | Actor ⊑ ¬Movie         | :Actor owl:disjointWith :Movie .
Domain of a property    | ∃actsIn ⊑ MovieActor   | :actsIn rdfs:domain :MovieActor .
Range of a property     | ∃actsIn⁻ ⊑ Movie       | :actsIn rdfs:range :Movie .
Subproperty assertion   | actsIn ⊑ playsIn       | :actsIn rdfs:subPropertyOf :playsIn .
Inverse properties      | actsIn ≡ hasActor⁻     | :actsIn owl:inverseOf :hasActor .
Mandatory participation | MovieActor ⊑ ∃actsIn   | owl:someValuesFrom in superclass expression
12. Mappings
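Define how to populate classes & properties via assertions of the form:
$Q_{sql}(\vec{x}) \rightsquigarrow \mathit{iri}(\vec{x})\ \texttt{rdf:type}\ C$
$Q_{sql}(\vec{x}) \rightsquigarrow \mathit{iri}_1(\vec{x})\ P\ \mathit{iri}_2(\vec{x})$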
Ontology O:
:actsIn rdfs:domain :MovieActor .
:actsIn rdfs:range :Movie .
:Movie rdfs:subClassOf :Play .
:title rdfs:domain :Play .
:title rdfs:range xsd:string .
...
Mapping M:
m1: SELECT mcode, mtitle FROM MOVIE WHERE type = 'm'
⇝ :m-{mcode} rdf:type :Movie . :m-{mcode} :title {mtitle} .
m2: SELECT M.mcode, A.acode FROM MOVIE M, ACTOR A
WHERE M.mcode = A.pcode AND M.type = 'm'
⇝ :a-{acode} :actsIn :m-{mcode} .
Database D:
MOVIE
mcode | mtitle       | myear | type | · · ·
511   | The Matrix   | 1999  | m    | · · ·
227   | Blade Runner | 1982  | m    | · · ·
ACTOR
pcode | acode | aname     | · · ·
511   | 43    | K. Reeves | · · ·
511   | 57    | C.A. Moss | · · ·
VKG V from O, M, D:
:m-511 rdf:type :Movie .
:m-227 rdf:type :Movie .
:m-511 :title "The Matrix" .
:m-227 :title "Blade Runner" .
:a-43 :actsIn :m-511 .
:a-57 :actsIn :m-511 .
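How the mapping assertions m1 and m2 turn the rows of D into the triples of V can be sketched in Python with an in-memory SQLite database. This is an illustrative sketch only, not how Ontop works internally (Ontop normally keeps the graph virtual): the `MAPPINGS` structure and the `materialize` function are hypothetical, and the templates use Python's `str.format` placeholders in place of the mapping syntax shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MOVIE (mcode INTEGER, mtitle TEXT, myear INTEGER, type TEXT);
    CREATE TABLE ACTOR (pcode INTEGER, acode INTEGER, aname TEXT);
    INSERT INTO MOVIE VALUES (511, 'The Matrix', 1999, 'm'),
                             (227, 'Blade Runner', 1982, 'm');
    INSERT INTO ACTOR VALUES (511, 43, 'K. Reeves'), (511, 57, 'C.A. Moss');
""")

# Each mapping assertion: a source SQL query plus triple templates over its columns.
MAPPINGS = [
    ("SELECT mcode AS mcode, mtitle AS mtitle FROM MOVIE WHERE type = 'm'",
     [(":m-{mcode}", "rdf:type", ":Movie"),
      (":m-{mcode}", ":title", '"{mtitle}"')]),
    ("SELECT M.mcode AS mcode, A.acode AS acode FROM MOVIE M, ACTOR A "
     "WHERE M.mcode = A.pcode AND M.type = 'm'",
     [(":a-{acode}", ":actsIn", ":m-{mcode}")]),
]

def materialize(conn, mappings):
    """Run every source query and instantiate its templates once per result row."""
    triples = set()
    for sql, templates in mappings:
        cursor = conn.execute(sql)
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            binding = dict(zip(columns, row))
            for s, p, o in templates:
                triples.add((s.format(**binding), p, o.format(**binding)))
    return triples

vkg = materialize(conn, MAPPINGS)
```

On the sample database, `vkg` contains the six triples listed for V above.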
13. SPARQL Query Language
Standard query language for RDF data [W3C Rec. 2008, 2013], based on graph matching
SELECT ?a ?t WHERE {
?a rdf:type :Actor .
?a :playsIn ?m .
?m rdf:type :Movie .
?m :title ?t .
}
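Graph matching for a query like the one above can be sketched as follows in Python. This is a naive illustration and not how a SPARQL engine or Ontop evaluates queries: the `match` and `evaluate` functions and the sample graph are assumptions made for the example, and only basic graph patterns (no FILTER, OPTIONAL, etc.) are covered.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`; None on failure."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                    # variable: bind it or check consistency
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:                         # constant: must be identical
            return None
    return binding

def evaluate(patterns, graph):
    """Evaluate a list of triple patterns over a set of triples (nested-loop matching)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [extended
                     for binding in solutions
                     for triple in graph
                     if (extended := match(pattern, triple, binding)) is not None]
    return solutions

# Hypothetical sample graph, in the spirit of the movie example.
graph = {
    (":a-43", "rdf:type", ":Actor"), (":a-43", ":playsIn", ":m-511"),
    (":m-511", "rdf:type", ":Movie"), (":m-511", ":title", "The Matrix"),
    (":m-227", "rdf:type", ":Movie"), (":m-227", ":title", "Blade Runner"),
}
query = [("?a", "rdf:type", ":Actor"), ("?a", ":playsIn", "?m"),
         ("?m", "rdf:type", ":Movie"), ("?m", ":title", "?t")]

# Projection on ?a and ?t, as in the SELECT clause.
answers = {(b["?a"], b["?t"]) for b in evaluate(query, graph)}
```

Only :m-511 has an actor playing in it, so the movie :m-227 contributes no answer.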
[Figure: graph pattern of the query: ?a –rdf:type→ Actor, ?a –playsIn→ ?m, ?m –rdf:type→ Movie, ?m –title→ ?t]
Additional language features (SPARQL 1.1):
• UNION: matches one of alternative graph patterns
• OPTIONAL: produces a match even when part of the pattern is missing
• complex FILTER conditions
• GROUP BY, to express aggregations
• MINUS, to remove possible solutions
• property paths (regular expressions)
• · · ·
14. Query Answering in VKGs
Goal: answer a query q over a VKG V by jointly considering:
• the data provided by the data source D
• the mapping M encoding how such data translates to ontology terms
• the ontology O encoding domain knowledge that can be used to enrich answers.
Example:
• suppose that an entity :m-511 of class Movie can be obtained from the data D using some
mapping assertion in M (e.g., m1 about table MOVIE)
• suppose the ontology O states that each Movie is a Play, i.e., :Movie rdfs:subClassOf :Play
• if query q asks for all Plays, we should also return :m-511, which is a Movie and thus also a Play
Solution: query answering by query reformulation
15. Query Answering in VKGs – Query Reformulation
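Query reformulation pipeline (figure), from the ontological query q to its answer:
1. Rewriting (uses ontology O): ontological query q → rewritten query
2. Unfolding (uses mappings M): rewritten query → SQL
3. Evaluation (over data sources D): SQL → relational answer
4. Result translation: relational answer → ontological answer
Example setting:
D: MOVIE (mcode, mtitle, …)
O: :Movie rdfs:subClassOf :Play
M: SELECT mcode
FROM MOVIE
WHERE type = 'm'
→ :m-{mcode} a :Movie
Ontological query q:
SELECT ?p {
?p rdf:type :Play
}
Rewritten query:
SELECT ?p {
{ ?p rdf:type :Play }
UNION
{ ?p rdf:type :Movie }
}
SQL:
SELECT mcode
FROM MOVIE
WHERE type = 'm'
Relational answer:
mcode
511
Ontological answer:
?p
:m-511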
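The four steps of this example (rewriting, unfolding, evaluation, result translation) can be traced with a small Python sketch over an in-memory SQLite database. It is a deliberately simplified illustration, not Ontop's algorithm: it only handles queries for the instances of a single class, assumes each class has at most one direct superclass, and the names `SUBCLASS_OF`, `MAPPING`, `rewrite`, and `answer` are hypothetical.

```python
import sqlite3

SUBCLASS_OF = {":Movie": ":Play"}   # ontology O: Movie is a subclass of Play
MAPPING = {                          # mapping M: class -> (source SQL, IRI template)
    ":Movie": ("SELECT mcode FROM MOVIE WHERE type = 'm'", ":m-{}"),
}

def rewrite(cls):
    """Rewriting: the queried class plus all of its (transitive) subclasses."""
    classes, changed = {cls}, True
    while changed:
        new = {sub for sub, sup in SUBCLASS_OF.items() if sup in classes} - classes
        classes |= new
        changed = bool(new)
    return classes

def answer(conn, cls):
    """Unfold each rewritten class into SQL, evaluate it, translate rows into IRIs."""
    result = set()
    for c in rewrite(cls):
        if c in MAPPING:             # classes without mapping assertions contribute nothing
            sql, iri_template = MAPPING[c]
            result |= {iri_template.format(row[0]) for row in conn.execute(sql)}
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MOVIE (mcode INTEGER, mtitle TEXT, type TEXT)")
conn.execute("INSERT INTO MOVIE VALUES (511, 'The Matrix', 'm')")

plays = answer(conn, ":Play")
```

Although no mapping assertion populates :Play directly, the query for :Play returns :m-511 because the rewriting step adds :Movie to the classes to unfold.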
16. 1. Introduction
2. The VKG Framework
3. The Ontop VKG System
4. VKGs over Web APIs
5. Conclusions
17. The Ontop VKG System
https://ontop-vkg.org/
• state-of-the-art VKG system born at UNIBZ (2009, first research in 2004)
• compliant with all relevant Semantic Web standards:
RDF, RDFS, OWL 2 QL, R2RML, SPARQL, and GeoSPARQL
• implemented in Java (v1.8+) and also available as Docker image
• supports all major relational DBMSs:
Oracle, DB2, MS SQL Server, Postgres, MySQL, Teiid, Dremio, Denodo, etc.
• open-source (Apache 2) project with a solid community
200+ mailing list members, 9000+ downloads in last 2 years
• commercial services (open-core model) by Ontopic, a UNIBZ spin-off founded in 2019
18. Ontop Usage Scenarios
[Figure: from ontology, mapping, and data, Ontop either virtualizes a Virtual Knowledge Graph that is queried directly (query → query result), or materializes a Materialized Knowledge Graph that is loaded into a triple store]
VKG query answering
• supports most of SPARQL 1.1 under
OWL 2 QL inference regime
• standard-compliant SPARQL endpoint
• over one relational source, or
• over multiple heterogeneous sources,
together with a data federation system
(e.g., Teiid, Dremio) providing an
integrated relational view of sources
VKG materialization
• use ontology and mappings to efficiently
& scalably materialize all the VKG triples
• the produced RDF file can be loaded in
any triplestore
20. Ontop in Research and Industrial Projects
Research projects
• Optique (EU FP7, 11/2012-10/2016)
Ontop-based scalable end-user access to
big data, 10 partners incl. Statoil, Siemens
• EPNet (ERC Advanced Grant)
cultural heritage project on food production
and distribution in the Roman Empire
• KAOS (Euregio, 06/2016-05/2019)
preparing standardized log files from
timestamped log data for process mining
• INODE (EU H2020, 11/2019-10/2022)
intelligent open data exploration
• IDEE (ERDF 2014-2020)
building & energy consumption data VKG
Industrial projects
• NOI Techpark
development of the South Tyrol tourism KG
• SIRIS Academic (Barcelona)
open data integration and dashboards
• Siemens Corporate Technologies (Munich)
access to temporal and streaming data
• Robert Bosch GmbH (Stuttgart)
analysis of manufacturing log data
• Metaphacts (Germany)
inclusion of Ontop in their platform
• Fluxicon (Milano)
• Isagog (Rome)
21. Ontop in Action – Optique project, Statoil use case
From SQL query over the data source ...
SELECT wellbore.identifier, stratigraphic_zone.strat_column_identifier,
pty_pressure.pty_pressure_s, stratigraphic_zone.strat_unit_identifier
FROM wellbore, pty_pressure, activity fp_depth_data LEFT JOIN (
pty_location_1d AS fp_depth_pt1_loc
JOIN picked_stratigraphic_zones AS zs
ON zs.strat_zone_entry_md <= fp_depth_pt1_loc.Data_value_1_o AND
zs.strat_zone_exit_md >= fp_depth_pt1_loc.Data_value_1_o AND
zs.strat_zone_depth_uom = fp_depth_pt1_loc.Data_value_1_ou
JOIN stratigraphic_zone
ON zs.wellbore = stratigraphic_zone.wellbore AND
zs.strat_column_identifier = stratigraphic_zone.strat_column_identifier AND
zs.strat_interp_version = stratigraphic_zone.strat_interp_version AND
zs.strat_zone_identifier = stratigraphic_zone.strat_zone_identifier
) ON fp_depth_data.facility_s = zs.wellbore AND
fp_depth_data.activity_s = fp_depth_pt1_loc.activity_s,
activity_class AS form_pressure_class
WHERE wellbore.wellbore_s = fp_depth_data.Facility_s AND
fp_depth_data.activity_s = pty_pressure.activity_s AND
fp_depth_data.kind_s = form_pressure_class.activity_class_s AND
wellbore.ref_existence_kind = 'actual' AND
form_pressure_class.name = 'formation pressure depth data'
... to VKG SPARQL query
SELECT ?wellbore ?chronostrat_unit
?top_md_m ?lithostrat_unit
{
?w a :Wellbore ;
:name ?wellbore ;
:hasWellboreInterval ?intv .
?intv a :StratigraphicZone ;
:hasUnit ?cu ;
:hasTopDepth ?top .
?cu :name ?chronostrat_unit ;
:ofStratigraphicColumn
[ a :ChronoStratigraphicColumn ] .
?top a :MeasuredDepth ;
:valueInStandardUnit ?top_md_m .
?intv :overlapsWellboreInterval
?litho_intv .
?litho_intv :hasUnit ?lu .
?lu :name ?lithostrat_unit ;
:ofStratigraphicColumn
[ a :LithoStratigraphicColumn ] .
}
22. Ongoing Research & Development Directions
Mapping patterns
• bootstrapping (semi-automated generation) of mappings & possibly ontology for a data source
• reduces VKG deployment costs, mostly related to mapping authoring
Provenance & explanations
• report which sources/tuples, mappings and ontology axioms contributed to a query answer
• prototype Ontop extension based on provenance approaches (semi-rings) in DB community
Geospatial queries
• support GeoSPARQL to manipulate & query for geometries, leveraging DB support (e.g., PostGIS)
Temporal/streaming extensions
• support SQL-enabled stream processors like Flink and pattern matching over streaming data
Non-relational sources
• support non-relational data sources such as MongoDB, Neo4J and Web APIs
23. 1. Introduction
2. The VKG Framework
3. The Ontop VKG System
4. VKGs over Web APIs
5. Conclusions
24. Accessing Web APIs
Data is increasingly available via Web APIs
• access to 3rd-party and/or dynamically-computed data
• access to data-related services, e.g., text search
Some API statistics [a]
• 83% of all Internet traffic belongs to API-based services
• 2M+ API repositories on GitHub
• 90% of developers use APIs
• 30% of development time spent on coding APIs
Complex data access problem for applications operating on
data from both databases and APIs
[a] https://nordicapis.com/20-impressive-api-economy-statistics/
[Figure: an application accessing RDB sources via SQL and API sources via calls, facing a complex data access problem]
25. Accessing Web APIs – Open Data Hub (ODH) RDB + Semantic Search API Example
Answer hybrid queries like:
• get (plot) IRI, description, rating &
location of accommodations ...
• whose rating is 3 stars or more
(structured constraint) and ...
• whose EN description matches the
search string “horse riding” (text
constraint)
Semantic search: improved text search
that aims at capturing and leveraging
text meaning (vs term matching only)
• e.g., via BERT-based model from
Sentence Transformers library
26. Accessing Web APIs – Unified Access using a VKG
• applications operate on a unified VKG spanning APIs and
other involved sources
→ each API operation as an independent source
→ data federation setting due to multiple sources
• VKG built (e.g., via Ontop) over a Virtual Database (VDB)
federating all sources
→ VDB produced by a data federation system (e.g., Teiid)
→ the VDB offers a relational view of API data
→ VKG query reformulation may be tuned to this setting
• delegate the complex orchestration of source sub-queries
and API calls to a VKG + data federation system
• exploit existing database techniques to cope with API access
pattern restrictions during query answering
[Figure: the user / application sends SPARQL to the VKG (Ontop extension), which sends SQL to the Virtual DB (VDB, Teiid extension), which in turn accesses RDB sources via SQL and API sources via calls]
27. VDB – SQL/MED Specification
SQL/MED allows federating multiple sources in a virtual database (VDB)
• standardized SQL extension supported by some data federation systems like Teiid
• VDB as a set of schemas mapped to foreign data sources accessed via wrappers/translators
• we extend Teiid with a new service translator for accessing APIs
Example using Teiid with our extensions:
CREATE DATABASE vdb_example OPTIONS ( "... connection options for federated sources ..." );
USE DATABASE vdb_example;
CREATE SERVER db_source FOREIGN DATA WRAPPER postgresql; -- define RDB source with schema 'db'
CREATE SCHEMA db SERVER db_source; -- using 'postgresql' translator to access it
CREATE SERVER srv_source FOREIGN DATA WRAPPER service; -- define API source with schema 'srv'
CREATE SCHEMA srv SERVER srv_source; -- using 'service' translator to access it
IMPORT FOREIGN SCHEMA public FROM SERVER db_source INTO db OPTIONS ( importer.catalog 'public' );
SET SCHEMA srv;
-- CREATE FOREIGN TABLE / PROCEDURE statements mapped to API operations (API bindings)
28. VDB – API Bindings
API operations as SQL/MED procedures
• input tuple → 0..n output tuples
• URL, method, request/response templates
CREATE FOREIGN PROCEDURE api_semsearch_query (
query VARCHAR
) RETURNS TABLE (
query VARCHAR,
id VARCHAR,
score DOUBLE,
excerpt VARCHAR
) OPTIONS (
"method" 'post',
"url" 'http://semsearch:8080/query',
"requestBody" '{"query": "{query}", "n": 100}',
"responseBody" '{"matches": [{
"id": "{id}",
"score": "{score}",
"excerpt": "{excerpt}" }] }'
);
API data as SQL/MED virtual tables
• linked to API operations/procedures
• each procedure defines an access pattern
CREATE FOREIGN TABLE vt_semsearch_match (
query VARCHAR NOT NULL,
id VARCHAR NOT NULL,
score DOUBLE NOT NULL,
excerpt VARCHAR NOT NULL,
PRIMARY KEY (query, id)
) OPTIONS ( "select" 'api_semsearch_query' );
CREATE FOREIGN TABLE vt_semsearch_index (
id VARCHAR PRIMARY KEY,
text VARCHAR NOT NULL
) OPTIONS (
"UPDATABLE" 'true',
"upsert" 'api_semsearch_store',
"delete" 'api_semsearch_clear'
);
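The idea behind request/response templates can be sketched in Python: one input tuple is turned into a request body, and the response is flattened into 0..n output tuples. This is only a guess at the general mechanism, not the actual Teiid service translator: the functions below are hypothetical, the response template is given as a Python dict rather than a JSON string, and `fake_semsearch` stands in for the real POST to http://semsearch:8080/query with made-up data.

```python
import json

REQUEST_TEMPLATE = '{"query": "{query}", "n": 100}'
RESPONSE_TEMPLATE = {"matches": [{"id": "{id}", "score": "{score}", "excerpt": "{excerpt}"}]}

def build_request(template, **inputs):
    """Fill {name} placeholders; json.dumps(...)[1:-1] escapes the value for a JSON string."""
    body = template
    for name, value in inputs.items():
        body = body.replace("{" + name + "}", json.dumps(str(value))[1:-1])
    return body

def extract_rows(template, response):
    """Emit one output tuple per element of the (single) list named in the template."""
    list_key, items = next(iter(template.items()))
    columns = {key: placeholder.strip("{}") for key, placeholder in items[0].items()}
    return [{column: item[key] for key, column in columns.items()}
            for item in response[list_key]]

def fake_semsearch(body):
    """Hypothetical stand-in for the semantic search API; returns made-up matches."""
    request = json.loads(body)
    return {"matches": [{"id": "acc-1", "score": 0.9,
                         "excerpt": "... " + request["query"] + " ..."}]}

def api_semsearch_query(query):
    """Input tuple (query) -> 0..n output tuples (query, id, score, excerpt)."""
    response = fake_semsearch(build_request(REQUEST_TEMPLATE, query=query))
    return [{"query": query, **row} for row in extract_rows(RESPONSE_TEMPLATE, response)]

rows = api_semsearch_query('horse "riding"')
```

Echoing the input attribute `query` in every output tuple mirrors the RETURNS TABLE clause above, and is what lets the virtual table vt_semsearch_match be filtered on `query`.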
29. VDB – Query Translation & Execution
Given a VDB defined using SQL/MED + API Bindings and an input query over the VDB
• Teiid splits the query into sub-queries based on translator capabilities and cost heuristics
• sub-queries are sent to translators & Teiid handles remaining operations (e.g., federated joins)
Example SQL query
SELECT s.score,
s.excerpt,
a."AccoCategoryId",
a."AccoDetail-en-Name",
a."AccoDetail-en-City"
FROM srv.vt_semsearch_match AS s
JOIN db.v_accommodationsopen AS a
ON s.id = a."Id"
WHERE s.query = 'horse riding'
ORDER BY s.score DESC
LIMIT 10
Execution plan
LimitNode (limit = 10)
  SortNode (s.score DESC)
    ProjectNode (s.score, ... a."AccoDetail-en-City")
      JoinNode (s.id = a."Id", merge join strategy)
        AccessNode (API)
          SELECT id, excerpt, score
          FROM vt_semsearch_match
          WHERE query = 'horse riding'
        AccessNode (RDB)
          SELECT "Id", "AccoCategoryId", "AccoDetail-en-Name",
                 "AccoDetail-en-City"
          FROM v_accommodationsopen
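The division of work in such a plan can be mimicked in a few lines of Python: each access node fetches rows from its source, and the federation layer performs the join, sort, and limit itself. This is a simplified sketch, not Teiid's executor: it uses a hash join for brevity where the plan above reports a merge join, the API rows are made-up sample data, only one descriptive column is fetched from the RDB, and all function names are hypothetical.

```python
import sqlite3
from operator import itemgetter

def access_api(query):
    """Stand-in for AccessNode (API); the returned rows are made-up sample data."""
    return [{"id": "A1", "score": 0.71, "excerpt": "riding school nearby"},
            {"id": "A2", "score": 0.93, "excerpt": "horse riding tours"},
            {"id": "A9", "score": 0.50, "excerpt": "no matching accommodation"}]

def access_rdb(conn):
    """AccessNode (RDB): the sub-query pushed to the relational source."""
    cursor = conn.execute('SELECT "Id", "AccoDetail-en-Name" FROM v_accommodationsopen')
    return [{"Id": acco_id, "name": name} for acco_id, name in cursor]

def execute_plan(conn, query, limit=10):
    by_id = {row["Id"]: row for row in access_rdb(conn)}            # build side of the join
    joined = [{**s, **by_id[s["id"]]} for s in access_api(query)    # JoinNode: s.id = a."Id"
              if s["id"] in by_id]
    joined.sort(key=itemgetter("score"), reverse=True)              # SortNode: score DESC
    return [(r["score"], r["excerpt"], r["name"])                   # LimitNode + ProjectNode
            for r in joined[:limit]]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE v_accommodationsopen ("Id" TEXT, "AccoDetail-en-Name" TEXT)')
conn.executemany("INSERT INTO v_accommodationsopen VALUES (?, ?)",
                 [("A1", "Hotel Alpha"), ("A2", "Farm Beta")])

result = execute_plan(conn, "horse riding")
```

The API match with id A9 has no counterpart in the relational table, so the join drops it before sorting.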
30. VDB – Push-down of Projection, Filtering, Sorting, Slicing
Special input attributes map API capabilities related to standard relational operators
• filtering: return/process only objects matching some criteria (e.g., attribute = or ≥ constant)
• projection: include/exclude certain attributes in returned results
• sorting: sort results according to a certain attribute and direction (ascending/descending)
• slicing: return only a given page of all possible results
CREATE FOREIGN PROCEDURE api_station_data_from_to (
stype VARCHAR NOT NULL,
sname VARCHAR NOT NULL,
tname VARCHAR NOT NULL,
__min_inclusive__mvaliddate DATE NOT NULL, -- filter push down (conditions min <= mvaliddate <= max)
__max_inclusive__mvaliddate DATE NOT NULL,
__limit__ INTEGER -- slicing push down
) RETURNS TABLE ( ... )
OPTIONS ( ... );
Partial/complete push down of these operators whenever possible
• allows offloading computation to the API (e.g., sorting)
• allows reducing costs by manipulating & transferring less data
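A planner could decide what to push down roughly as in the Python sketch below, which splits query conditions into special API input attributes and residual filters to be applied locally. It is an assumption-laden illustration, not the actual Teiid extension: `plan_call` is a hypothetical function, only =, >= and <= conditions are considered, sorting and projection are ignored, and the column `mvalue` and the constants are invented for the example.

```python
def plan_call(conditions, limit, api_inputs):
    """Split conditions into API inputs (pushed down) and residual local filters.

    conditions: list of (attribute, operator, constant) triples
    api_inputs: names of the input attributes declared by the API procedure
    """
    prefixes = {">=": "__min_inclusive__", "<=": "__max_inclusive__"}
    pushed, residual = {}, []
    for attribute, operator, constant in conditions:
        name = attribute if operator == "=" else prefixes.get(operator, "") + attribute
        if operator in ("=", ">=", "<=") and name in api_inputs:
            pushed[name] = constant
        else:
            residual.append((attribute, operator, constant))
    # Slicing can only be delegated when no filter is left to apply locally;
    # otherwise the API might cut off rows before the residual filter is evaluated.
    if limit is not None and "__limit__" in api_inputs and not residual:
        pushed["__limit__"] = limit
    return pushed, residual

API_INPUTS = {"stype", "sname", "tname",
              "__min_inclusive__mvaliddate", "__max_inclusive__mvaliddate", "__limit__"}

pushed, residual = plan_call(
    [("sname", "=", "Bolzano"),
     ("mvaliddate", ">=", "2022-01-01"),
     ("mvaliddate", "<=", "2022-01-31"),
     ("mvalue", ">", 20)],                    # hypothetical column, not an API input
    limit=10, api_inputs=API_INPUTS)
```

In this call the date range and the station name are pushed down, while the condition on `mvalue` stays local and therefore prevents the limit from being delegated: a case of partial push down.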
31. VDB – Exploiting Bulk API Operations
Bulk API operations operate on multiple input tuples, such as lookup by set of IDs or bulk store
• their use enables better performance due to fewer API calls
• useful to speed up dependent joins (using IN operator) between RDBMS and API data
Dependent join R ⨝ S on R.A = S.A between RDBMS table R and virtual table S, where S is bound to a bulk API operation with input attribute A:
1. evaluate the sub-query on R: SELECT A, … FROM R WHERE …
2. extract the values of join attribute A: a1, a2, …
3. evaluate the sub-query on S: SELECT A, … FROM S WHERE A IN (a1, a2, …) AND …
4. via the API bindings, issue bulk API calls with multiple input tuples for the different values of A: a1, a2, …
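A dependent join driven by a bulk operation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Teiid implementation: `bulk_lookup` is a hypothetical stand-in for a bulk API operation keyed on A (it records its calls so they can be counted), the batch size is arbitrary, and at most one S tuple per value of A is assumed.

```python
def bulk_lookup(ids):
    """Stand-in for a bulk API operation on attribute A; returns made-up tuples."""
    bulk_lookup.calls.append(list(ids))
    return [{"A": a, "info": "details of " + a} for a in ids if a != "a3"]

bulk_lookup.calls = []  # log of the batches sent to the (fake) API

def dependent_join(r_rows, batch_size=2):
    keys = sorted({row["A"] for row in r_rows})      # distinct join values from R
    s_by_key = {}
    for i in range(0, len(keys), batch_size):        # one bulk call per batch (the IN list)
        for s in bulk_lookup(keys[i:i + batch_size]):
            s_by_key[s["A"]] = s
    # Final join R.A = S.A, evaluated locally on the fetched tuples.
    return [{**r, **s_by_key[r["A"]]} for r in r_rows if r["A"] in s_by_key]

r_rows = [{"A": "a1", "x": 1}, {"A": "a2", "x": 2},
          {"A": "a1", "x": 3}, {"A": "a3", "x": 4}]
joined = dependent_join(r_rows)
```

With three distinct values of A and a batch size of 2, the four R tuples are served by two API calls instead of four single lookups.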
32. VDB – Data Materialization
Data materialization: required by API operations that cannot be invoked at query time
• operations too expensive to call at query time (e.g., align API and DB identifiers)
• operations instrumental to the use of external APIs (e.g., text indexing in a search engine)
Solution #1: materialized views in Teiid (or other data federation system used)
Solution #2: dedicated materialization engine for
flexibly executing arbitrary materialization rules:
• identifier – for documentation & diagnostics
• target – the system-managed computed table
(possibly virtual) where data is stored
• source – arbitrary SQL query (over any tables)
that produces the data to store
rules:
  - id: index_accommodation_texts
    target: vt_semsearch_index
    source: |-
      SELECT "Id" AS id,
             "AccoDetail-en-Longdesc" AS text
      FROM v_accommodationsopen
      WHERE "AccoDetail-en-Longdesc"
            IS NOT NULL
  - ... other rules ...
33. VDB – Data Materialization (cont’d)
Rules (their SQL source queries) are analyzed to derive a rule dependency graph, which is mapped
to an execution plan using fixpoint rule evaluation for strongly connected components
[Figure, three panels: "Rule / Table Dependencies" and "Rule Dependencies" (graphs over rules R1 to R5), and "Execution Plan"]
Execution plan:
sequence (
  parallel (
    R1,
    sequence (
      R2,
      fixpoint (
        parallel (
          R3,
          R4
        )
      )
    )
  ),
  R5
)
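Deriving a plan from rule dependencies can be sketched in Python as below. It is a simplified illustration, not the engine's actual planner: it produces a flat sequence of stages (each stage runs in parallel, cyclic components become fixpoint groups) rather than the nested plan shown in the figure, the strongly connected components are computed by plain reachability, and the dependency edges in `deps` are assumed for the example since the figure's edges are not recoverable here.

```python
def reachable(deps, start):
    """All rules that `start` transitively depends on."""
    seen, stack = set(), [start]
    while stack:
        for nxt in deps.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def plan(deps):
    """deps maps every rule to the rules whose output it reads; returns a list of stages."""
    reach = {r: reachable(deps, r) for r in deps}
    # Strongly connected component of r: r plus the rules that mutually depend on it.
    comp = {r: frozenset({r} | {s for s in reach[r] if r in reach[s]}) for r in deps}
    done, stages, pending = set(), [], set(comp.values())
    while pending:
        # A component is ready once everything it reads outside itself is computed.
        ready = [c for c in pending if all(reach[r] - c <= done for r in c)]
        stage = []
        for c in sorted(ready, key=sorted):
            rules = sorted(c)
            cyclic = len(rules) > 1 or rules[0] in deps[rules[0]]
            stage.append(("fixpoint", rules) if cyclic else rules[0])
        stages.append(stage)
        done |= set().union(*ready)
        pending -= set(ready)
    return stages

# Assumed dependencies: R3 and R4 depend on each other, R3 also reads R2, R5 reads R1 and R4.
deps = {"R1": [], "R2": [], "R3": ["R2", "R4"], "R4": ["R3"], "R5": ["R1", "R4"]}
stages = plan(deps)
```

Under these assumed edges, R1 and R2 run first in parallel, R3 and R4 are then iterated to a fixpoint, and R5 runs last. The nested plan in the figure is more parallel, since it lets R1 overlap with the fixpoint group.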
34. VKG – Example of Ontology & Mappings over the VDB
Ontology
schema:Accommodation a owl:Class ;
rdfs:subClassOf schema:Place ;
rdfs:label "Accommodation"@en ;
...
schema:name a owl:DatatypeProperty ;
...
hive:Match a owl:Class ...
Current ontology formalism (OWL 2 QL) reused
as is, but now also models data from APIs
Mappings
mappingId Semantic Search
target data:match/accommodation/{id}/{query}
a hive:Match;
hive:query {query}^^xsd:string;
hive:resource data:accommodation/{id};
hive:excerpt {excerpt}@en;
hive:score {score}^^xsd:decimal.
source SELECT *
FROM hiveodh.srv.vt_semsearch_match
Current VKG mapping formalism reused as is, but
data may now come from API virtual tables
35. VKG – Query Rewriting & Evaluation Example
User-supplied SPARQL query
SELECT ?h ?posLabel ?rating ?pos {
[] a hive:Match ;
hive:query "horse riding"^^xsd:string ;
hive:resource ?h ;
hive:excerpt ?excerpt ;
hive:score ?score .
?h a schema:LodgingBusiness ;
geo:defaultGeometry/geo:asWKT ?pos ;
schema:name ?name ;
schema:description ?description ;
schema:starRating/schema:ratingValue ?rating.
FILTER (?rating >= 3 && lang(?name) = 'en' &&
lang(?description) = 'en')
BIND (CONCAT(?name, " <br><br>...", ?excerpt,
"...<br><br>", ?description) AS ?posLabel)
}
ORDER BY DESC(?score) LIMIT 10
SQL query rewritten by Ontop
SELECT
v1.id,
v1.excerpt, -- fields used
v2."AccoDetail-en-Name", -- for deriving
v2."AccoDetail-en-Longdesc", -- ?posLabel
... complex expression computing rating ...,
ST_ASTEXT(v2."Geometry")
FROM
hiveodh.srv.vt_semsearch_match v1,
hiveodh.db.v_accommodationsopen v2
WHERE
v1."id" = v2."Id" AND
CAST(v1."query" AS TEXT) = 'horse riding' AND
... complex condition on rating >= 3 ... AND
... nonnull conditions for output columns ...
ORDER BY CAST(v1."score" AS DECIMAL) DESC
LIMIT 10
SQL query evaluated on the VDB by Teiid
36. VKG – ODH with Semantic Search Demo
Data sources
DB with ODH tourism data +
Semantic search API to index &
query accommodations texts
System
Ontop embedding Teiid +
materialization engine
Demo
https://hive.inf.unibz.it/odh/vkg/
reformulate example
37. Overall Framework for VKGs over APIs
Virtual Knowledge Graph (VKG): Ontop
• VKG Ontology: formalizes the classes/properties (the “schema”) of the VKG, enabling reasoning
• VKG Mappings: including virtual tables, used for query rewriting
Virtual DB (VDB): Teiid + service translator
• Materialization Rules: pre-compute results of expensive API calls
→ VDB/VKG no longer fully “virtual”
• API Bindings: define how to query/update a virtual table via API calls, if possible
→ limited access patterns
Data flow:
• Application (VKG-based) –SPARQL→ VKG –SQL→ VDB
• Application (VDB-based) –SQL→ VDB
• VDB –SQL→ RDB Sources, VDB –calls→ API Sources
38. 1. Introduction
2. The VKG Framework
3. The Ontop VKG System
4. VKGs over Web APIs
5. Conclusions
39. Takeaway Messages
Virtual Knowledge Graphs (VKG): flexible technology for building KGs over existing data source(s)
• useful for inherently relational data where a VKG engine + RDBMS may outperform a triplestore
• useful for existing data RDF-ification via VKG materialization to an RDF file
Ontop: mature, open-source VKG system with a solid user & developer community
• allows a VKG over a single RDB, with support for multiple database engines
• allows a VKG over multiple heterogeneous sources, in combination with an intermediate data
federation system such as the open-source Teiid & Dremio
• active research & development for adding new features and new data sources
VKGs over Web APIs: ongoing research & development effort
• enables transparent access to dynamically-computed API data via declarative queries
• API operations mapped to virtual relations, accessed through a Teiid extension
• optimizations for better using API features, such as bulk operations and operators’ push-down
• expensive API operations supported via pre-computation and data materialization