Towards Virtual Knowledge Graphs over Web APIs
Francesco Corcoglioniti
2022-11-09
postdoc @ KRDB, Free University of Bolzano,
supported by the HIVE Fusion Grant project (2021-2022), the OntoCRM project (2022-2024), and Ontopic s.r.l.
slides available online at https://bit.ly/3WOoldB
1. Introduction
2. The VKG Framework
3. The Ontop VKG System
4. VKGs over Web APIs
5. Conclusions
Big Data Context
Towards Virtual Knowledge Graphs over Web APIs – slides available at https://bit.ly/3WOoldB 1/34
Variety Drives Data Management Initiatives
Relative importance (source: http://sloanreview.mit.edu/article/variety-not-volume-is-driving-big-data-initiatives/, 2016):
• Variety: 69%
• Volume: 25%
• Velocity: 6%
• Data model heterogeneity: relational data, graph data, XML, JSON, CSV, text files, ...
• System heterogeneity: even when systems adopt the same data model, they are not always fully compatible
• Schema heterogeneity: different people see things differently, and design schemas differently
• Data-level heterogeneity: e.g., ‘IBM’ vs. ‘Int. Business Machines’ vs. ‘International Business Machines’
Querying Data Takes Time and IT Expertise (besides Domain Knowledge)
Query from Statoil (now Equinor) use case, EU FP7 Optique project
• Natural language: In a given area, return pressure data tagged with stratigraphy and quality control attributes
• SQL: huge query joining 9 tables, the main one having 38 columns with cryptic names
Query from Sloan Digital Sky Survey use case, EU H2020 INODE project
• Natural language: Get all white dwarf stars
• SQL: unintelligible query defining ‘white dwarf’
SELECT objID
FROM skyserverv3_correct.star
WHERE u - g < .4 AND g - r < .7 AND
      r - i > .4 AND i - z > .4
Virtual Knowledge Graphs (VKG) – a Data Access / Integration Solution
Three key ideas:
1. use a global (or integrated) schema and map the data sources to the global schema
2. adopt a very flexible data model for the global schema
→ Knowledge Graph (KG) whose vocabulary is expressed in an ontology.
3. exploit virtualization, i.e., the KG is not materialized, but kept virtual
This gives rise to the Virtual Knowledge Graph (VKG) approach to data access / integration, also
called Ontology-Based Data Access / Integration (OBDA)
Virtual Knowledge Graphs (VKG) – Core Components
Ontology
conceptualizes a domain of interest in terms of
classes and (binary) properties, overall defining
the terminological knowledge (TBox) of the VKG
Data sources
provide the data forming the RDF triples, i.e., the
assertional knowledge (ABox), of the VKG
Mapping
defines how to generate the RDF triples from the raw
data (e.g., relational), via mapping assertions that
populate each class/property of the ontology
Queries
formulated against the VKG (which is virtual) and
rewritten into native queries evaluated over the sources
[Figure: a query is posed over the VKG defined by ontology O and mapping M on top of data sources D, and results are returned]
2. The VKG Framework
VKG Framework – Which Languages to Use?
Need to balance
• the expressive power of the languages adopted for O, M, q
• query answering efficiency with respect to data size
[Figure: a query q is posed over ontology O and mapping M on top of data sources D, and results are returned]
W3C has standardized languages that are suitable for VKGs:
• Knowledge graph: expressed in RDF (W3C Rec. 2014)
• Ontology O: expressed in OWL 2 QL (W3C Rec. 2012)
• Mapping M: expressed in R2RML (W3C Rec. 2012)
• Query q: expressed in SPARQL (W3C Rec. 2013)
RDF – Data Represented as a Graph
The graph consists of a set of ⟨subject, predicate, object⟩ triples, over IRI, literal and blank nodes
• IRI nodes (formerly URI):
<http://example.org/M-25>,
<M-25>, ex:M-25 or :M-25
• Literal nodes:
"2008-02-12", "The Matrix"@en,
"511"^^xsd:integer
• class membership triples:
<A-1> rdf:type :Actor .
• object property triples:
<A-1> :playsIn <M-25> .
• data property triples:
<M-25> :releaseDate "2008-02-12" .
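To make the triple model above concrete, here is a minimal sketch (not part of any RDF library) that stores the three example triples as Python tuples and retrieves them by pattern, with `None` acting as a wildcard. IRIs and literals are plain strings here, which is a simplification: real RDF toolkits distinguish IRI, literal and blank node terms. The function name `match` is hypothetical.

```python
# A minimal in-memory RDF graph: triples as (subject, predicate, object) tuples.
# Terms are plain strings (IRIs as written on the slide, literals with quotes),
# a simplification of real RDF term types.
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

graph: Set[Triple] = {
    ("<A-1>", "rdf:type", ":Actor"),             # class membership triple
    ("<A-1>", ":playsIn", "<M-25>"),             # object property triple
    ("<M-25>", ":releaseDate", '"2008-02-12"'),  # data property triple
}

def match(g: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples of g (in sorted order) matching the pattern; None is a wildcard."""
    for t in sorted(g):
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t

print(list(match(graph, s="<A-1>")))
```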
OWL 2 QL – Lightweight Ontology Language for Accessing Large Amounts of Data
Standard sub-language of OWL 2 [W3C Rec. 2012]
Its assertions encode a logical theory in the
DL-Lite fragment of description logics that
enables reasoning by query rewriting
Close correspondence with UML class diagrams
and ER schemas used in conceptual modeling
:actsIn rdfs:range :Movie
:actsIn rdfs:subPropertyOf :playsIn
. . . owl:someValuesFrom . . .
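[Figure: UML class diagram with class Actor (name: String) and its disjoint subclasses SeriesActor and MovieActor, class Play (title: String) and its subclass Movie, association actsIn from MovieActor to Movie (1..⋆), and association playsIn; source: Diego Calvanese (unibz + umu + ontopic), Ontology-based Data Access and Integration]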
| Assertion type | DL syntax | OWL syntax |
|---|---|---|
| Subclass assertion | MovieActor ⊑ Actor | :MovieActor rdfs:subClassOf :Actor . |
| Class disjointness | Actor ⊑ ¬Movie | :Actor owl:disjointWith :Movie . |
| Domain of a property | ∃actsIn ⊑ MovieActor | :actsIn rdfs:domain :MovieActor . |
| Range of a property | ∃actsIn⁻ ⊑ Movie | :actsIn rdfs:range :Movie . |
| Subproperty assertion | actsIn ⊑ playsIn | :actsIn rdfs:subPropertyOf :playsIn . |
| Inverse properties | actsIn ≡ hasActor⁻ | :actsIn owl:inverseOf :hasActor . |
| Mandatory participation | MovieActor ⊑ ∃actsIn | owl:someValuesFrom in superclass expression |
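To illustrate what the subclass, domain, range, subproperty and inverse assertions in the table entail, the sketch below applies them by forward chaining until a fixpoint is reached. This is a swapped-in technique for illustration only: a VKG system like Ontop does not materialize these triples but obtains the same answers by query rewriting. Disjointness and owl:someValuesFrom are deliberately not handled, and the function name `saturate` is hypothetical.

```python
# Forward-chaining sketch for a few OWL 2 QL assertion types from the table.
# NOTE: this materializes entailed triples; Ontop instead rewrites queries.
# Disjointness and existential (owl:someValuesFrom) axioms are not handled.
from typing import Set, Tuple

Triple = Tuple[str, str, str]

SUBCLASS = {(":MovieActor", ":Actor"), (":Movie", ":Play")}
DOMAIN = {(":actsIn", ":MovieActor")}
RANGE = {(":actsIn", ":Movie")}
SUBPROPERTY = {(":actsIn", ":playsIn")}
INVERSE = {(":actsIn", ":hasActor")}

def saturate(abox: Set[Triple]) -> Set[Triple]:
    """Apply the axioms repeatedly until no new triple is derived."""
    result = set(abox)
    while True:
        new: Set[Triple] = set()
        for s, p, o in result:
            if p == "rdf:type":
                new |= {(s, "rdf:type", sup) for sub, sup in SUBCLASS if sub == o}
                continue
            new |= {(s, "rdf:type", c) for prop, c in DOMAIN if prop == p}
            new |= {(o, "rdf:type", c) for prop, c in RANGE if prop == p}
            new |= {(s, sup, o) for sub, sup in SUBPROPERTY if sub == p}
            new |= {(o, inv, s) for prop, inv in INVERSE if prop == p}
            new |= {(o, prop, s) for prop, inv in INVERSE if inv == p}
        if new <= result:  # fixpoint reached
            return result
        result |= new

facts = saturate({(":a-43", ":actsIn", ":m-511")})
print(sorted(facts))
```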
Mappings
Define how to populate classes & properties via assertions of the form:
$Q_{sql}(\vec{x}) \rightsquigarrow iri(\vec{x})\ \texttt{rdf:type}\ C$
$Q_{sql}(\vec{x}) \rightsquigarrow iri_1(\vec{x})\ P\ iri_2(\vec{x})$
Ontology O:
:actsIn rdfs:domain :MovieActor .
:actsIn rdfs:range :Movie .
:Movie rdfs:subClassOf :Play .
:title rdfs:domain :Play .
:title rdfs:range xsd:string .
...
Mapping M:
m1: SELECT mcode, mtitle FROM MOVIE WHERE type = 'm'
⇝ :m-{mcode} rdf:type :Movie . :m-{mcode} :title {mtitle} .
m2: SELECT M.mcode, A.acode FROM MOVIE M, ACTOR A
WHERE M.mcode = A.pcode AND M.type = 'm'
⇝ :a-{acode} :actsIn :m-{mcode} .
Database D:
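MOVIE
| mcode | mtitle | myear | type | · · · |
|---|---|---|---|---|
| 511 | The Matrix | 1999 | m | · · · |
| 227 | Blade Runner | 1982 | m | · · · |
ACTOR
| pcode | acode | aname | · · · |
|---|---|---|---|
| 511 | 43 | K. Reeves | · · · |
| 511 | 57 | C.A. Moss | · · · |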
VKG V from O, M, D:
:m-511 rdf:type :Movie .
:m-227 rdf:type :Movie .
:m-511 :title "The Matrix" .
:m-227 :title "Blade Runner" .
:a-43 :actsIn :m-511 .
:a-57 :actsIn :m-511 .
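The following sketch reproduces the VKG triples above by evaluating the source queries of m1 and m2 over an in-memory SQLite copy of database D and instantiating their target templates in Python. It materializes the triples only for illustration (in the VKG setting they stay virtual); the helper names `m1` and `m2` simply mirror the mapping identifiers, and the table layout is reduced to the columns shown.

```python
# Sketch: materializing the triples of mappings m1 and m2 over a toy copy of
# database D with SQLite. The SQL strings are the mapping sources; the target
# templates are instantiated in Python.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MOVIE (mcode INTEGER, mtitle TEXT, myear INTEGER, type TEXT);
    CREATE TABLE ACTOR (pcode INTEGER, acode INTEGER, aname TEXT);
    INSERT INTO MOVIE VALUES (511, 'The Matrix', 1999, 'm'), (227, 'Blade Runner', 1982, 'm');
    INSERT INTO ACTOR VALUES (511, 43, 'K. Reeves'), (511, 57, 'C.A. Moss');
""")

def m1(db):
    """m1: each movie gets a type triple and a title triple."""
    for mcode, mtitle in db.execute("SELECT mcode, mtitle FROM MOVIE WHERE type = 'm'"):
        yield (f":m-{mcode}", "rdf:type", ":Movie")
        yield (f":m-{mcode}", ":title", f'"{mtitle}"')

def m2(db):
    """m2: actors linked to the movies they act in."""
    sql = ("SELECT M.mcode, A.acode FROM MOVIE M, ACTOR A "
           "WHERE M.mcode = A.pcode AND M.type = 'm'")
    for mcode, acode in db.execute(sql):
        yield (f":a-{acode}", ":actsIn", f":m-{mcode}")

vkg = set(m1(conn)) | set(m2(conn))
for triple in sorted(vkg):
    print(*triple, ".")
```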
SPARQL Query Language
Standard query language for RDF data [W3C Rec. 2008, 2013], based on graph matching
SELECT ?a ?t WHERE {
?a rdf:type :Actor .
?a :playsIn ?m .
?m rdf:type :Movie .
?m :title ?t .
}
[Figure: the graph pattern of the query above, with edges ?a rdf:type Actor, ?a playsIn ?m, ?m rdf:type Movie, ?m title ?t]
Additional language features (SPARQL 1.1):
• UNION: matches one of alternative graph patterns
• OPTIONAL: produces a match even when part of the pattern is missing
• complex FILTER conditions
• GROUP BY, to express aggregations
• MINUS, to remove possible solutions
• property paths (regular expressions)
• · · ·
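Graph matching, as used by the basic graph patterns of SPARQL, can be sketched as a nested-loop join: each triple pattern is matched in turn against the graph, extending the variable bindings found so far. The sketch below covers only plain triple patterns (none of the SPARQL 1.1 features listed above), and the toy graph already contains the :Actor and :playsIn triples that, in a VKG, would come from ontology reasoning. `eval_bgp` is a hypothetical name.

```python
# Sketch of SPARQL basic graph pattern (BGP) matching by backtracking:
# each triple pattern is matched in turn, extending the variable bindings.
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def is_var(term: str) -> bool:
    return term.startswith("?")

def eval_bgp(patterns: List[Triple], graph: Set[Triple],
             binding: Optional[Binding] = None) -> Iterator[Binding]:
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for pat, val in zip(first, triple):
            if is_var(pat):
                if new.setdefault(pat, val) != val:  # clashes with an earlier binding
                    ok = False
                    break
            elif pat != val:  # constant term does not match
                ok = False
                break
        if ok:
            yield from eval_bgp(rest, graph, new)

graph = {
    (":a-43", "rdf:type", ":Actor"),
    (":a-43", ":playsIn", ":m-511"),
    (":m-511", "rdf:type", ":Movie"),
    (":m-511", ":title", '"The Matrix"'),
    (":m-227", "rdf:type", ":Movie"),
}
query = [("?a", "rdf:type", ":Actor"), ("?a", ":playsIn", "?m"),
         ("?m", "rdf:type", ":Movie"), ("?m", ":title", "?t")]
answers = [(b["?a"], b["?t"]) for b in eval_bgp(query, graph)]
print(answers)
```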
Query Answering in VKGs
Goal: answer a query q over a VKG V by jointly considering:
• the data provided by the data sources D
• the mapping M encoding how such data translates into the ontology vocabulary
• the ontology O encoding domain knowledge that can be used to enrich answers
Example:
• suppose that an entity :m-511 of class Movie can be obtained from the data D using some
mapping assertion in M (e.g., m1 about table MOVIE)
• suppose the ontology O states that each Movie is a Play, i.e., :Movie rdfs:subClassOf :Play
• if query q asks for all Plays, we should also return :m-511, which is a Movie and thus also a Play
Solution: query answering by query reformulation
Query Answering in VKGs – Query Reformulation
Pipeline: ontological query q → Rewriting (using ontology O) → rewritten query → Unfolding (using mappings M) → SQL → Evaluation (over data sources D) → relational answer → Result Translation → ontological answer
Example with:
D: MOVIE (mcode, mtitle, …)
O: :Movie rdfs:subClassOf :Play
M: SELECT mcode FROM MOVIE WHERE type = 'm' ⇝ :m-{mcode} a :Movie
Ontological query q:
SELECT ?p {
  ?p rdf:type :Play
}
Rewritten query:
SELECT ?p {
  { ?p rdf:type :Play }
  UNION
  { ?p rdf:type :Movie }
}
SQL:
SELECT mcode
FROM MOVIE
WHERE type = 'm'
Relational answer: mcode = 511
Ontological answer: ?p = :m-511
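The two reformulation steps of the example can be sketched for the special case of a single-class query `?p rdf:type C`: rewriting expands C into all classes entailed to be its subclasses, and unfolding replaces each class by the SQL source queries of the mappings populating it. This is an illustrative simplification under stated assumptions (only subclass axioms, only class mappings, IRI construction omitted), not Ontop's actual algorithm; `rewrite`, `unfold`, `SUBCLASS_OF` and `CLASS_MAPPINGS` are hypothetical names.

```python
# Sketch of query reformulation for a single-class query "?p rdf:type C":
# rewriting uses the ontology, unfolding uses the mappings.
from typing import Dict, List, Set

SUBCLASS_OF: Dict[str, Set[str]] = {":Movie": {":Play"}}  # O: :Movie rdfs:subClassOf :Play
CLASS_MAPPINGS: Dict[str, List[str]] = {                  # M: class -> source SQL queries
    ":Movie": ["SELECT mcode FROM MOVIE WHERE type = 'm'"],
}

def rewrite(cls: str) -> Set[str]:
    """Return cls together with all its direct and indirect subclasses."""
    result, changed = {cls}, True
    while changed:
        changed = False
        for sub, sups in SUBCLASS_OF.items():
            if sub not in result and sups & result:
                result.add(sub)
                changed = True
    return result

def unfold(classes: Set[str]) -> str:
    """UNION of the SQL sources of all mappings populating the given classes."""
    queries = [q for c in sorted(classes) for q in CLASS_MAPPINGS.get(c, [])]
    return "\nUNION\n".join(queries)

classes = rewrite(":Play")
print(sorted(classes))
print(unfold(classes))
```

:Play itself has no mapping here, so only the :Movie branch of the rewritten UNION contributes SQL, which is why the unfolded query is just the MOVIE selection.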
3. The Ontop VKG System
The Ontop VKG System
https://ontop-vkg.org/
• state-of-the-art VKG system born in UNIBZ (2009, first research in 2004)
• compliant with all relevant Semantic Web standards:
RDF, RDFS, OWL 2 QL, R2RML, SPARQL, and GeoSPARQL
• implemented in Java (v1.8+) and also available as Docker image
• supports all major relational DBMSs:
Oracle, DB2, MS SQL Server, Postgres, MySQL, Teiid, Dremio, Denodo, etc.
• open-source (Apache 2) project with a solid community
200+ mailing list members, 9000+ downloads in last 2 years
• commercial services (open-core model) by Ontopic, a UNIBZ spin-off founded in 2019
Ontop Usage Scenarios
[Figure: from data, mapping and ontology, Ontop either virtualizes a Virtual Knowledge Graph that answers queries directly, or materializes a Knowledge Graph that is loaded into a triple store, which then answers queries]
VKG query answering
• supports most of SPARQL 1.1 under
OWL 2 QL inference regime
• standard-compliant SPARQL endpoint
• over one relational source, or
• over multiple heterogeneous sources,
together with a data federation system
(e.g., Teiid, Dremio) providing an
integrated relational view of sources
VKG materialization
• use ontology and mappings to efficiently
& scalably materialize all the VKG triples
• the produced RDF file can be loaded in
any triplestore
Ontop Developer Community
Ontop in Research and Industrial Projects
Research projects
• Optique (EU FP7, 11/2012-10/2016)
Ontop-based scalable end-user access to
big data, 10 partners incl. Statoil, Siemens
• EPNet (ERC Advanced Grant)
cultural heritage project on food production
and distribution in the Roman Empire
• KAOS (Euregio, 06/2016-05/2019)
preparing standardized log files from
timestamped log data for process mining
• INODE (EU H2020, 11/2019-10/2022)
intelligent open data exploration
• IDEE (ERDF 2014-2020)
building & energy consumption data VKG
Industrial projects
• NOI Techpark
development of a South Tyrol tourism KG
• SIRIS Academic (Barcelona)
open data integration and dashboards
• Siemens Corporate Technologies (Munich)
access to temporal and streaming data
• Robert Bosch GmbH (Stuttgart)
analysis of manufacturing log data
• Metaphacts (Germany)
inclusion of Ontop in their platform
• Fluxicon (Milano)
• Isagog (Rome)
Ontop in Action: Optique project, Statoil use case
From SQL query over the data source ...
SELECT wellbore.identifier, stratigraphic_zone.strat_column_identifier,
pty_pressure.pty_pressure_s, stratigraphic_zone.strat_unit_identifier
FROM wellbore, pty_pressure, activity fp_depth_data LEFT JOIN (
pty_location_1d AS fp_depth_pt1_loc
JOIN picked_stratigraphic_zones AS zs
ON zs.strat_zone_entry_md <= fp_depth_pt1_loc.Data_value_1_o AND
zs.strat_zone_exit_md >= fp_depth_pt1_loc.Data_value_1_o AND
zs.strat_zone_depth_uom = fp_depth_pt1_loc.Data_value_1_ou
JOIN stratigraphic_zone
ON zs.wellbore = stratigraphic_zone.wellbore AND
zs.strat_column_identifier = stratigraphic_zone.strat_column_identifier AND
zs.strat_interp_version = stratigraphic_zone.strat_interp_version AND
zs.strat_zone_identifier = stratigraphic_zone.strat_zone_identifier
) ON fp_depth_data.facility_s = zs.wellbore AND
fp_depth_data.activity_s = fp_depth_pt1_loc.activity_s,
activity_class AS form_pressure_class
WHERE wellbore.wellbore_s = fp_depth_data.Facility_s AND
fp_depth_data.activity_s = pty_pressure.activity_s AND
fp_depth_data.kind_s = form_pressure_class.activity_class_s AND
wellbore.ref_existence_kind = 'actual' AND
form_pressure_class.name = 'formation pressure depth data'
... to VKG SPARQL query
SELECT ?wellbore ?chronostrat_unit
?top_md_m ?lithostrat_unit
{
?w a :Wellbore ;
:name ?wellbore ;
:hasWellboreInterval ?intv .
?intv a :StratigraphicZone ;
:hasUnit ?cu ;
:hasTopDepth ?top .
?cu :name ?chronostrat_unit ;
:ofStratigraphicColumn
[ a :ChronoStratigraphicColumn ] .
?top a :MeasuredDepth ;
:valueInStandardUnit ?top_md_m .
?intv :overlapsWellboreInterval
?litho_intv .
?litho_intv :hasUnit ?lu .
?lu :name ?lithostrat_unit ;
:ofStratigraphicColumn
[ a :LithoStratigraphicColumn ] .
}
Ongoing Research & Development Directions
Mapping patterns
• bootstrapping (semi-automated generation) of mappings & possibly ontology for a data source
• reduces VKG deployment costs, which are mostly related to mapping authoring
Provenance & explanations
• report which sources/tuples, mappings and ontology axioms contributed to a query answer
• prototype Ontop extension based on provenance approaches (semirings) from the DB community
Geospatial queries
• support GeoSPARQL to manipulate & query for geometries, leveraging DB support (e.g., PostGIS)
Temporal/streaming extensions
• support SQL-enabled stream processors like Flink and pattern matching over streaming data
Non-relational sources
• support non-relational data sources such as MongoDB, Neo4j and Web APIs
4. VKGs over Web APIs
Accessing Web APIs
Data is increasingly available via Web APIs
• access to 3rd-party and/or dynamically-computed data
• access to data-related services, e.g., text search
Some API statistics (source: https://nordicapis.com/20-impressive-api-economy-statistics/)
• 83% of all Internet traffic belongs to API-based services
• 2M+ API repositories on GitHub
• 90% of developers use APIs
• 30% of development time spent on coding APIs
Applications operating on data from both databases and APIs face a complex data access problem
[Figure: an application accesses RDB sources via SQL and API sources via calls, facing a complex data access problem]
Accessing Web APIs – Open Data Hub (ODH) RDB + Semantic Search API Example
Answer hybrid queries like:
• get (plot) IRI, description, rating &
location of accommodations ...
• whose rating is 3 stars or more
(structured constraint) and ...
• whose EN description matches the
search string “horse riding” (text
constraint)
Semantic search: improved text search
that aims at capturing and leveraging
text meaning (vs term matching only)
• e.g., via a BERT-based model from the
Sentence Transformers library
Accessing Web APIs – Unified Access using a VKG
• applications operate on a unified VKG spanning APIs and
other involved sources
→ each API operation as an independent source
→ data federation setting due to multiple sources
• VKG built (e.g., via Ontop) over a Virtual Database (VDB)
federating all sources
→ VDB produced by a data federation system (e.g., Teiid)
→ the VDB offers a relational view of API data
→ VKG query reformulation may be tuned to this setting
• delegate the complex orchestration of source sub-queries
and API calls to a VKG + data federation system
• exploit existing database techniques to cope with API access
pattern restrictions during query answering
[Figure: the user / application sends SPARQL queries to the VKG (Ontop extension), which sends SQL to the Virtual DB (VDB) (Teiid extension), which in turn accesses RDB sources via SQL and API sources via calls]
VDB – SQL/MED Specification
SQL/MED allows federating multiple sources in a virtual database (VDB)
• standardized SQL extension supported by some data federation systems like Teiid
• VDB as a set of schemas mapped to foreign data sources accessed via wrappers/translators
• we extend Teiid with a new service translator for accessing APIs
Example using Teiid with our extensions:
CREATE DATABASE vdb_example OPTIONS ( "... connection options for federated sources ..." );
USE DATABASE vdb_example;
CREATE SERVER db_source FOREIGN DATA WRAPPER postgresql; -- define RDB source with schema 'db'
CREATE SCHEMA db SERVER db_source; -- using 'postgresql' translator to access it
CREATE SERVER srv_source FOREIGN DATA WRAPPER service; -- define API source with schema 'srv'
CREATE SCHEMA srv SERVER srv_source; -- using 'service' translator to access it
IMPORT FOREIGN SCHEMA public FROM SERVER db_source INTO db OPTIONS ( importer.catalog 'public' );
SET SCHEMA srv;
-- CREATE FOREIGN TABLE / PROCEDURE statements mapped to API operations (API bindings)
VDB – API Bindings
API operations as SQL/MED procedures
• input tuple → 0..n output tuples
• URL, method, request/response templates
CREATE FOREIGN PROCEDURE api_semsearch_query (
query VARCHAR
) RETURNS TABLE (
query VARCHAR,
id VARCHAR,
score DOUBLE,
excerpt VARCHAR
) OPTIONS (
"method" 'post',
"url" 'http://semsearch:8080/query',
"requestBody" '{"query": "{query}", "n": 100}',
"responseBody" '{"matches": [{
"id": "{id}",
"score": "{score}",
"excerpt": "{excerpt}" }] }'
);
API data as SQL/MED virtual tables
• linked to API operations/procedures
• each procedure defines an access pattern
CREATE FOREIGN TABLE vt_semsearch_match (
query VARCHAR NOT NULL,
id VARCHAR NOT NULL,
score DOUBLE NOT NULL,
excerpt VARCHAR NOT NULL,
PRIMARY KEY (query, id)
) OPTIONS ( "select" 'api_semsearch_query' );
CREATE FOREIGN TABLE vt_semsearch_index (
id VARCHAR PRIMARY KEY,
text VARCHAR NOT NULL
) OPTIONS (
"UPDATABLE" 'true',
"upsert" 'api_semsearch_store',
"delete" 'api_semsearch_clear'
);
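The behavior of a binding such as api_semsearch_query (one input tuple in, 0..n output tuples out) can be sketched as: build the request body from the input, call the API, and turn each element of the response's "matches" array into one output row. In this sketch the HTTP POST is replaced by a stub, `fake_semsearch_api`, which is hypothetical, as are the sample ids and excerpts; the real translator interprets the "requestBody"/"responseBody" templates declaratively instead of using hand-written code.

```python
# Sketch of an API binding: fill the request body from the input tuple, call
# the API, and map each element of "matches" to one output row.
# The HTTP call is injected as a function so that a stub can replace it.
import json
from typing import Callable, List, Tuple

Row = Tuple[str, str, float, str]  # (query, id, score, excerpt)

def call_semsearch(query: str, post: Callable[[str, str], str]) -> List[Row]:
    # json.dumps escapes quotes in the query, unlike naive string substitution
    body = json.dumps({"query": query, "n": 100})
    response = json.loads(post("http://semsearch:8080/query", body))
    return [(query, m["id"], float(m["score"]), m["excerpt"])
            for m in response.get("matches", [])]

def fake_semsearch_api(url: str, body: str) -> str:
    """Hypothetical stand-in for POSTing to the semantic search service."""
    q = json.loads(body)["query"]
    return json.dumps({"matches": [
        {"id": "A1", "score": "0.91", "excerpt": f"... {q} lessons ..."},
        {"id": "B7", "score": "0.64", "excerpt": "... riding stables ..."},
    ]})

rows = call_semsearch("horse riding", fake_semsearch_api)
print(rows)
```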
VDB – Query Translation & Execution
Given a VDB defined using SQL/MED + API Bindings and an input query over the VDB
• Teiid splits the query into sub-queries based on translator capabilities and cost heuristics
• sub-queries are sent to translators & Teiid handles remaining operations (e.g., federated joins)
Example SQL query
SELECT s.score,
s.excerpt,
a."AccoCategoryId",
a."AccoDetail-en-Name",
a."AccoDetail-en-City"
FROM srv.vt_semsearch_match AS s
JOIN db.v_accommodationsopen AS a
ON s.id = a."Id"
WHERE s.query = 'horse riding'
ORDER BY s.score DESC
LIMIT 10
Execution plan
LimitNode (limit = 10)
  SortNode (s.score DESC)
    ProjectNode (s.score, ... a."AccoDetail-en-City")
      JoinNode (s.id = a."Id", merge join strategy)
        AccessNode (API)
          SELECT id, excerpt, score
          FROM vt_semsearch_match
          WHERE query = 'horse riding'
        AccessNode (RDB)
          SELECT "Id", "AccoDetail-en-Name",
                 "AccoDetail-en-City"
          FROM v_accommodationsopen
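The operations left to the federation engine in this plan (JoinNode, SortNode, LimitNode) can be sketched as follows: a sort-merge join of the two AccessNode results on s.id = a."Id", then a sort by score and a cut to the first k rows. This only illustrates the relational operators, not Teiid's implementation; row layouts are reduced to a few columns, and the sample rows and names such as "Hotel A" are made up.

```python
# Sketch of the federated part of the plan: sort-merge join of API rows and DB
# rows on id, then ORDER BY score DESC LIMIT k.
from typing import Dict, List

def merge_join(api_rows: List[Dict], db_rows: List[Dict]) -> List[Dict]:
    left = sorted(api_rows, key=lambda r: r["id"])
    right = sorted(db_rows, key=lambda r: r["Id"])
    i = j = 0
    out: List[Dict] = []
    while i < len(left) and j < len(right):
        lk, rk = left[i]["id"], right[j]["Id"]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # find the run of right rows sharing this key, pair it with all left rows
            j_end = j
            while j_end < len(right) and right[j_end]["Id"] == lk:
                j_end += 1
            while i < len(left) and left[i]["id"] == lk:
                out.extend({**left[i], **r} for r in right[j:j_end])
                i += 1
            j = j_end
    return out

def top_k(rows: List[Dict], k: int = 10) -> List[Dict]:
    return sorted(rows, key=lambda r: r["score"], reverse=True)[:k]

api = [{"id": "A1", "score": 0.91, "excerpt": "..."},
       {"id": "C3", "score": 0.75, "excerpt": "..."},
       {"id": "B7", "score": 0.64, "excerpt": "..."}]
db = [{"Id": "B7", "AccoDetail-en-Name": "Hof B"},
      {"Id": "A1", "AccoDetail-en-Name": "Hotel A"},
      {"Id": "Z9", "AccoDetail-en-Name": "Inn Z"}]
result = top_k(merge_join(api, db), k=10)
print([(r["id"], r["AccoDetail-en-Name"]) for r in result])
```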
VDB – Push-down of Projection, Filtering, Sorting, Slicing
Special input attributes map API capabilities related to standard relational operators
• filtering: return/process only objects matching some criteria (e.g., attribute = or ≥ constant)
• projection: include/exclude certain attributes in returned results
• sorting: sort results according to a certain attribute and direction (ascending/descending)
• slicing: return only a given page of all possible results
CREATE FOREIGN PROCEDURE api_station_data_from_to (
stype VARCHAR NOT NULL,
sname VARCHAR NOT NULL,
tname VARCHAR NOT NULL,
__min_inclusive__mvaliddate DATE NOT NULL, -- filter push down (conditions min <= mvaliddate <= max)
__max_inclusive__mvaliddate DATE NOT NULL,
__limit__ INTEGER -- slicing push down
) RETURNS TABLE ( ... )
OPTIONS ( ... );
Partial/complete push down of these operators whenever possible
• allows offloading computation to the API (e.g., sorting)
• allows reducing costs by manipulating & transferring less data
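A possible way to decide what to push down, assuming the naming convention shown for api_station_data_from_to (`__min_inclusive__<attr>`, `__max_inclusive__<attr>`, `__limit__`, plain attribute names for equality), is sketched below: conditions matching a declared input parameter become API arguments, the others stay as residual filters evaluated after the call. The limit is pushed only when no residual filter remains, since filtering after an API-side limit could drop valid rows. `push_down` and the value "MeteoStation" are hypothetical; this is not the Teiid extension's code.

```python
# Sketch of pushing filters and limits into special API input parameters.
# Conditions the API cannot evaluate are returned as residual filters.
from typing import Any, Dict, List, Optional, Set, Tuple

Condition = Tuple[str, str, Any]  # (attribute, operator, constant)

def push_down(conditions: List[Condition], limit: Optional[int],
              params: Set[str]) -> Tuple[Dict[str, Any], List[Condition]]:
    args: Dict[str, Any] = {}
    residual: List[Condition] = []
    for attr, op, value in conditions:
        name = {"=": attr,
                ">=": f"__min_inclusive__{attr}",
                "<=": f"__max_inclusive__{attr}"}.get(op)
        if name in params and name not in args:
            args[name] = value                   # evaluated by the API
        else:
            residual.append((attr, op, value))   # evaluated after the call
    # pushing a limit is only safe if no filter remains to be applied afterwards
    if limit is not None and not residual and "__limit__" in params:
        args["__limit__"] = limit
    return args, residual

PARAMS = {"stype", "sname", "tname", "__min_inclusive__mvaliddate",
          "__max_inclusive__mvaliddate", "__limit__"}
args, residual = push_down(
    [("stype", "=", "MeteoStation"), ("mvaliddate", ">=", "2022-01-01"),
     ("mvaliddate", "<=", "2022-01-31"), ("value", ">=", 10)],
    limit=100, params=PARAMS)
print(args, residual)
```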
VDB – Exploiting Bulk API Operations
Bulk API operations operate on multiple input tuples, such as lookup by set of IDs or bulk store
• their use enables better performance due to fewer API calls
• useful to speed up dependent joins (using IN operator) between RDBMS and API data
Dependent join R ⨝(R.A = S.A) S between RDBMS table R and virtual table S, whose input attribute A is served by a bulk API operation:
1. SELECT A, … FROM R WHERE … is evaluated on the RDBMS
2. the values of join attribute A are extracted: a1, a2, …
3. SELECT A, … FROM S WHERE A IN (a1, a2, …) AND … is issued on the virtual table
4. via the API bindings, this becomes bulk API calls with multiple input tuples for the different values of A: a1, a2, …
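The four steps above can be sketched as a batched dependent join: the distinct values of A are extracted from R and sent to the bulk operation in batches, so the number of API calls drops from one per value to one per batch. The batch size and the stub `fake_bulk_api` are assumptions for illustration, not part of the described system.

```python
# Sketch of a dependent join R JOIN S ON R.A = S.A where S is served by a bulk
# API operation that accepts many input values per call.
from typing import Callable, Dict, Iterable, List

def dependent_join(r_rows: List[Dict], bulk_lookup: Callable[[List], Iterable[Dict]],
                   batch_size: int) -> List[Dict]:
    keys = sorted({row["A"] for row in r_rows})       # step 2: extract join values
    s_by_key: Dict = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]        # step 3: WHERE A IN (a1, a2, ...)
        for s_row in bulk_lookup(batch):              # step 4: one bulk call per batch
            s_by_key.setdefault(s_row["A"], []).append(s_row)
    return [{**r, **s} for r in r_rows for s in s_by_key.get(r["A"], [])]

calls: List[List[int]] = []

def fake_bulk_api(ids: List[int]) -> List[Dict]:
    """Hypothetical bulk lookup-by-IDs operation; id 4 has no match."""
    calls.append(list(ids))
    return [{"A": i, "label": f"item-{i}"} for i in ids if i != 4]

R = [{"A": k, "x": k * 10} for k in [1, 2, 3, 4, 5, 1]]
joined = dependent_join(R, fake_bulk_api, batch_size=2)
print(len(calls), "API calls,", len(joined), "joined rows")
```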
VDB – Data Materialization
Data materialization: required by API operations that cannot be invoked at query time
• operations too expensive to call at query time (e.g., aligning API and DB identifiers)
• operations instrumental to the use of external APIs (e.g., text indexing in a search engine)
Solution #1: materialized views in Teiid (or other data federation system used)
Solution #2: dedicated materialization engine for
flexibly executing arbitrary materialization rules:
• identifier – for documentation & diagnostics
• target – the system-managed computed table
(possibly virtual) where data is stored
• source – arbitrary SQL query (over any tables)
that produces the data to store
rules:
- id: index_accommodation_texts
target: vt_semsearch_index
source: |-
SELECT "Id" AS id,
"AccoDetail-en-Longdesc" AS text
FROM v_accommodationsopen
WHERE "AccoDetail-en-Longdesc"
IS NOT NULL
- ... other rules ...
VDB – Data Materialization (cont’d)
Rules (their SQL source queries) are analyzed to derive a rule dependency graph, which is mapped
to an execution plan using fixpoint rule evaluation for strongly connected components
[Figure: panels "Rule / Table Dependencies" and "Rule Dependencies" show the dependency graph among rules R1 to R5; panel "Execution Plan" shows the resulting plan:]
sequence (
  parallel (
    R1,
    sequence (
      R2,
      fixpoint (
        parallel (
          R3,
          R4
        )
      )
    )
  ),
  R5
)
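Deriving such a plan can be sketched as follows, under simplifying assumptions: each rule writes exactly one target table, and rule Ri depends on rule Rj if Ri's source query reads Rj's target (here the tables read are given explicitly instead of being parsed from SQL). Strongly connected components, found with Tarjan's algorithm, become fixpoint(...) blocks, and components are then scheduled level by level, which yields a simpler plan layout than the one shown above. All table names and the functions `sccs` and `plan` are hypothetical.

```python
# Sketch: execution plan for materialization rules via SCCs + level scheduling.
from typing import Dict, List, Set

def sccs(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Tarjan's algorithm; with edges pointing to dependencies, components
    are emitted dependencies-first."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in sorted(graph[v]):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            result.append(sorted(comp))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return result

def plan(rules: Dict[str, dict]) -> str:
    """rules: rule id -> {"target": table, "reads": tables read by its source}."""
    producer = {r["target"]: rid for rid, r in rules.items()}
    deps = {rid: {producer[t] for t in r["reads"] if t in producer}
            for rid, r in rules.items()}
    comps = sccs(deps)
    comp_of = {rid: i for i, comp in enumerate(comps) for rid in comp}
    level: Dict[int, int] = {}
    for i, comp in enumerate(comps):
        preds = {comp_of[d] for rid in comp for d in deps[rid]} - {i}
        level[i] = 1 + max((level[p] for p in preds), default=-1)

    def block(comp: List[str]) -> str:
        cyclic = len(comp) > 1 or comp[0] in deps[comp[0]]
        body = comp[0] if len(comp) == 1 else f"parallel({', '.join(comp)})"
        return f"fixpoint({body})" if cyclic else body

    steps = []
    for lv in range(max(level.values()) + 1):
        blocks = [block(c) for i, c in enumerate(comps) if level[i] == lv]
        steps.append(blocks[0] if len(blocks) == 1 else f"parallel({', '.join(blocks)})")
    return steps[0] if len(steps) == 1 else f"sequence({', '.join(steps)})"

rules = {
    "R1": {"target": "t1", "reads": {"src_a"}},
    "R2": {"target": "t2", "reads": {"src_b"}},
    "R3": {"target": "t3", "reads": {"t2", "t4"}},
    "R4": {"target": "t4", "reads": {"t3"}},
    "R5": {"target": "t5", "reads": {"t1", "t3", "t4"}},
}
print(plan(rules))
```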
VKG – Example of Ontology & Mappings over the VDB
Ontology
schema:Accommodation a owl:Class ;
rdfs:subClassOf schema:Place ;
rdfs:label "Accommodation"@en ;
...
schema:name a owl:DatatypeProperty ;
...
hive:Match a owl:Class ...
Current ontology formalism (OWL 2 QL) reused
as is, but now also models data from APIs
Mappings
mappingId Semantic Search
target data:match/accommodation/{id}/{query}
a hive:Match;
hive:query {query}^^xsd:string;
hive:resource data:accommodation/{id};
hive:excerpt {excerpt}@en;
hive:score {score}^^xsd:decimal.
source SELECT *
FROM hiveodh.srv.vt_semsearch_match
Current VKG mapping formalism reused as is, but
data may now come from API virtual tables
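To show how the "Semantic Search" mapping turns one row of vt_semsearch_match into triples, the sketch below instantiates its target templates in Python. One detail worth noting is that the query string (e.g., "horse riding") ends up inside an IRI, so placeholder values must be percent-encoded; the sketch follows the spirit of R2RML's IRI-safe rule via `urllib.parse.quote`, which we assume approximates what Ontop does (check its documentation). Literal escaping is not handled, and the sample row is made up.

```python
# Sketch: instantiating the target of the "Semantic Search" mapping for one row.
# Values placed into IRI templates are percent-encoded; literals are not escaped.
import re
from urllib.parse import quote

def fill_iri(template: str, row: dict) -> str:
    """Replace each {column} placeholder with the percent-encoded column value."""
    return re.sub(r"\{(\w+)\}", lambda m: quote(str(row[m.group(1)]), safe=""), template)

def match_triples(row: dict) -> list:
    s = fill_iri("data:match/accommodation/{id}/{query}", row)
    return [
        (s, "a", "hive:Match"),
        (s, "hive:query", f'"{row["query"]}"^^xsd:string'),
        (s, "hive:resource", fill_iri("data:accommodation/{id}", row)),
        (s, "hive:excerpt", f'"{row["excerpt"]}"@en'),
        (s, "hive:score", f'"{row["score"]}"^^xsd:decimal'),
    ]

row = {"query": "horse riding", "id": "A1", "score": 0.91,
       "excerpt": "... horse riding lessons ..."}
for t in match_triples(row):
    print(*t)
```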
VKG – Query Rewriting & Evaluation Example
User-supplied SPARQL query
SELECT ?h ?posLabel ?rating ?pos {
[] a hive:Match ;
hive:query "horse riding"^^xsd:string ;
hive:resource ?h ;
hive:excerpt ?excerpt ;
hive:score ?score .
?h a schema:LodgingBusiness ;
geo:defaultGeometry/geo:asWKT ?pos ;
schema:name ?name ;
schema:description ?description ;
schema:starRating/schema:ratingValue ?rating.
FILTER (?rating >= 3 && lang(?name) = 'en' &&
lang(?description) = 'en')
BIND (CONCAT(?name, " <br><br>...", ?excerpt,
"...<br><br>", ?description) AS ?posLabel)
}
ORDER BY DESC(?score) LIMIT 10
SQL query rewritten by Ontop
SELECT
v1.id,
v1.excerpt, -- fields used
v2."AccoDetail-en-Name", -- for deriving
v2."AccoDetail-en-Longdesc", -- ?posLabel
... complex expression computing rating ...,
ST_ASTEXT(v2."Geometry")
FROM
hiveodh.srv.vt_semsearch_match v1,
hiveodh.db.v_accommodationsopen v2
WHERE
v1."id" = v2."Id" AND
CAST(v1."query" AS TEXT) = 'horse riding' AND
... complex condition on rating >= 3 ... AND
... nonnull conditions for output columns ...
ORDER BY CAST(v1."score" AS DECIMAL) DESC
LIMIT 10
SQL query evaluated on the VDB by Teiid
VKG – ODH with Semantic Search Demo
Data sources
DB with ODH tourism data +
Semantic search API to index &
query accommodations texts
System
Ontop embedding Teiid +
materialization engine
Demo
https://hive.inf.unibz.it/odh/vkg/ (reformulate example)
Overall Framework for VKGs over APIs
• Virtual Knowledge Graph (VKG): Ontop
• VKG Ontology: formalizes the classes/properties (the “schema”) of the VKG, enabling reasoning
• VKG Mappings: including virtual tables, used for query rewriting
• Virtual DB (VDB): Teiid + service translator
• API Bindings: define how to query/update a virtual table via API calls, if possible → limited access patterns
• Materialization Rules: pre-compute results of expensive API calls → VDB/VKG no longer fully “virtual”
[Figure: VKG-based applications send SPARQL queries to the VKG, which sends SQL to the VDB; VDB-based applications query the VDB directly via SQL; the VDB accesses RDB sources via SQL and API sources via calls]
5. Conclusions
Takeaway Messages
Virtual Knowledge Graphs (VKG): flexible technology for building KGs over existing data source(s)
• useful for inherently relational data where a VKG engine + RDBMS may outperform a triplestore
• useful for RDF-ifying existing data via VKG materialization to an RDF file
Ontop: mature, open-source VKG system with a solid user & developer community
• allows a VKG over a single RDB, with support for multiple database engines
• allows a VKG over multiple heterogeneous sources, in combination with an intermediate data
federation system such as the open-source Teiid or Dremio
• active research & development for adding new features and new data sources
VKGs over Web APIs: ongoing research & development effort
• enables transparent access to dynamically-computed API data via declarative queries
• API operations mapped to virtual relations, accessed through a Teiid extension
• optimizations for better using API features, such as bulk operations and operators’ push-down
• expensive API operations supported via pre-computation and data materialization
Thanks for attending!
these slides: https://bit.ly/3WOoldB
Ontop: https://ontop-vkg.org/

 
Semantic Web and Related Work at W3C
Semantic Web and Related Work at W3CSemantic Web and Related Work at W3C
Semantic Web and Related Work at W3C
 
DCMI Keynote: Bridging the Semantic Gaps and Interoperability
DCMI Keynote: Bridging the Semantic Gaps and InteroperabilityDCMI Keynote: Bridging the Semantic Gaps and Interoperability
DCMI Keynote: Bridging the Semantic Gaps and Interoperability
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 
A Semantic Wiki Based Light-Weight Web Application Model
A Semantic Wiki Based Light-Weight Web Application ModelA Semantic Wiki Based Light-Weight Web Application Model
A Semantic Wiki Based Light-Weight Web Application Model
 
Adcom2006 Full 6
Adcom2006 Full 6Adcom2006 Full 6
Adcom2006 Full 6
 
Semantic Web - Lecture 09 - Web Information Systems (4011474FNR)
Semantic Web - Lecture 09 - Web Information Systems (4011474FNR)Semantic Web - Lecture 09 - Web Information Systems (4011474FNR)
Semantic Web - Lecture 09 - Web Information Systems (4011474FNR)
 
History and Background of the USEWOD Data Challenge
History and Background of the  USEWOD Data ChallengeHistory and Background of the  USEWOD Data Challenge
History and Background of the USEWOD Data Challenge
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
 
Ontology-based Cooperation of Information Systems
Ontology-based Cooperation of Information SystemsOntology-based Cooperation of Information Systems
Ontology-based Cooperation of Information Systems
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Search
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 

More from Speck&Tech

What should 6G be? - 6G: bridging gaps, connecting futures
What should 6G be? - 6G: bridging gaps, connecting futuresWhat should 6G be? - 6G: bridging gaps, connecting futures
What should 6G be? - 6G: bridging gaps, connecting futuresSpeck&Tech
 
Creare il sangue artificiale: "buon sangue non mente"
Creare il sangue artificiale: "buon sangue non mente"Creare il sangue artificiale: "buon sangue non mente"
Creare il sangue artificiale: "buon sangue non mente"Speck&Tech
 
AWS: gestire la scalabilità su larga scala
AWS: gestire la scalabilità su larga scalaAWS: gestire la scalabilità su larga scala
AWS: gestire la scalabilità su larga scalaSpeck&Tech
 
Praticamente... AWS - Amazon Web Services
Praticamente... AWS - Amazon Web ServicesPraticamente... AWS - Amazon Web Services
Praticamente... AWS - Amazon Web ServicesSpeck&Tech
 
Data Sense-making: navigating the world through the lens of information design
Data Sense-making: navigating the world through the lens of information designData Sense-making: navigating the world through the lens of information design
Data Sense-making: navigating the world through the lens of information designSpeck&Tech
 
Data Activism: data as rhetoric, data as power
Data Activism: data as rhetoric, data as powerData Activism: data as rhetoric, data as power
Data Activism: data as rhetoric, data as powerSpeck&Tech
 
Delve into the world of the human microbiome and metagenomics
Delve into the world of the human microbiome and metagenomicsDelve into the world of the human microbiome and metagenomics
Delve into the world of the human microbiome and metagenomicsSpeck&Tech
 
Home4MeAi: un progetto sociale che utilizza dispositivi IoT per sfruttare le ...
Home4MeAi: un progetto sociale che utilizza dispositivi IoT per sfruttare le ...Home4MeAi: un progetto sociale che utilizza dispositivi IoT per sfruttare le ...
Home4MeAi: un progetto sociale che utilizza dispositivi IoT per sfruttare le ...Speck&Tech
 
Monitorare una flotta di autobus: architettura di un progetto di acquisizione...
Monitorare una flotta di autobus: architettura di un progetto di acquisizione...Monitorare una flotta di autobus: architettura di un progetto di acquisizione...
Monitorare una flotta di autobus: architettura di un progetto di acquisizione...Speck&Tech
 
Why LLMs should be handled with care
Why LLMs should be handled with careWhy LLMs should be handled with care
Why LLMs should be handled with careSpeck&Tech
 
Building intelligent applications with Large Language Models
Building intelligent applications with Large Language ModelsBuilding intelligent applications with Large Language Models
Building intelligent applications with Large Language ModelsSpeck&Tech
 
Privacy in the era of quantum computers
Privacy in the era of quantum computersPrivacy in the era of quantum computers
Privacy in the era of quantum computersSpeck&Tech
 
Machine learning with quantum computers
Machine learning with quantum computersMachine learning with quantum computers
Machine learning with quantum computersSpeck&Tech
 
Give your Web App superpowers by using GPUs
Give your Web App superpowers by using GPUsGive your Web App superpowers by using GPUs
Give your Web App superpowers by using GPUsSpeck&Tech
 
From leaf to orbit: exploring forests with technology
From leaf to orbit: exploring forests with technologyFrom leaf to orbit: exploring forests with technology
From leaf to orbit: exploring forests with technologySpeck&Tech
 
Innovating Wood
Innovating WoodInnovating Wood
Innovating WoodSpeck&Tech
 
Behind the scenes of our everyday Internet: the role of an IXP like MIX
Behind the scenes of our everyday Internet: the role of an IXP like MIXBehind the scenes of our everyday Internet: the role of an IXP like MIX
Behind the scenes of our everyday Internet: the role of an IXP like MIXSpeck&Tech
 
Architecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for scienceArchitecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for scienceSpeck&Tech
 
Truck planning: how to certify the right route
Truck planning: how to certify the right routeTruck planning: how to certify the right route
Truck planning: how to certify the right routeSpeck&Tech
 
Break it up! 5G, cruise control, autonomous vehicle cooperation, and bending ...
Break it up! 5G, cruise control, autonomous vehicle cooperation, and bending ...Break it up! 5G, cruise control, autonomous vehicle cooperation, and bending ...
Break it up! 5G, cruise control, autonomous vehicle cooperation, and bending ...Speck&Tech
 

More from Speck&Tech (20)

What should 6G be? - 6G: bridging gaps, connecting futures
What should 6G be? - 6G: bridging gaps, connecting futuresWhat should 6G be? - 6G: bridging gaps, connecting futures
What should 6G be? - 6G: bridging gaps, connecting futures
 
Creare il sangue artificiale: "buon sangue non mente"
Creare il sangue artificiale: "buon sangue non mente"Creare il sangue artificiale: "buon sangue non mente"
Creare il sangue artificiale: "buon sangue non mente"
 
AWS: gestire la scalabilità su larga scala
AWS: gestire la scalabilità su larga scalaAWS: gestire la scalabilità su larga scala
AWS: gestire la scalabilità su larga scala
 
Praticamente... AWS - Amazon Web Services
Praticamente... AWS - Amazon Web ServicesPraticamente... AWS - Amazon Web Services
Praticamente... AWS - Amazon Web Services
 
Data Sense-making: navigating the world through the lens of information design
Data Sense-making: navigating the world through the lens of information designData Sense-making: navigating the world through the lens of information design
Data Sense-making: navigating the world through the lens of information design
 
Data Activism: data as rhetoric, data as power
Data Activism: data as rhetoric, data as powerData Activism: data as rhetoric, data as power
Data Activism: data as rhetoric, data as power
 
Delve into the world of the human microbiome and metagenomics
Delve into the world of the human microbiome and metagenomicsDelve into the world of the human microbiome and metagenomics
Delve into the world of the human microbiome and metagenomics
 
Home4MeAi: un progetto sociale che utilizza dispositivi IoT per sfruttare le ...
Home4MeAi: un progetto sociale che utilizza dispositivi IoT per sfruttare le ...Home4MeAi: un progetto sociale che utilizza dispositivi IoT per sfruttare le ...
Home4MeAi: un progetto sociale che utilizza dispositivi IoT per sfruttare le ...
 
Monitorare una flotta di autobus: architettura di un progetto di acquisizione...
Monitorare una flotta di autobus: architettura di un progetto di acquisizione...Monitorare una flotta di autobus: architettura di un progetto di acquisizione...
Monitorare una flotta di autobus: architettura di un progetto di acquisizione...
 
Why LLMs should be handled with care
Why LLMs should be handled with careWhy LLMs should be handled with care
Why LLMs should be handled with care
 
Building intelligent applications with Large Language Models
Building intelligent applications with Large Language ModelsBuilding intelligent applications with Large Language Models
Building intelligent applications with Large Language Models
 
Privacy in the era of quantum computers
Privacy in the era of quantum computersPrivacy in the era of quantum computers
Privacy in the era of quantum computers
 
Machine learning with quantum computers
Machine learning with quantum computersMachine learning with quantum computers
Machine learning with quantum computers
 
Give your Web App superpowers by using GPUs
Give your Web App superpowers by using GPUsGive your Web App superpowers by using GPUs
Give your Web App superpowers by using GPUs
 
From leaf to orbit: exploring forests with technology
From leaf to orbit: exploring forests with technologyFrom leaf to orbit: exploring forests with technology
From leaf to orbit: exploring forests with technology
 
Innovating Wood
Innovating WoodInnovating Wood
Innovating Wood
 
Behind the scenes of our everyday Internet: the role of an IXP like MIX
Behind the scenes of our everyday Internet: the role of an IXP like MIXBehind the scenes of our everyday Internet: the role of an IXP like MIX
Behind the scenes of our everyday Internet: the role of an IXP like MIX
 
Architecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for scienceArchitecting a 35 PB distributed parallel file system for science
Architecting a 35 PB distributed parallel file system for science
 
Truck planning: how to certify the right route
Truck planning: how to certify the right routeTruck planning: how to certify the right route
Truck planning: how to certify the right route
 
Break it up! 5G, cruise control, autonomous vehicle cooperation, and bending ...
Break it up! 5G, cruise control, autonomous vehicle cooperation, and bending ...Break it up! 5G, cruise control, autonomous vehicle cooperation, and bending ...
Break it up! 5G, cruise control, autonomous vehicle cooperation, and bending ...
 

Recently uploaded

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Recently uploaded (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

Towards Virtual Knowledge Graphs over Web APIs

  • 1. Towards Virtual Knowledge Graphs over Web APIs. Francesco Corcoglioniti, postdoc @ KRDB, Free University of Bolzano, 2022-11-09. Supported by the HIVE Fusion Grant project (2021-2022), the OntoCRM project (2022-2024), and Ontopic s.r.l. Slides available online at https://bit.ly/3WOoldB
  • 2. Outline: 1. Introduction; 2. The VKG Framework; 3. The Ontop VKG System; 4. VKGs over Web APIs; 5. Conclusions
  • 3. Big Data Context
  • 4. Variety Drives Data Management Initiatives. Relative importance (chart): Variety 69%, Volume 25%, Velocity 6% (source: http://sloanreview.mit.edu/article/variety-not-volume-is-driving-big-data-initiatives/, 2016). Forms of heterogeneity:
      - Data model heterogeneity: relational data, graph data, XML, JSON, CSV, text files, ...
      - System heterogeneity: even when systems adopt the same data model, they are not always fully compatible
      - Schema heterogeneity: different people see things differently, and design schemas differently
      - Data-level heterogeneity: e.g., ‘IBM’ vs. ‘Int. Business Machines’ vs. ‘International Business Machines’
  • 5. Querying Data Takes Time and IT Expertise (besides Domain Knowledge)
      - Query from the Statoil (now Equinor) use case, EU FP7 Optique project. Natural language: "In a given area, return pressure data tagged with stratigraphy and quality control attributes". SQL: a huge query joining 9 tables, the main one with 38 columns with cryptic names.
      - Query from the Sloan Digital Sky Survey use case, EU H2020 INODE project. Natural language: "Get all white dwarf stars". SQL: an unintelligible query defining ‘white dwarf’:
          SELECT objID
          FROM skyserverv3_correct.star
          WHERE u - g < .4 AND g - r < .7 AND
                r - i > .4 AND i - z > .4
  • 6. Virtual Knowledge Graphs (VKG) – a Data Access / Integration Solution. Three key ideas:
      1. use a global (or integrated) schema and map the data sources to the global schema;
      2. adopt a very flexible data model for the global schema, i.e., a Knowledge Graph (KG) whose vocabulary is expressed in an ontology;
      3. exploit virtualization, i.e., the KG is not materialized, but kept virtual.
    This gives rise to the Virtual Knowledge Graph (VKG) approach to data access / integration, also called Ontology-Based Data Access / Integration (OBDA).
  • 7. Virtual Knowledge Graphs (VKG) – Core Components
      - Ontology: conceptualizes a domain of interest in terms of classes and (binary) properties, overall defining the terminological knowledge (TBox) of the VKG
      - Data sources: provide the data forming the RDF triples, i.e., the assertional knowledge (ABox) of the VKG
      - Mappings: define how to generate the RDF triples from the raw (e.g., relational) data, via mapping assertions that populate each class/property of the ontology
      - Queries: formulated against the VKG (which is virtual) and rewritten into native queries evaluated over the sources
    (Figure: a query posed over ontology O, linked via mapping M to data sources D, returns query results.)
  • 8. Outline: 1. Introduction; 2. The VKG Framework; 3. The Ontop VKG System; 4. VKGs over Web APIs; 5. Conclusions
  • 9. VKG Framework – Which Languages to Use? We need to balance:
      - the expressive power of the languages adopted for O, M and q
      - the efficiency of query answering with respect to data size
    W3C has standardized languages that are suitable for VKGs:
      - Knowledge graph: expressed in RDF (W3C Rec. 2014)
      - Ontology O: expressed in OWL 2 QL (W3C Rec. 2012)
      - Mapping M: expressed in R2RML (W3C Rec. 2012)
      - Query q: expressed in SPARQL (W3C Rec. 2013)
  • 10. RDF – Data Represented as a Graph. The graph consists of a set of ⟨subject, predicate, object⟩ triples over IRI, literal and blank nodes:
      - IRI nodes (formerly URI): <http://example.org/M-25>, <M-25>, ex:M-25 or :M-25
      - Literal nodes: "2008-02-12", "The Matrix"@en, "511"^^xsd:integer
    Kinds of triples:
      - class membership triples: <A-1> rdf:type :Actor .
      - object property triples: <A-1> :playsIn <M-25> .
      - data property triples: <M-25> :releaseDate "2008-02-12" .
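The three kinds of triples above can be told apart mechanically. The following is a minimal Python sketch, assuming triples are plain tuples, IRIs are strings (prefixed names or `<...>`), and literals are wrapped in a hypothetical `Literal` dataclass defined here; blank nodes are left out. It is not the API of any RDF library (rdflib, for instance, has its own node classes).

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

RDF_TYPE = "rdf:type"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal node: lexical form plus optional datatype or language tag."""
    value: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def triple_kind(triple: Tuple[str, str, Union[str, Literal]]) -> str:
    """Classify a triple into one of the three kinds listed on the slide."""
    _, predicate, obj = triple
    if predicate == RDF_TYPE:
        return "class membership"
    if isinstance(obj, Literal):  # literal object => data property
        return "data property"
    return "object property"  # IRI object => object property


# A graph is just a set of triples (frozen dataclasses are hashable).
graph = {
    ("<A-1>", RDF_TYPE, ":Actor"),
    ("<A-1>", ":playsIn", "<M-25>"),
    ("<M-25>", ":releaseDate", Literal("2008-02-12")),
    ("<M-25>", ":title", Literal("The Matrix", lang="en")),
}

for t in sorted(graph, key=str):
    print(f"{triple_kind(t):17} {t}")
```

Language-tagged and typed literals such as "The Matrix"@en or "511"^^xsd:integer map to the `lang` and `datatype` fields; the classification only looks at whether the object is a literal.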
  • 11. OWL 2 QL – Lightweight Ontology Language for Accessing Large Amounts of Data. A standard sub-language (profile) of OWL 2 [W3C Rec. 2012]. Its assertions encode a logical theory in the DL-Lite fragment of description logics, which enables reasoning by query rewriting. It corresponds closely to the UML class diagrams and ER schemas used in conceptual modeling. (Figure from Diego Calvanese (unibz + umu + ontopic), Ontology-based Data Access and Integration: UML class diagram with classes Actor (name: String), SeriesActor, MovieActor, Play (title: String) and Movie, associations actsIn (1..*) and playsIn, and a {disjoint} constraint, annotated with the corresponding OWL assertions.)
    Assertion types (DL syntax / OWL syntax):
      - Subclass assertion: MovieActor ⊑ Actor / :MovieActor rdfs:subClassOf :Actor .
      - Class disjointness: Actor ⊑ ¬Movie / :Actor owl:disjointWith :Movie .
      - Domain of a property: ∃actsIn ⊑ MovieActor / :actsIn rdfs:domain :MovieActor .
      - Range of a property: ∃actsIn⁻ ⊑ Movie / :actsIn rdfs:range :Movie .
      - Subproperty assertion: actsIn ⊑ playsIn / :actsIn rdfs:subPropertyOf :playsIn .
      - Inverse properties: actsIn ≡ hasActor⁻ / :actsIn owl:inverseOf :hasActor .
      - Mandatory participation: MovieActor ⊑ ∃actsIn / owl:someValuesFrom in a superclass expression
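To make concrete what the subclass, subproperty, domain and range assertions above entail, here is a minimal Python sketch that saturates a tiny ABox by forward chaining until a fixpoint. This is an illustrative swap-in: as the slide says, OWL 2 QL is designed for reasoning by query rewriting, not materialization, and disjointness, inverse properties and someValuesFrom are deliberately not handled. The TBox encoding as plain tuples is an assumption of this sketch.

```python
RDF_TYPE = "rdf:type"

# TBox: some axioms from the table above, encoded as (subject, predicate, object)
TBOX = [
    (":MovieActor", "rdfs:subClassOf", ":Actor"),
    (":Movie", "rdfs:subClassOf", ":Play"),
    (":actsIn", "rdfs:domain", ":MovieActor"),
    (":actsIn", "rdfs:range", ":Movie"),
    (":actsIn", "rdfs:subPropertyOf", ":playsIn"),
]


def saturate(abox, tbox):
    """Apply the TBox axioms to the ABox until no new triple is derived."""
    facts = set(abox)
    while True:
        new = set()
        for s, p, o in facts:
            for a, rel, b in tbox:
                if rel == "rdfs:subClassOf" and p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))  # instance of a subclass
                elif rel == "rdfs:subPropertyOf" and p == a:
                    new.add((s, b, o))  # edge of a subproperty
                elif rel == "rdfs:domain" and p == a:
                    new.add((s, RDF_TYPE, b))  # subject typed by the domain
                elif rel == "rdfs:range" and p == a:
                    new.add((o, RDF_TYPE, b))  # object typed by the range
        if new <= facts:  # fixpoint reached
            return facts
        facts |= new


closure = saturate({(":a-43", ":actsIn", ":m-511")}, TBOX)
for t in sorted(closure):
    print(t)
```

From the single fact :a-43 :actsIn :m-511 the sketch derives that :a-43 is a MovieActor and an Actor who playsIn :m-511, and that :m-511 is a Movie and a Play.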
  • 12. Mappings define how to populate classes and properties, via assertions of the form $Q_{sql}(\vec{x}) \rightsquigarrow \mathit{iri}(\vec{x})\ \texttt{rdf:type}\ C$ and $Q_{sql}(\vec{x}) \rightsquigarrow \mathit{iri}_1(\vec{x})\ P\ \mathit{iri}_2(\vec{x})$.
      - Ontology O: :actsIn rdfs:domain :MovieActor . :actsIn rdfs:range :Movie . :Movie rdfs:subClassOf :Play . :title rdfs:domain :Play . :title rdfs:range xsd:string . ...
      - Mapping M:
          m1: SELECT mcode, mtitle FROM MOVIE WHERE type = 'm'
              ⇝ :m-{mcode} rdf:type :Movie . :m-{mcode} :title {mtitle} .
          m2: SELECT M.mcode, A.acode FROM MOVIE M, ACTOR A WHERE M.mcode = A.pcode AND M.type = 'm'
              ⇝ :a-{acode} :actsIn :m-{mcode} .
      - Database D:
          MOVIE(mcode, mtitle, myear, type, ...): (511, The Matrix, 1999, m), (227, Blade Runner, 1982, m), ...
          ACTOR(pcode, acode, aname, ...): (511, 43, K. Reeves), (511, 57, C.A. Moss), ...
      - VKG V obtained from O, M, D: :m-511 rdf:type :Movie . :m-227 rdf:type :Movie . :m-511 :title "The Matrix" . :m-227 :title "Blade Runner" . :a-43 :actsIn :m-511 . :a-57 :actsIn :m-511 .
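The mapping semantics above (run each source query, then instantiate the triple templates once per result row) can be sketched in a few lines of Python over an in-memory SQLite copy of database D. This is a minimal illustration under assumptions, not R2RML or Ontop: the `MAPPING` list layout is hypothetical, the columns of m2 are aliased so SQLite reports predictable column names, and titles are quoted as plain literals.

```python
import sqlite3

# Toy database D from the slide (columns elided on the slide are omitted)
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE MOVIE (mcode INTEGER, mtitle TEXT, myear INTEGER, type TEXT);
CREATE TABLE ACTOR (pcode INTEGER, acode INTEGER, aname TEXT);
INSERT INTO MOVIE VALUES (511, 'The Matrix', 1999, 'm'), (227, 'Blade Runner', 1982, 'm');
INSERT INTO ACTOR VALUES (511, 43, 'K. Reeves'), (511, 57, 'C.A. Moss');
""")

# Hypothetical mapping layout: (source SQL query, triple templates over its columns)
MAPPING = [
    ("SELECT mcode, mtitle FROM MOVIE WHERE type = 'm'",  # m1
     [(":m-{mcode}", "rdf:type", ":Movie"),
      (":m-{mcode}", ":title", '"{mtitle}"')]),
    ("SELECT M.mcode AS mcode, A.acode AS acode "  # m2 (columns aliased)
     "FROM MOVIE M, ACTOR A WHERE M.mcode = A.pcode AND M.type = 'm'",
     [(":a-{acode}", ":actsIn", ":m-{mcode}")]),
]


def materialize(conn, mapping):
    """Evaluate each source query and instantiate its templates for every row."""
    triples = set()
    for sql, templates in mapping:
        cur = conn.execute(sql)
        cols = [d[0] for d in cur.description]
        for row in cur:
            binding = dict(zip(cols, row))
            for s, p, o in templates:
                triples.add((s.format(**binding), p.format(**binding), o.format(**binding)))
    return triples


V = materialize(db, MAPPING)
for t in sorted(V):
    print(t)
```

The result is exactly the six triples of the VKG V on the slide. In a VKG these triples are never stored; the sketch only spells out which triples the mapping defines.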
  • 13. SPARQL Query Language. Standard query language for RDF data [W3C Rec. 2008, 2013], whose query mechanism is based on graph matching. Example:
          SELECT ?a ?t
          WHERE { ?a rdf:type :Actor . ?a :playsIn ?m .
                  ?m rdf:type :Movie . ?m :title ?t . }
    (Figure: the query as a graph pattern: ?a has rdf:type Actor and playsIn ?m; ?m has rdf:type Movie and title ?t.)
    Additional language features (SPARQL 1.1):
      - UNION: matches one of alternative graph patterns
      - OPTIONAL: produces a match even when part of the pattern is missing
      - complex FILTER conditions
      - GROUP BY, to express aggregations
      - MINUS, to remove possible solutions
      - property paths (regular expressions)
      - ...
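Graph matching for the example query above can be illustrated with a minimal Python sketch that evaluates a basic graph pattern (a conjunction of triple patterns) over an explicit set of triples by joining per-pattern matches. It is a naive nested-loop evaluator covering only basic graph patterns, not the SPARQL 1.1 features listed above; the toy graph and the convention that variables start with "?" are assumptions of this sketch.

```python
def match(pattern, triple, binding):
    """Try to match one triple pattern against one triple, extending binding."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check the existing binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must be equal
            return None
    return extended


def eval_bgp(patterns, graph):
    """Evaluate a basic graph pattern by joining the solutions of its triple patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    (":a-43", "rdf:type", ":Actor"),
    (":a-43", ":playsIn", ":m-511"),
    (":m-511", "rdf:type", ":Movie"),
    (":m-511", ":title", '"The Matrix"'),
    (":m-227", "rdf:type", ":Movie"),
}
query = [("?a", "rdf:type", ":Actor"), ("?a", ":playsIn", "?m"),
         ("?m", "rdf:type", ":Movie"), ("?m", ":title", "?t")]

rows = [(s["?a"], s["?t"]) for s in eval_bgp(query, graph)]
print(rows)  # [(':a-43', '"The Matrix"')]
```

Note that the graph here contains :Actor and :playsIn triples explicitly; over the VKG of the mapping example, which only has :actsIn, the same query would need the ontology to find any answer, which is exactly the problem query answering in VKGs addresses.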
  • 14. Query Answering in VKGs. Goal: answer a query q over a VKG V by jointly considering:
      - the data provided by the data sources D
      - the mapping M, encoding how such data translates into the ontology vocabulary
      - the ontology O, encoding domain knowledge that can be used to enrich answers
    Example:
      - suppose that an entity :m-511 of class :Movie can be obtained from the data D using some mapping assertion in M (e.g., m1 about table MOVIE)
      - suppose the ontology O states that each Movie is a Play, i.e., :Movie rdfs:subClassOf :Play
      - if query q asks for all Plays, we should also return :m-511, which is a Movie and thus also a Play
    Solution: query answering by query reformulation.
  • 15. Query Answering in VKGs – Query Reformulation
    [Figure: pipeline ontological query q → Rewriting (using ontology O) → rewritten query → Unfolding (using mappings M) → SQL → Evaluation (over data sources D) → relational answer → Result Translation → ontological answer]
    Example:
      D: MOVIE (mcode, mtitle, …)
      O: :Movie rdfs:subClassOf :Play
      M: SELECT mcode FROM MOVIE WHERE type = 'm' ⇝ :m-{mcode} a :Movie
      Ontological query q:   SELECT ?p { ?p rdf:type :Play }
      Rewritten query:       SELECT ?p { { ?p rdf:type :Play } UNION { ?p rdf:type :Movie } }
      SQL (after unfolding): SELECT mcode FROM MOVIE WHERE type = 'm'
      Relational answer:     mcode = 511
      Ontological answer:    ?p = :m-511
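The four reformulation steps can be sketched for the simplest case, a single class atom `?p rdf:type C`. This is a toy model under stated assumptions: only subclass axioms are handled, mappings are encoded as (SQL, IRI template) pairs, and each unfolded query is run separately with the union taken in Python, whereas a system like Ontop produces and optimizes a single SQL query. The names `SUBCLASS`, `CLASS_MAPPINGS`, `rewrite`, `unfold` and `answer` are hypothetical.

```python
import sqlite3
from typing import Dict, List, Set, Tuple

# O: class inclusion axioms as (subclass, superclass) pairs.
SUBCLASS: Set[Tuple[str, str]] = {(":Movie", ":Play")}

# M: for each class, its mapping assertions as (single-column SQL query, IRI template).
CLASS_MAPPINGS: Dict[str, List[Tuple[str, str]]] = {
    ":Movie": [("SELECT mcode FROM MOVIE WHERE type = 'm'", ":m-{}")],
}

def rewrite(cls: str) -> Set[str]:
    """Rewriting with O: `?p rdf:type cls` becomes a UNION over cls and all its (transitive) subclasses."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS:
            if sup == current and sub not in classes:
                classes.add(sub)
                frontier.append(sub)
    return classes

def unfold(classes: Set[str]) -> List[Tuple[str, str]]:
    """Unfolding with M: replace each class atom by the source queries mapped to it."""
    return [m for c in sorted(classes) for m in CLASS_MAPPINGS.get(c, [])]

def answer(con: sqlite3.Connection, cls: str) -> List[str]:
    """Evaluation over D, then result translation of relational values into IRIs."""
    iris: Set[str] = set()
    for sql, template in unfold(rewrite(cls)):
        for (value,) in con.execute(sql):
            iris.add(template.format(value))
    return sorted(iris)
```

With a MOVIE table containing the row (511, 'The Matrix', 'm'), `answer(con, ":Play")` returns `[":m-511"]` even though no mapping populates `:Play` directly; the subclass axiom is compiled into the query rather than into the data.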
  • 16. 1. Introduction 2. The VKG Framework 3. The Ontop VKG System 4. VKGs over Web APIs 5. Conclusions
  • 17. The Ontop VKG System (https://ontop-vkg.org/)
      • state-of-the-art VKG system born at UNIBZ (2009, first research in 2004)
      • compliant with all relevant Semantic Web standards: RDF, RDFS, OWL 2 QL, R2RML, SPARQL, and GeoSPARQL
      • implemented in Java (v1.8+) and also available as a Docker image
      • supports all major relational DBMSs: Oracle, DB2, MS SQL Server, Postgres, MySQL, Teiid, Dremio, Denodo, etc.
      • open-source (Apache 2) project with a solid community: 200+ mailing list members, 9000+ downloads in the last 2 years
      • commercial services (open-core model) by Ontopic, a UNIBZ spin-off founded in 2019
  • 18. Ontop Usage Scenarios
    [Figure: from ontology, mapping and data, Ontop either virtualizes a Virtual Knowledge Graph that answers queries directly, or materializes a Materialized Knowledge Graph that is loaded into a triple store and queried there]
    VKG query answering
      • supports most of SPARQL 1.1 under the OWL 2 QL inference regime
      • standard-compliant SPARQL endpoint
      • over one relational source, or
      • over multiple heterogeneous sources, together with a data federation system (e.g., Teiid, Dremio) providing an integrated relational view of the sources
    VKG materialization
      • uses the ontology and mappings to efficiently and scalably materialize all the VKG triples
      • the produced RDF file can be loaded into any triplestore
  • 19. Ontop Developer Community
  • 20. Ontop in Research and Industrial Projects
    Research projects
      • Optique (EU FP7, 11/2012-10/2016): Ontop-based scalable end-user access to big data; 10 partners, incl. Statoil and Siemens
      • EPNet (ERC Advanced Grant): cultural heritage project on food production and distribution in the Roman Empire
      • KAOS (Euregio, 06/2016-05/2019): preparing standardized log files from timestamped log data for process mining
      • INODE (EU H2020, 11/2019-10/2022): intelligent open data exploration
      • IDEE (ERDF 2014-2020): VKG over building & energy consumption data
    Industrial projects
      • NOI Techpark: development of the South Tyrol tourism KG
      • SIRIS Academic (Barcelona): open data integration and dashboards
      • Siemens Corporate Technologies (Munich): access to temporal and streaming data
      • Robert Bosch GmbH (Stuttgart): analysis of manufacturing log data
      • Metaphacts (Germany): inclusion of Ontop in their platform
      • Fluxicon (Milano)
      • Isagog (Rome)
  • 21. Ontop in Action
    Optique project, Statoil use case
    From the SQL query over the data source ...
      SELECT wellbore.identifier, stratigraphic_zone.strat_column_identifier,
             pty_pressure.pty_pressure_s, stratigraphic_zone.strat_unit_identifier
      FROM wellbore,
           pty_pressure,
           activity fp_depth_data
           LEFT JOIN (
             pty_location_1d AS fp_depth_pt1_loc
             JOIN picked_stratigraphic_zones AS zs
               ON zs.strat_zone_entry_md <= fp_depth_pt1_loc.Data_value_1_o
              AND zs.strat_zone_exit_md >= fp_depth_pt1_loc.Data_value_1_o
              AND zs.strat_zone_depth_uom = fp_depth_pt1_loc.Data_value_1_ou
             JOIN stratigraphic_zone
               ON zs.wellbore = stratigraphic_zone.wellbore
              AND zs.strat_column_identifier = stratigraphic_zone.strat_column_identifier
              AND zs.strat_interp_version = stratigraphic_zone.strat_interp_version
              AND zs.strat_zone_identifier = stratigraphic_zone.strat_zone_identifier
           ) ON fp_depth_data.facility_s = zs.wellbore
            AND fp_depth_data.activity_s = fp_depth_pt1_loc.activity_s,
           activity_class AS form_pressure_class
      WHERE wellbore.wellbore_s = fp_depth_data.Facility_s
        AND fp_depth_data.activity_s = pty_pressure.activity_s
        AND fp_depth_data.kind_s = form_pressure_class.activity_class_s
        AND wellbore.ref_existence_kind = 'actual'
        AND form_pressure_class.name = 'formation pressure depth data'
    ... to the VKG SPARQL query
      SELECT ?wellbore ?chronostrat_unit ?top_md_m ?lithostrat_unit
      {
        ?w a :Wellbore ; :name ?wellbore ; :hasWellboreInterval ?intv .
        ?intv a :StratigraphicZone ; :hasUnit ?cu ; :hasTopDepth ?top .
        ?cu :name ?chronostrat_unit ; :ofStratigraphicColumn [ a :ChronoStratigraphicColumn ] .
        ?top a :MeasuredDepth ; :valueInStandardUnit ?top_md_m .
        ?intv :overlapsWellboreInterval ?litho_intv .
        ?litho_intv :hasUnit ?lu .
        ?lu :name ?lithostrat_unit ; :ofStratigraphicColumn [ a :LithoStratigraphicColumn ] .
      }
  • 22. Ongoing Research & Development Directions
    Mapping patterns
      • bootstrapping (semi-automated generation) of mappings, and possibly of the ontology, for a data source
      • reduces VKG deployment costs, which mostly stem from mapping authoring
    Provenance & explanations
      • report which sources/tuples, mappings and ontology axioms contributed to a query answer
      • prototype Ontop extension based on provenance approaches (semirings) from the DB community
    Geospatial queries
      • support GeoSPARQL to manipulate and query geometries, leveraging DB support (e.g., PostGIS)
    Temporal/streaming extensions
      • support SQL-enabled stream processors like Flink, and pattern matching over streaming data
    Non-relational sources
      • support non-relational data sources such as MongoDB, Neo4j and Web APIs
  • 23. 1. Introduction 2. The VKG Framework 3. The Ontop VKG System 4. VKGs over Web APIs 5. Conclusions
  • 24. Accessing Web APIs
    Data is increasingly available via Web APIs
      • access to third-party and/or dynamically computed data
      • access to data-related services, e.g., text search
    Some API statistics (source: https://nordicapis.com/20-impressive-api-economy-statistics/)
      • 83% of all Internet traffic belongs to API-based services
      • 2M+ API repositories on GitHub
      • 90% of developers use APIs
      • 30% of development time is spent on coding APIs
    Applications operating on data from both databases and APIs face a complex data access problem.
    [Figure: an application issuing SQL queries to RDB sources and calls to API sources, facing a complex data access problem]
  • 25. Accessing Web APIs – Open Data Hub (ODH) RDB + Semantic Search API Example
    Answer hybrid queries like:
      • get (and plot) the IRI, description, rating & location of accommodations ...
      • whose rating is 3 stars or more (structured constraint) and ...
      • whose English description matches the search string “horse riding” (text constraint)
    Semantic search: improved text search that aims at capturing and leveraging text meaning (vs. term matching only)
      • e.g., via a BERT-based model from the Sentence Transformers library
  • 26. Accessing Web APIs – Unified Access using a VKG
      • applications operate on a unified VKG spanning APIs and other involved sources
        → each API operation as an independent source
        → data federation setting due to multiple sources
      • VKG built (e.g., via Ontop) over a Virtual Database (VDB) federating all sources
        → VDB produced by a data federation system (e.g., Teiid)
        → the VDB offers a relational view of API data
        → VKG query reformulation may be tuned to this setting
      • delegate the complex orchestration of source sub-queries and API calls to a VKG + data federation system
      • exploit existing database techniques to cope with API access pattern restrictions during query answering
    [Figure: a user/application sends SPARQL to the VKG (Ontop extension), which sends SQL to the Virtual DB (VDB, Teiid extension), which in turn sends SQL to RDB sources and calls to API sources]
  • 27. VDB – SQL/MED Specification
    SQL/MED allows federating multiple sources in a virtual database (VDB)
      • standardized SQL extension supported by some data federation systems like Teiid
      • VDB as a set of schemas mapped to foreign data sources accessed via wrappers/translators
      • we extend Teiid with a new service translator for accessing APIs
    Example using Teiid with our extensions:
      CREATE DATABASE vdb_example OPTIONS ( "... connection options for federated sources ..." );
      USE DATABASE vdb_example;
      CREATE SERVER db_source FOREIGN DATA WRAPPER postgresql;   -- define RDB source with schema 'db'
      CREATE SCHEMA db SERVER db_source;                         -- using 'postgresql' translator to access it
      CREATE SERVER srv_source FOREIGN DATA WRAPPER service;     -- define API source with schema 'srv'
      CREATE SCHEMA srv SERVER srv_source;                       -- using 'service' translator to access it
      IMPORT FOREIGN SCHEMA public FROM SERVER db_source INTO db OPTIONS ( importer.catalog 'public' );
      SET SCHEMA srv;
      -- CREATE FOREIGN TABLE / PROCEDURE statements mapped to API operations (API bindings)
  • 28. VDB – API Bindings
    API operations as SQL/MED procedures
      • input tuple → 0..n output tuples
      • URL, method, request/response templates
      CREATE FOREIGN PROCEDURE api_semsearch_query ( query VARCHAR )
      RETURNS TABLE ( query VARCHAR, id VARCHAR, score DOUBLE, excerpt VARCHAR )
      OPTIONS (
        "method" 'post',
        "url" 'http://semsearch:8080/query',
        "requestBody" '{"query": "{query}", "n": 100}',
        "responseBody" '{"matches": [{ "id": "{id}", "score": "{score}", "excerpt": "{excerpt}" }] }'
      );
    API data as SQL/MED virtual tables
      • linked to API operations/procedures
      • each procedure defines an access pattern
      CREATE FOREIGN TABLE vt_semsearch_match (
        query VARCHAR NOT NULL,
        id VARCHAR NOT NULL,
        score DOUBLE NOT NULL,
        excerpt VARCHAR NOT NULL,
        PRIMARY KEY (query, id)
      ) OPTIONS ( "select" 'api_semsearch_query' );
      CREATE FOREIGN TABLE vt_semsearch_index (
        id VARCHAR PRIMARY KEY,
        text VARCHAR NOT NULL
      ) OPTIONS ( "UPDATABLE" 'true', "upsert" 'api_semsearch_store', "delete" 'api_semsearch_clear' );
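One plausible reading of the `requestBody`/`responseBody` templates is sketched below: `{name}` placeholders in the request template are filled from the input tuple, and the response is matched against its template so that every element of a JSON array yields one output tuple (input tuple → 0..n output tuples). This is a hypothetical re-implementation for illustration only; the actual semantics of the Teiid service translator may differ, and the HTTP POST to `http://semsearch:8080/query` is replaced by a caller-supplied `post` function.

```python
import json
import re
from typing import Any, Callable, Dict, List

PLACEHOLDER = re.compile(r"^\{(\w+)\}$")
Row = Dict[str, Any]

def render(template: Any, params: Row) -> Any:
    """Request side: replace each "{name}" string in a parsed JSON template by params[name]."""
    if isinstance(template, dict):
        return {k: render(v, params) for k, v in template.items()}
    if isinstance(template, list):
        return [render(v, params) for v in template]
    if isinstance(template, str) and (m := PLACEHOLDER.match(template)):
        return params[m.group(1)]
    return template

def extract(template: Any, data: Any, row: Row) -> List[Row]:
    """Response side: match data against the template; each array element yields its own row."""
    if isinstance(template, dict):
        rows = [row]
        for key, sub in template.items():
            rows = [r2 for r in rows for r2 in extract(sub, data[key], r)]
        return rows
    if isinstance(template, list):   # assumed: a one-element template list applies to every element
        return [r2 for item in data for r2 in extract(template[0], item, row)]
    if isinstance(template, str) and (m := PLACEHOLDER.match(template)):
        return [{**row, m.group(1): data}]
    return [row]                     # constant template parts are not checked in this sketch

REQUEST_BODY = '{"query": "{query}", "n": 100}'
RESPONSE_BODY = '{"matches": [{ "id": "{id}", "score": "{score}", "excerpt": "{excerpt}" }] }'

def api_semsearch_query(query: str, post: Callable[[str], str]) -> List[Row]:
    """One input tuple -> 0..n output tuples (query, id, score, excerpt)."""
    body = json.dumps(render(json.loads(REQUEST_BODY), {"query": query}))
    rows = extract(json.loads(RESPONSE_BODY), json.loads(post(body)), {"query": query})
    return [{**r, "score": float(r["score"])} for r in rows]   # cast to the declared DOUBLE type
```

Working on parsed JSON rather than raw strings means that values such as a query containing quotes are escaped correctly by `json.dumps`, which plain string substitution into the template would not guarantee.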
  • 29. VDB – Query Translation & Execution
    Given a VDB defined using SQL/MED + API bindings, and an input query over the VDB:
      • Teiid splits the query into sub-queries based on translator capabilities and cost heuristics
      • sub-queries are sent to translators, and Teiid handles the remaining operations (e.g., federated joins)
    Example SQL query
      SELECT s.score, s.excerpt, a."AccoCategoryId", a."AccoDetail-en-Name", a."AccoDetail-en-City"
      FROM srv.vt_semsearch_match AS s JOIN db.v_accommodationsopen AS a ON s.id = a."Id"
      WHERE s.query = 'horse riding'
      ORDER BY s.score DESC
      LIMIT 10
    Execution plan
      LimitNode (limit = 10)
        SortNode (s.score DESC)
          ProjectNode (s.score, ... a."AccoDetail-en-City")
            JoinNode (s.id = a."Id", merge join strategy)
              AccessNode (API)  SELECT id, excerpt, score FROM vt_semsearch_match WHERE query = 'horse riding'
              AccessNode (RDB)  SELECT "Id", "AccoCategoryId", "AccoDetail-en-Name", "AccoDetail-en-City" FROM v_accommodationsopen
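The operators that Teiid keeps for itself in this plan (join, projection, sort, limit) can be mimicked in a few lines, taking the rows returned by the two access nodes as input. This is an illustrative sketch, not Teiid code: it assumes the join key is unique on both sides (true here, since `query` is fixed and `(query, id)` is the key of the virtual table) and uses hypothetical sample rows.

```python
from typing import Dict, List

Row = Dict[str, object]

def merge_join(left: List[Row], right: List[Row], lkey: str, rkey: str) -> List[Row]:
    """Sort-merge equi-join, assuming join keys are unique on both inputs."""
    left = sorted(left, key=lambda r: r[lkey])
    right = sorted(right, key=lambda r: r[rkey])
    out: List[Row] = []
    i = j = 0
    while i < len(left) and j < len(right):
        a, b = left[i][lkey], right[j][rkey]
        if a == b:
            out.append({**left[i], **right[j]})
            i += 1
            j += 1
        elif a < b:
            i += 1
        else:
            j += 1
    return out

OUTPUT = ("score", "excerpt", "AccoCategoryId", "AccoDetail-en-Name", "AccoDetail-en-City")

def execute_plan(api_rows: List[Row], rdb_rows: List[Row], limit: int = 10) -> List[Row]:
    joined = merge_join(api_rows, rdb_rows, "id", "Id")             # JoinNode
    projected = [{k: r[k] for k in OUTPUT} for r in joined]         # ProjectNode
    projected.sort(key=lambda r: r["score"], reverse=True)          # SortNode (s.score DESC)
    return projected[:limit]                                        # LimitNode
```

A merge join fits here because both inputs can be sorted cheaply on the join key; accommodations without a semantic-search match (or vice versa) simply drop out of the result.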
  • 30. VDB – Push-down of Projection, Filtering, Sorting, Slicing
    Special input attributes map API capabilities related to standard relational operators:
      • filtering: return/process only objects matching some criteria (e.g., attribute = or ≥ constant)
      • projection: include/exclude certain attributes in returned results
      • sorting: sort results according to a certain attribute and direction (ascending/descending)
      • slicing: return only a given page of all possible results
      CREATE FOREIGN PROCEDURE api_station_data_from_to (
        stype VARCHAR NOT NULL,
        sname VARCHAR NOT NULL,
        tname VARCHAR NOT NULL,
        __min_inclusive__mvaliddate DATE NOT NULL,   -- filter push-down (condition min <= mvaliddate <= max)
        __max_inclusive__mvaliddate DATE NOT NULL,
        __limit__ INTEGER                            -- slicing push-down
      ) RETURNS TABLE ( ... ) OPTIONS ( ... );
    Partial/complete push-down of these operators whenever possible:
      • allows offloading computation to the API (e.g., sorting)
      • allows reducing costs by manipulating & transferring less data
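How a planner might exploit the special input attributes can be sketched as follows: each simple predicate of a conjunctive WHERE clause is pushed to the API when a matching parameter exists (`=` on a plain input, `>=`/`<=` on the `__min_inclusive__`/`__max_inclusive__` parameters), and everything else is left as a residual filter for the federation engine. The naming convention is inferred from the slide's example; the function `push_down` and the sample values are hypothetical.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Predicate = Tuple[str, str, Any]   # (attribute, operator, constant)

# Input parameters of api_station_data_from_to, as declared on the slide.
PARAMS = {"stype", "sname", "tname",
          "__min_inclusive__mvaliddate", "__max_inclusive__mvaliddate", "__limit__"}

def push_down(preds: List[Predicate], limit: Optional[int],
              params: Set[str]) -> Tuple[Dict[str, Any], List[Predicate]]:
    """Split a conjunction of simple predicates into procedure arguments and residual filters."""
    args: Dict[str, Any] = {}
    residual: List[Predicate] = []
    for attr, op, const in preds:
        name = {"=": attr,
                ">=": f"__min_inclusive__{attr}",
                "<=": f"__max_inclusive__{attr}"}.get(op)
        if name in params and name not in args:
            args[name] = const                   # evaluated by the API
        else:
            residual.append((attr, op, const))   # evaluated locally by the federation engine
    # Slicing may only be pushed when no filter remains to be applied locally;
    # this sketch also assumes no ORDER BY has to be evaluated locally.
    if limit is not None and "__limit__" in params and not residual:
        args["__limit__"] = limit
    return args, residual
```

The guard on `__limit__` reflects why push-down is sometimes only partial: if the API returned only the first N rows and a local filter then discarded some of them, the query would silently return too few results.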
  • 31. VDB – Exploiting Bulk API Operations
    Bulk API operations operate on multiple input tuples, e.g., lookup by a set of IDs or bulk store
      • their use improves performance, as fewer API calls are needed
      • useful to speed up dependent joins (using the IN operator) between RDBMS and API data
    [Figure: dependent join R ⨝(R.A = S.A) S between RDBMS table R and virtual table S, whose API bindings use a bulk API operation with A as input attribute]
      1. SELECT A, … FROM R WHERE …
      2. extract the values of join attribute A: a1, a2, …
      3. SELECT A, … FROM S WHERE A IN (a1, a2, …) AND …
      4. via the API bindings, bulk API calls with multiple input tuples for the different values of A: a1, a2, …
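Steps 1 to 4 amount to a dependent (bind) join with batching, sketched below. The function `dependent_join`, the `bulk_lookup` callback standing in for the bulk API operation, and the default batch size are illustrative assumptions; a data federation system would choose the batch size based on the API's limits.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def dependent_join(r_rows: List[Row], attr: str,
                   bulk_lookup: Callable[[List[object]], List[Row]],
                   batch_size: int = 50) -> List[Row]:
    """Join R with S on `attr`, fetching S through one bulk API call per batch of join values."""
    # Step 2: distinct values of the join attribute from R's rows (step 1), order preserved.
    values = list(dict.fromkeys(r[attr] for r in r_rows))
    s_by_key: Dict[object, List[Row]] = {}
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]     # step 3: ... WHERE A IN (a1, a2, ...)
        for s in bulk_lookup(batch):                 # step 4: a single bulk call for the batch
            s_by_key.setdefault(s[attr], []).append(s)
    # Final join of R's rows with the fetched S rows.
    return [{**r, **s} for r in r_rows for s in s_by_key.get(r[attr], [])]
```

For example, with six R rows carrying five distinct values of A and `batch_size=2`, only three API calls are issued instead of six per-tuple calls.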
  • 32. VDB – Data Materialization
    Data materialization is required by API operations that cannot be invoked at query time:
      • operations too expensive to call at query time (e.g., aligning API and DB identifiers)
      • operations instrumental to the use of external APIs (e.g., text indexing in a search engine)
    Solution #1: materialized views in Teiid (or in the other data federation system used)
    Solution #2: dedicated materialization engine for flexibly executing arbitrary materialization rules, each with:
      • identifier – for documentation & diagnostics
      • target – the system-managed computed table (possibly virtual) where data is stored
      • source – arbitrary SQL query (over any tables) that produces the data to store
      rules:
        - id: index_accommodation_texts
          target: vt_semsearch_index
          source: |-
            SELECT "Id" AS id, "AccoDetail-en-Longdesc" AS text
            FROM v_accommodationsopen
            WHERE "AccoDetail-en-Longdesc" IS NOT NULL
        - ... other rules ...
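Executing a single rule can be sketched as "evaluate the source query, then upsert the result into the target". In the sketch below, SQLite stands in for the VDB and a plain table plays the role of the virtual table `vt_semsearch_index` (in the real setting, upserting into it would trigger the `api_semsearch_store` API operation). The function names, the dict encoding of the YAML rule, and the sample rows are assumptions for illustration.

```python
import sqlite3
from typing import Dict

def run_rule(con: sqlite3.Connection, rule: Dict[str, str]) -> int:
    """Evaluate the rule's source query and upsert all resulting rows into its target table."""
    cur = con.execute(rule["source"])
    columns = [d[0] for d in cur.description]    # named by the AS aliases in the source query
    rows = cur.fetchall()
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    con.executemany(
        f'INSERT OR REPLACE INTO {rule["target"]} ({column_list}) VALUES ({placeholders})', rows)
    return len(rows)

def make_demo_db() -> sqlite3.Connection:
    """Hypothetical stand-ins: a plain table plays the role of the virtual table vt_semsearch_index."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE v_accommodationsopen ("Id" TEXT PRIMARY KEY, "AccoDetail-en-Longdesc" TEXT)')
    con.execute("CREATE TABLE vt_semsearch_index (id TEXT PRIMARY KEY, text TEXT NOT NULL)")
    con.executemany("INSERT INTO v_accommodationsopen VALUES (?, ?)",
                    [("H1", "Farm stay with horse riding"), ("H2", None)])
    return con

# The rule from the slide, written as a Python dict instead of YAML.
RULE = {
    "id": "index_accommodation_texts",
    "target": "vt_semsearch_index",
    "source": 'SELECT "Id" AS id, "AccoDetail-en-Longdesc" AS text FROM v_accommodationsopen '
              'WHERE "AccoDetail-en-Longdesc" IS NOT NULL',
}
```

Because the target is written with upsert semantics, re-running a rule is idempotent, which is what makes repeated (e.g., periodic or fixpoint) evaluation safe.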
  • 33. VDB – Data Materialization (cont’d)
    Rules (their SQL source queries) are analyzed to derive a rule dependency graph, which is mapped to an execution plan using fixpoint rule evaluation for strongly connected components.
    [Figure: three panels for example rules R1–R5: "Rule / Table Dependencies", "Rule Dependencies", and "Execution Plan"]
    Example execution plan:
      sequence ( parallel ( R1, sequence ( R2, fixpoint ( parallel ( R3, R4 ) ) ) ), R5 )
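The planning step can be sketched with Tarjan's strongly connected components algorithm: rule p precedes rule c when c's source query reads p's target table, each cyclic component becomes a fixpoint step, and components are ordered topologically. The table names in `READS`/`WRITES` are hypothetical, chosen to be consistent with the slide's R1 to R5 example, and this sketch produces a purely sequential plan, whereas the engine on the slide also exploits parallelism.

```python
from typing import Callable, Dict, List, Set, Tuple

def dependency_graph(reads: Dict[str, Set[str]], writes: Dict[str, str]) -> Dict[str, Set[str]]:
    """Edge p -> c whenever the source query of rule c reads the target table of rule p."""
    return {p: {c for c, tables in reads.items() if writes[p] in tables} for p in writes}

def sccs(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Tarjan's algorithm; components come out in reverse topological order."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in sorted(graph[v]):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:               # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            result.append(sorted(comp))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return result

def execution_plan(graph: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """Producers first; cyclic components (or self-loops) are evaluated to a fixpoint."""
    plan = []
    for comp in reversed(sccs(graph)):
        cyclic = len(comp) > 1 or comp[0] in graph[comp[0]]
        plan.append(("fixpoint", comp) if cyclic else ("run", comp))
    return plan

def run_plan(plan: List[Tuple[str, List[str]]], run_rule: Callable[[str], bool]) -> None:
    """run_rule(rule) executes one rule and reports whether its target table changed."""
    for kind, rules in plan:
        if kind == "run":
            run_rule(rules[0])
        else:
            changed = True
            while changed:                   # repeat until no rule in the component changes data
                changed = False
                for r in rules:
                    changed |= run_rule(r)

# Hypothetical tables: R3 and R4 read each other's targets, so they form a cycle.
WRITES = {"R1": "T1", "R2": "T2", "R3": "T3", "R4": "T4", "R5": "T5"}
READS = {"R1": set(), "R2": set(), "R3": {"T2", "T4"}, "R4": {"T3"}, "R5": {"T1", "T3", "T4"}}
```

For these inputs the plan is: run R2, fixpoint over {R3, R4}, run R1, run R5, a sequential linearization of the slide's plan. The fixpoint loop only terminates if the rules in a cycle eventually stop changing their targets, as with monotone rules.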
  • 34. VKG – Example of Ontology & Mappings over the VDB
    Ontology
      schema:Accommodation a owl:Class ;
          rdfs:subClassOf schema:Place ;
          rdfs:label "Accommodation"@en ; ...
      schema:name a owl:DatatypeProperty ; ...
      hive:Match a owl:Class ...
    The existing ontology formalism (OWL 2 QL) is reused as is, but now also models data from APIs.
    Mappings
      mappingId  Semantic Search
      target     data:match/accommodation/{id}/{query} a hive:Match ;
                     hive:query {query}^^xsd:string ;
                     hive:resource data:accommodation/{id} ;
                     hive:excerpt {excerpt}@en ;
                     hive:score {score}^^xsd:decimal .
      source     SELECT * FROM hiveodh.srv.vt_semsearch_match
    The existing VKG mapping formalism is reused as is, but data may now come from API virtual tables.
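To show what the "Semantic Search" mapping produces for one row of `vt_semsearch_match`, the sketch below fills the IRI templates and builds the typed and language-tagged literals. It is an approximation for illustration: values inserted into IRI templates are percent-encoded in the spirit of R2RML's IRI-safe rule, but Python's `quote` also encodes non-ASCII characters, which R2RML would keep, and the Turtle-like output format and function names are assumptions.

```python
import re
from typing import Dict, List, Tuple
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{(\w+)\}")

def iri(template: str, row: Dict[str, object]) -> str:
    """Fill an IRI template, percent-encoding each inserted value (template text is kept as is)."""
    return PLACEHOLDER.sub(lambda m: quote(str(row[m.group(1)]), safe=""), template)

def literal(value: object, suffix: str) -> str:
    """Render a typed (^^datatype) or language-tagged (@lang) literal in Turtle-like syntax."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"{suffix}'

def semantic_search_triples(row: Dict[str, object]) -> List[Tuple[str, str, str]]:
    """Target of the 'Semantic Search' mapping, applied to one row of vt_semsearch_match."""
    s = iri("data:match/accommodation/{id}/{query}", row)
    return [
        (s, "rdf:type", "hive:Match"),
        (s, "hive:query", literal(row["query"], "^^xsd:string")),
        (s, "hive:resource", iri("data:accommodation/{id}", row)),
        (s, "hive:excerpt", literal(row["excerpt"], "@en")),
        (s, "hive:score", literal(row["score"], "^^xsd:decimal")),
    ]
```

Encoding the values matters here because the match IRI embeds the free-text query: "horse riding" becomes `horse%20riding`, so the resulting IRI stays valid and two different queries never collide on the same IRI.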
  • 35. VKG – Query Rewriting & Evaluation Example
    User-supplied SPARQL query
      SELECT ?h ?posLabel ?rating ?pos {
        [] a hive:Match ;
           hive:query "horse riding"^^xsd:string ;
           hive:resource ?h ;
           hive:excerpt ?excerpt ;
           hive:score ?score .
        ?h a schema:LodgingBusiness ;
           geo:defaultGeometry/geo:asWKT ?pos ;
           schema:name ?name ;
           schema:description ?description ;
           schema:starRating/schema:ratingValue ?rating .
        FILTER (?rating >= 3 && lang(?name) = 'en' && lang(?description) = 'en')
        BIND (CONCAT(?name, " <br><br>...", ?excerpt, "...<br><br>", ?description) AS ?posLabel)
      }
      ORDER BY DESC(?score)
      LIMIT 10
    SQL query rewritten by Ontop
      SELECT v1.id, v1.excerpt,                -- fields used
             v2."AccoDetail-en-Name",          -- for deriving
             v2."AccoDetail-en-Longdesc",      -- ?posLabel
             ... complex expression computing rating ...,
             ST_ASTEXT(v2."Geometry")
      FROM hiveodh.srv.vt_semsearch_match v1, hiveodh.db.v_accommodationsopen v2
      WHERE v1."id" = v2."Id"
        AND CAST(v1."query" AS TEXT) = 'horse riding'
        AND ... complex condition on rating >= 3 ...
        AND ... nonnull conditions for output columns ...
      ORDER BY CAST(v1."score" AS DECIMAL) DESC
      LIMIT 10
    The SQL query is then evaluated on the VDB by Teiid.
  • 36. VKG – ODH with Semantic Search Demo
    Data sources: DB with ODH tourism data + semantic search API to index & query accommodation texts
    System: Ontop embedding Teiid + materialization engine
    Demo: https://hive.inf.unibz.it/odh/vkg/
  • 37. Overall Framework for VKGs over APIs
    [Figure: VKG-based applications send SPARQL to the Virtual Knowledge Graph (Ontop), which sends SQL to the Virtual DB (VDB, Teiid + service translator); VDB-based applications send SQL to the VDB directly; the VDB sends SQL to RDB sources and calls to API sources]
    Components:
      • VKG Ontology: formalizes the classes/properties (the “schema”) of the VKG, enabling reasoning
      • VKG Mappings: including virtual tables, used for query rewriting
      • API Bindings: define how to query/update a virtual table via API calls, if possible → limited access patterns
      • Materialization Rules: pre-compute results of expensive API calls → the VDB/VKG is no longer fully “virtual”
  • 38. 1. Introduction 2. The VKG Framework 3. The Ontop VKG System 4. VKGs over Web APIs 5. Conclusions
  • 39. Takeaway Messages
    Virtual Knowledge Graphs (VKG): flexible technology for building KGs over existing data source(s)
      • useful for inherently relational data, where a VKG engine + RDBMS may outperform a triplestore
      • useful for RDF-ifying existing data, via VKG materialization to an RDF file
    Ontop: mature, open-source VKG system with a solid user & developer community
      • allows a VKG over a single RDB, with support for multiple database engines
      • allows a VKG over multiple heterogeneous sources, in combination with an intermediate data federation system such as the open-source Teiid & Dremio
      • active research & development for adding new features and new data sources
    VKGs over Web APIs: ongoing research & development effort
      • enables transparent access to dynamically computed API data via declarative queries
      • API operations mapped to virtual relations, accessed through a Teiid extension
      • optimizations for better using API features, such as bulk operations and operator push-down
      • expensive API operations supported via pre-computation and data materialization
  • 40. Thanks for attending!
    These slides: https://bit.ly/3WOoldB
    Ontop: https://ontop-vkg.org/