SlideShare a Scribd company logo
1 of 40
Download to read offline
Data integration with a façade
The case of knowledge graph construction
Enrico Daga
Research Fellow, — Knowledge Media Institute, STEM Faculty, The Open University
www.enridaga.net @enridaga
KMi Seminar Series, 13/7/2022
This project has received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement GA101004746.
The communication re
fl
ects only the author’s view and the Research Executive Agency is not
responsible for any use that may be made of the information it contains.
Definition: a KG …
(i) mainly describes real world entities and their
interrelations, organized in a graph,
(ii) defines possible classes and relations of entities in
a schema
(iii) allows for potentially interrelating arbitrary
entities with each other
(iv) covers various topical domains. [Paulheim, 2017]
Knowledge Graphs
# RDF triples
dbr:Wolfang_Amadeus_Mozart rdfs:label “Wolfgang Amadeus Mozart”@de .
dbr:Wolfang_Amadeus_Mozart dbo:deathPlace dbr:Vienna .
dbr:Vienna dbo:populationTotal “1852997”^^xsd:nonNegativeInteger .
# SPARQL allows us to query the graph via pattern matching
?person rdfs:label ?label .
?person dbo:deathPlace ?place .
?place dbo:populationTotal ?population .
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Example of a SELECT query, retrieving 2 variables
SELECT ?person ?label
WHERE {
#This is our graph pattern
?person rdfs:label ?label ;
dbo:deathPlace dbr:Vienna
}
PREFIX schema: <http://schema.org/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Example of a CONSTRUCT query, which generates a new KG
CONSTRUCT {
?person rdf:type schema:Person ;
rdfs:label ?label ;
schema:deathPlace ?deathPlace
} WHERE {
?person rdfs:label ?label ;
dbo:deathPlace ?deathPlace
}
Preliminaries
Data Integration is the dominant use case for Knowledge Graphs - [Atkin, 2021, in
Lassila et al, 2021]
Definition: “Data integration involves combining data residing in different sources and
providing users with a unified view of them” [Lenzerini, 2002]
Data integration systems are defined as (G, S, M)
- G is the target common (global) schema
- S is the heterogeneous set of source schemas
- M are the mapping that maps queries between the
source and the global schemas
CSV
XML
JSON
XPath
IRI
JsonPath
HTML
JsonSchema
DOM
Web APIs HTTP
XSLT
XSD
Excel
Knowledge Graph
Construction (KGC)
Playing the soundtrack of our history
Preserving musical heritage
through knowledge graphs
Managing musical heritage collections
through knowledge graphs
Studying musical heritage through
(interlinked) knowledge graphs
https://spice-h2020.eu/ https://polifonia-project.eu/
“Source” can be … anything!
Data integration theory assumes that data sources are format-compatible,
i.e. it does not account for syntactic heterogeneity (it abstracts from the
actual formats or assumes the same format, e.g. a relational database)
HTML
CSV
JSON
“Source” can be … anything!
BibTex
abc
Markdown
XML/MEI
Existing approaches
• Targeting specific types/formats: Direct mapping, Tarql,
Any23, JSON2RDF, CSV2RDF, COW, SPARQL Micro-
services [Michel, 2019] — lack generality
• Specialised mapping languages, several types of
(R2RML, RML, ShexML): high learning demands, cold
start problem, limited to few popular format, difficult to
extend. [Dimou, 2014] [García-González, 2020]
• Extending SPARQL with custom features: SPARQL
Generate, high learning demands, difficult to extend to
other formats. [Lefrançois, 2017]
• Cognitive complexity: solutions transfer data source
complexity to the user (e.g. need to know XPath for XML,
JsonPath for JSON, …) or ask the user to learn a new
language!
SPARQL
Generate
RML
Instead, our aim is
• Target an open ended set of formats (extendibility without changes to
user-facing tool)
• Reduce the learning curve for Knowledge Graph practitioners (including
non-developers from Healthcare, Cultural heritage, …)
• Many SPARQL users fall into the category of end-user developer
[Lieberman, 2006]
• In a recent survey [Warren et al, 2018] 42% KG/SPARQL users are from
non-IT areas, including social sciences and the humanities, business
and economics, and biomedical, engineering or physical sciences.
KGC
Iterative process:
• Observe: the resource (e.g. a CSV file)
• Design mappings to a target ontology
• Transform: execute the mappings
• Observe: compare / evaluate
Trail and error approach, many iterations
KGC, revisited
KG construction is a twofold job:
• perform a syntax/meta-model conversion (e.g. CSV to RDF)
• project semantics onto the data (applying a domain
ontology)
A better model of the user process:
• Observe: the resource (e.g. a CSV file)
• Reengineering Design: what syntax/meta-model do I want?
• Remodelling Design: what semantics we want?
• Transform: execute the mappings
• Observe: compare / evaluate
Trail and error approach, many iterations
KGC, opinionated
• Reengineering: what syntax/meta-model do we want?
• We cannot know what structure our user wants but we
know the meta-model: RDF
• Remodelling: what semantics do we project?
• SPARQL is great for projecting semantics (change
namespaces, create entities from literals, adding types,
sophisticated relationships, composite structures, …)
• Data Integration theory here!
How to develop a unified approach to re-
engineer an open-ended set of formats
into RDF?
Concept
Façade Design Pattern
From Object Oriented Programming
A single abstraction on different, alternative interfaces
https://en.wikipedia.org/wiki/Facade_pattern
An RDF façade?
• A common RDF structure over diverse
formats
• Focusing on the meta-model (data structure)
• Leaving domain semantics as-it-is!
• apply the least possible “ontological
commitment”
• Problem Space: CSV, JSON, HTML, XML,
Binary (JPEG, PNG, …),Text
• Solution space: RDFS
1
2
3
4
1
n
A simple intuition:
Data formats rely on a common set of primitives …
BibTex
abc
Markdown
XML/MEI
Facade X
A simplified RDF meta-model, resembling a list-of-lists
Components: Containers (typed), slots (string / int), values
Intuitive, abstract notions: key-value, sequence, type
“String”, 1, true,…
“String”, 1, true
xyz:row1, …
xyz:row_n, xyz:…
fx:root | xyz:*
rdf:type
xyz:*
rdf:_N
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
JSON
https://github.com/tategallery/collection/artworks/t/023/t02319-9205.json
[ a fx:root ;
xyz:acno "T02319" ;
xyz:acquisitionYear “1978"^^xsd:int ;
xyz:all_artists "Kazimir Malevich" ;
xyz:catalogueGroup […] ;
xyz:classification "painting" ;
xyz:contributorCount “1”^^xsd:int
…
{
"acno": "T02319",
"acquisitionYear": 1978,
"all_artists": "Kazimir Malevich",
"catalogueGroup": {},
"classification": "painting",
"contributorCount": 1,
"contributors": [
{
DOM (HTML, XML, …)
[ a fx:root , xhtml:div ;
xhtml:id “az-group” ;
rdf:_1 [ a xhtml:div ;
rdf:_1 [ a xhtml:h4 ;
rdf:_1 "A" ;
<https://html.spec.whatwg.org/#innerHTML>
"A" ;
<https://html.spec.whatwg.org/#innerText>
"A"
] ;
html.selector=#az-group
@prefix xhtml: <http://www.w3.org/1999/xhtml#> .
CSV
id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url
10093,"Abakanowicz, Magdalena",Female,born 1930,1930,,Polska,,http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093
…
https://github.com/tategallery/collection/blob/master/artist_data.csv
[ a fx:root ;
rdf:_1 [ xyz:dates "born 1930" ;
xyz:gender "Female" ;
xyz:id "10093" ;
xyz:name "Abakanowicz, Magdalena" ;
xyz:placeOfBirth "Polska" ;
xyz:placeOfDeath "" ;
xyz:url "http://www.tate.org.uk/art/artists/magdalena-
abakanowicz-10093" ;
xyz:yearOfBirth "1930" ;
xyz:yearOfDeath ""
] ;
csv.headers=true|false
[ a fx:root ;
rdf:_1 [ rdf:_1 "id" ;
rdf:_2 "name" ;
rdf:_3 "gender" ;
rdf:_4 "dates" ;
rdf:_5 "yearOfBirth" ;
rdf:_6 "yearOfDeath" ;
rdf:_7 "placeOfBirth" ;
rdf:_8 “placeOfDeath" ;
rdf:_9 "url"
] ;
BibTex
ABC notation
[ a fx:root ;
xyz:X “1” ;
xyz:T “Ah! vous …” ;
xyz:C “anon.” ; xyz:O “France” ;
xyz:Z: “Transcirption based …”
rdf:_1 [
rdf:_1 “|: ” ;
rdf:_2 [
rdf:_1 “C” ;
rdf:_2 “C” ;
rdf:_3 “G” ;
rdf:_4 “G” ] ;
rdf:_3 “ | “ ;
rdf:_4 [
rdf:_1 “C” ;
rdf:_2 “C” ;
rdf:_3 “G2” ] ;
]
(3) Map on target schema
(1) Source schema
(2) Transform …
Mappings in SPARQL 1.1 !!!
Features v0.7.0
sparql-anything.cc
https://sparql-anything.readthedocs.io/
https://
github.com/
SPARQL-
Anything/
showcase-tate
Tate Gallery Open
Data
* CSV listing
artworks
* JSON with details
Task: build a SKOS
taxonomy of
@enridaga
https://github.com/
SPARQL-Anything/
showcase-imma
https://imma.ie/collection/freeing-the-memory/
Polifonia Ontology Network
* Scenarios collected on GitHub as
Markdown files
Task: extract a list of competency
questions from any scenario
https://github.com/SPARQL-Anything/showcase-mei
XML->CSV: using SPARQL to extract the note sequence from a XML/MEI file
System
SPARQL Anything is implemented on top of a
standard SPARQL 1.1 query engine (Apache
Jena ARQ)
Query execution is always after view
materialisation
In-memory / On-disk options available
Performance can be improved with
optimisations: e.g. avoid materialising triples
not needed by the query (triple-filtering)
Method can scale further by Slicing the data,
instead of materialising a Complete view
Evaluation
How to develop a unified approach to re-engineer an open-ended set of
formats into RDF?
• Empirical: all formats scrutinised so far are representable with FX
• Theoretical 1/2: any format expressible with a BNF grammar can be also
represented as FX
• Theoretical 2/2: FX can also represent the Entity-Relation model
• Approach scales linearly with input size
Lower (cognitive) complexity
• One measure of complexity is the number of (distinct) items or
variables (Halford et al. 2004; Warren et al. 2015).
• Vs SPARQL Generate: 8 queries from a museum collection use
case
• What are the titles of the artworks attributed to “ANONIMO”?
• What are the titles of the artworks created in the 1935?
• …
• vs RML / SPARQL Generate / ShexML
• 4 mappings
• Avg distinct tokens:
• SPARQL Anything: ~18
• SPARQL Generate: ~25 (∼39.72% more)
• RML: ~45 (∼150% more)
Comparisons
Practicable and sustainable
• Quantitative analysis of performance, to assess practicability
• In-Memory implementation (Naive)
• Execution time of q1-q12 (AVG on 10 executions)
• Comparable to RML Mapper and SPARQL Generate on files
up to 1M JSON objects (~5M triples)
• In-Memory implementation scales linearly
• The approach is sustainable
• Lines of Java code to maintain: SPARQL Generate 12280
(core module); RML Mapper 7951; SPARQL Anything: 3842
(all transformers) — v0.2.0 (v0.7.0 has ~12k)
Comparisons
SEMANTICS conference 2021 Submitted to ACM Transactions on Internet Technologies
Benefits
• Transform / Query resources having heterogeneous formats
• Low learning demands for KG practitioners — plain SPARQL 1.1
• A single+consistent abstraction for potentially any data format
• “Free lunch” data exploration and querying (no cold start problem)
• Open-ended extendibility: no changes to user-facing code required
• FX can express ANY format representable as BNF (as well as relational data)
• Generate RDF (and RDF-Star) but also Tabular Data
• Sometimes the KG is the intermediate object. Applications requiring other formats —
e.g. tabular data is the usual format for data science, KG can be a feature engineering
medium
Challenges / Directions
Execute FX queries
• FX view materialisation only method supported so far.
• Study other execution strategies, e.g. query-rewriting, learn from Ontology Based Data Access (OBDA)
Support users
• Only SPARQL raw code so far: develop user interfaces for FX-based data integration / exploration?
• Explore mapping design patterns
Support developers: how to streamline developing adapters to new formats?
Features:
• More inputs: relational DB, MIDI, Image Annotations, … the sky is the limit!
• More connectors: Apache Parquet, Thrift, Hadoop/HDFS, RDBMS/JDBC, MongoDB, …
• Anything in … anything out! Equip SPARQL Anything with a full-fledged template engine
Luigi
Asprino
University of
Bologna
Enrico
Daga
The Open
University
Aldo
Gangemi
ISTC-CNR
Justin
Dowdy
Software
Engineer
Paul
Warren
The Open
University
Paul
Mulholland
The Open
University
This project has received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement GA101004746.
The communication re
fl
ects only the author’s view and the Research Executive Agency is not
responsible for any use that may be made of the information it contains.
Credits
Marco
Ratta
The Open
University
Jason
Carvalho
The Open
University
Thank you for listening
www.enridaga.net
• Daga, E., Asprino, L., Mulholland, P., Gangemi, A.: Facade-x: an opinionated approach to sparql anything. In: SEMANTiCS 2021: 17th International Conference on Semantic Systems
(2021)
• Atkin, M., Deely, T., Scharffe, F.: Knowledge Graph Benchmarking Report 2021 (version 2.0). Zenodo, http://doi.org/10.5281/zenodo.4950097 (June 2021)
• Lassila, O., Michael Schmidt, Brad Bebee, Dave Bechberger, Willem Broekema, Ankesh Khandelwal, Kelvin Lawrence, Ronak Sharda, and Bryan Thompson: Graph? Yes! Which
one? Help!. 1st Squaring the circle on knowledge graphs workshop - Semantics (2021)
• Daga, E., Meroño-Peñuela, A., Motta, E.: Sequential linked data: the state of affairs. Semantic Web (2021)
• Warren, P., Mulholland, P.: Using sparql–the practitioners’ viewpoint. In: European Knowledge Acquisition Workshop. pp. 485–500. Springer (2018)
• Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a new generation of ontology based data access. Semantic Web 11(1), 153–160 (2020)
• Michel, F., Faron-Zucker, C., Corby, O., Gandon, F.: Enabling automatic discovery and querying of web apis at web scale using linked data standards. In: Companion Proceedings of
The 2019 World Wide Web Conference. pp. 883–892 (2019)
• Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: Rml: a generic language for integrated rdf mappings of heterogeneous data. In: 7th
Workshop on Linked Data on the Web (2014)
• García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.: Shexml: improving the usability of heterogeneous data mapping languages for firsttime users. PeerJ
Computer Science 6, e318 (2020)
• Paulheim, Heiko. "Knowledge graph refinement: A survey of approaches and evaluation methods." Semantic web 8, no. 3 (2017): 489-508.
• Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., et al.: The state of the art in enduser software
engineering. ACM Computing Surveys (CSUR) 43(3), 1–44 (2011)
• Lefrançois, M., Zimmermann, A., Bakerally, N.: A sparql extension for generating rdf from heterogeneous formats. In: European Semantic Web Conference. pp. 35– 50. Springer
(2017)
• Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: An emerging paradigm. In: End user development, pp. 1–8. Springer (2006)
• Cyganiak, Richard. Tarql (sparql for tables): Turn csv into rdf using sparql syntax. Technical Report, 2015. http://tarql. github. io, 2015.
• Lenzerini, Maurizio. "Data integration: A theoretical perspective." In Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems,
pp. 233-246. 2002.
References

More Related Content

Similar to Data integration with a façade. The case of knowledge graph construction.

Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001Dan Brickley
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...Ralf Stockmann
 
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked dataESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked dataeswcsummerschool
 
Ontology mapping for the semantic web
Ontology mapping for the semantic webOntology mapping for the semantic web
Ontology mapping for the semantic webWorawith Sangkatip
 
OA - Shared Canvas - TEI - Biblissima project
OA - Shared Canvas - TEI - Biblissima projectOA - Shared Canvas - TEI - Biblissima project
OA - Shared Canvas - TEI - Biblissima projectEquipex Biblissima
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information ArchitectureScott Abel
 
Towards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsTowards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsSpeck&Tech
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphsSören Auer
 
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013Mariana Damova, Ph.D
 
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE
 
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant FormatELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant FormatPretaLLOD
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsCarole Goble
 
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Antoine Isaac
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesvty
 
RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)Vladimir Alexiev, PhD, PMP
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesAlessandro Adamou
 
Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryRuben Schalk
 

Similar to Data integration with a façade. The case of knowledge graph construction. (20)

Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001
 
Digital Libraries of the Future
Digital Libraries of the Future
Digital Libraries of the Future
Digital Libraries of the Future
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
 
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked dataESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
 
Ontology mapping for the semantic web
Ontology mapping for the semantic webOntology mapping for the semantic web
Ontology mapping for the semantic web
 
OA - Shared Canvas - TEI - Biblissima project
OA - Shared Canvas - TEI - Biblissima projectOA - Shared Canvas - TEI - Biblissima project
OA - Shared Canvas - TEI - Biblissima project
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information Architecture
 
Presentation at MTSR 2012
Presentation at MTSR 2012Presentation at MTSR 2012
Presentation at MTSR 2012
 
Towards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsTowards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIs
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphs
 
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013
 
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
 
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant FormatELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiences
 
Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University Library
 

More from Enrico Daga

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyEnrico Daga
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...Enrico Daga
 
Capturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchCapturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchEnrico Daga
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEIEnrico Daga
 
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Enrico Daga
 
Linked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchLinked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchEnrico Daga
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachEnrico Daga
 
Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Enrico Daga
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterEnrico Daga
 
CityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesCityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesEnrico Daga
 
Propagating Data Policies - A User Study
Propagating Data Policies - A User StudyPropagating Data Policies - A User Study
Propagating Data Policies - A User StudyEnrico Daga
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so farEnrico Daga
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsEnrico Daga
 
A bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionA bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionEnrico Daga
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsEnrico Daga
 
Early Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEarly Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEnrico Daga
 

More from Enrico Daga (17)

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data Journey
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
 
Capturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchCapturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities research
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEI
 
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
 
Linked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchLinked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities research
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
 
Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...
 
Ld4 dh tutorial
Ld4 dh tutorialLd4 dh tutorial
Ld4 dh tutorial
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data Cluster
 
CityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesCityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tables
 
Propagating Data Policies - A User Study
Propagating Data Policies - A User StudyPropagating Data Policies - A User Study
Propagating Data Policies - A User Study
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
A bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionA bottom up approach for licences classification and selection
A bottom up approach for licences classification and selection
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 
Early Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEarly Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data Cubes
 

Recently uploaded

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 

Recently uploaded (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

Data integration with a façade. The case of knowledge graph construction.

  • 1. Data integration with a façade The case of knowledge graph construction Enrico Daga Research Fellow, — Knowledge Media Institute, STEM Faculty, The Open University www.enridaga.net @enridaga KMi Seminar Series, 13/7/2022 This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement GA101004746. The communication re fl ects only the author’s view and the Research Executive Agency is not responsible for any use that may be made of the information it contains.
  • 2. Definition: a KG … (i) mainly describes real world entities and their interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema (iii) allows for potentially interrelating arbitrary entities with each other (iv) covers various topical domains. [Paulheim, 2017] Knowledge Graphs
  • 3.
  • 4. # RDF triples dbr:Wolfang_Amadeus_Mozart rdfs:label “Wolfgang Amadeus Mozart”@de . dbr:Wolfang_Amadeus_Mozart dbo:deathPlace dbr:Vienna . dbr:Vienna dbo:populationTotal “1852997”^^xsd:nonNegativeInteger . # SPARQL allows us to query the graph via pattern matching ?person rdfs:label ?label . ?person dbo:deathPlace ?place . ?place dbo:populationTotal ?population .
  • 5. PREFIX dbo: <http://dbpedia.org/ontology/> PREFIX dbr: <http://dbpedia.org/resource/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> # Example of a SELECT query, retrieving 2 variables SELECT ?person ?label WHERE { #This is our graph pattern ?person rdfs:label ?label ; dbo:deathPlace dbr:Vienna }
  • 6. PREFIX schema: <http://schema.org/> PREFIX dbo: <http://dbpedia.org/ontology/> PREFIX dbr: <http://dbpedia.org/resource/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> # Example of a CONSTRUCT query, which generates a new KG CONSTRUCT { ?person rdf:type schema:Person ; rdfs:label ?label ; schema:deathPlace ?deathPlace } WHERE { ?person rdfs:label ?label ; dbo:deathPlace ?deathPlace }
  • 7. Preliminaries Data Integration is the dominant use case for Knowledge Graphs - [Atkin, 2021, in Lassila et al, 2021] Definition: “Data integration involves combining data residing in different sources and providing users with a unified view of them” [Lenzerini, 2002] Data integration systems are defined as (G, S, M) - G is the target common (global) schema - S is the heterogeneous set of source schemas - M are the mapping that maps queries between the source and the global schemas
  • 9. Playing the soundtrack of our history Preserving musical heritage through knowledge graphs Managing musical heritage collections through knowledge graphs Studying musical heritage through (interlinked) knowledge graphs https://spice-h2020.eu/ https://polifonia-project.eu/
  • 10. “Source” can be … anything! Data integration theory assumes that data sources are format-compatible, i.e. it does not account for syntactic heterogeneity (it abstracts from the actual formats or assumes the same format, e.g. a relational database) HTML CSV JSON
  • 11. “Source” can be … anything! BibTex abc Markdown XML/MEI
  • 12. Existing approaches • Targeting specific types/formats: Direct mapping, Tarql, Any23, JSON2RDF, CSV2RDF, COW, SPARQL Micro- services [Michel, 2019] — lack generality • Specialised mapping languages, several types of (R2RML, RML, ShexML): high learning demands, cold start problem, limited to few popular format, difficult to extend. [Dimou, 2014] [García-González, 2020] • Extending SPARQL with custom features: SPARQL Generate, high learning demands, difficult to extend to other formats. [Lefrançois, 2017] • Cognitive complexity: solutions transfer data source complexity to the user (e.g. need to know XPath for XML, JsonPath for JSON, …) or ask the user to learn a new language! SPARQL Generate RML
  • 13. Instead, our aim is • Target an open ended set of formats (extendibility without changes to user-facing tool) • Reduce the learning curve for Knowledge Graph practitioners (including non-developers from Healthcare, Cultural heritage, …) • Many SPARQL users fall into the category of end-user developer [Lieberman, 2006] • In a recent survey [Warren et al, 2018] 42% KG/SPARQL users are from non-IT areas, including social sciences and the humanities, business and economics, and biomedical, engineering or physical sciences.
  • 14. KGC Iterative process: • Observe: the resource (e.g. a CSV file) • Design mappings to a target ontology • Transform: execute the mappings • Observe: compare / evaluate Trail and error approach, many iterations
  • 15. KGC, revisited KG construction is a twofold job: • perform a syntax/meta-model conversion (e.g. CSV to RDF) • project semantics onto the data (applying a domain ontology) A better model of the user process: • Observe: the resource (e.g. a CSV file) • Reengineering Design: what syntax/meta-model do I want? • Remodelling Design: what semantics we want? • Transform: execute the mappings • Observe: compare / evaluate Trail and error approach, many iterations
  • 16. KGC, opinionated • Reengineering: what syntax/meta-model do we want? • We cannot know what structure our user wants but we know the meta-model: RDF • Remodelling: what semantics do we project? • SPARQL is great for projecting semantics (change namespaces, create entities from literals, adding types, sophisticated relationships, composite structures, …) • Data Integration theory here! How to develop a unified approach to re- engineer an open-ended set of formats into RDF?
  • 17. Concept Façade Design Pattern From Object Oriented Programming A single abstraction on different, alternative interfaces https://en.wikipedia.org/wiki/Facade_pattern An RDF façade? • A common RDF structure over diverse formats • Focusing on the meta-model (data structure) • Leaving domain semantics as-it-is! • apply the least possible “ontological commitment” • Problem Space: CSV, JSON, HTML, XML, Binary (JPEG, PNG, …),Text • Solution space: RDFS
  • 18. 1 2 3 4 1 n A simple intuition: Data formats rely on a common set of primitives … BibTex abc Markdown XML/MEI
  • 19. Facade X A simplified RDF meta-model, resembling a list-of-lists Components: Containers (typed), slots (string / int), values Intuitive, abstract notions: key-value, sequence, type “String”, 1, true,… “String”, 1, true xyz:row1, … xyz:row_n, xyz:… fx:root | xyz:* rdf:type xyz:* rdf:_N PREFIX fx: <http://sparql.xyz/facade-x/ns/> PREFIX xyz: <http://sparql.xyz/facade-x/data/>
  • 20. JSON https://github.com/tategallery/collection/artworks/t/023/t02319-9205.json [ a fx:root ; xyz:acno "T02319" ; xyz:acquisitionYear “1978"^^xsd:int ; xyz:all_artists "Kazimir Malevich" ; xyz:catalogueGroup […] ; xyz:classification "painting" ; xyz:contributorCount “1”^^xsd:int … { "acno": "T02319", "acquisitionYear": 1978, "all_artists": "Kazimir Malevich", "catalogueGroup": {}, "classification": "painting", "contributorCount": 1, "contributors": [ {
  • 21. DOM (HTML, XML, …) [ a fx:root , xhtml:div ; xhtml:id “az-group” ; rdf:_1 [ a xhtml:div ; rdf:_1 [ a xhtml:h4 ; rdf:_1 "A" ; <https://html.spec.whatwg.org/#innerHTML> "A" ; <https://html.spec.whatwg.org/#innerText> "A" ] ; html.selector=#az-group @prefix xhtml: <http://www.w3.org/1999/xhtml#> .
  • 22. CSV id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url 10093,"Abakanowicz, Magdalena",Female,born 1930,1930,,Polska,,http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093 … https://github.com/tategallery/collection/blob/master/artist_data.csv [ a fx:root ; rdf:_1 [ xyz:dates "born 1930" ; xyz:gender "Female" ; xyz:id "10093" ; xyz:name "Abakanowicz, Magdalena" ; xyz:placeOfBirth "Polska" ; xyz:placeOfDeath "" ; xyz:url "http://www.tate.org.uk/art/artists/magdalena- abakanowicz-10093" ; xyz:yearOfBirth "1930" ; xyz:yearOfDeath "" ] ; csv.headers=true|false [ a fx:root ; rdf:_1 [ rdf:_1 "id" ; rdf:_2 "name" ; rdf:_3 "gender" ; rdf:_4 "dates" ; rdf:_5 "yearOfBirth" ; rdf:_6 "yearOfDeath" ; rdf:_7 "placeOfBirth" ; rdf:_8 “placeOfDeath" ; rdf:_9 "url" ] ;
  • 24. ABC notation [ a fx:root ; xyz:X “1” ; xyz:T “Ah! vous …” ; xyz:C “anon.” ; xyz:O “France” ; xyz:Z: “Transcirption based …” rdf:_1 [ rdf:_1 “|: ” ; rdf:_2 [ rdf:_1 “C” ; rdf:_2 “C” ; rdf:_3 “G” ; rdf:_4 “G” ] ; rdf:_3 “ | “ ; rdf:_4 [ rdf:_1 “C” ; rdf:_2 “C” ; rdf:_3 “G2” ] ; ]
  • 25. (3) Map on target schema (1) Source schema (2) Transform … Mappings in SPARQL 1.1 !!!
  • 27. https:// github.com/ SPARQL- Anything/ showcase-tate Tate Gallery Open Data * CSV listing artworks * JSON with details Task: build a SKOS taxonomy of @enridaga
  • 29. Polifonia Ontology Network * Scenarios collected on GitHub as Markdown files Task: extract a list of competency questions from any scenario
  • 30. https://github.com/SPARQL-Anything/showcase-mei XML->CSV: using SPARQL to extract the note sequence from a XML/MEI file
  • 31. System SPARQL Anything is implemented on top of a standard SPARQL 1.1 query engine (Apache Jena ARQ) Query execution is always after view materialisation In-memory / On-disk options available Performance can be improved with optimisations: e.g. avoid materialising triples not needed by the query (triple-filtering) Method can scale further by Slicing the data, instead of materialising a Complete view
  • 32. Evaluation How to develop a unified approach to re-engineer an open-ended set of formats into RDF? • Empirical: all formats scrutinised so far are representable with FX • Theoretical 1/2: any format expressible with a BNF grammar can be also represented as FX • Theoretical 2/2: FX can also represent the Entity-Relation model • Approach scales linearly with input size
  • 33. Lower (cognitive) complexity • One measure of complexity is the number of (distinct) items or variables (Halford et al. 2004; Warren et al. 2015). • Vs SPARQL Generate: 8 queries from a museum collection use case • What are the titles of the artworks attributed to “ANONIMO”? • What are the titles of the artworks created in the 1935? • … • vs RML / SPARQL Generate / ShexML • 4 mappings • Avg distinct tokens: • SPARQL Anything: ~18 • SPARQL Generate: ~25 (∼39.72% more) • RML: ~45 (∼150% more) Comparisons
  • 34. Practicable and sustainable • Quantitative analysis of performance, to assess practicability • In-Memory implementation (Naive) • Execution time of q1-q12 (AVG on 10 executions) • Comparable to RML Mapper and SPARQL Generate on files up to 1M JSON objects (~5M triples) • In-Memory implementation scales linearly • The approach is sustainable • Lines of Java code to maintain: SPARQL Generate 12280 (core module); RML Mapper 7951; SPARQL Anything: 3842 (all transformers) — v0.2.0 (v0.7.0 has ~12k) Comparisons
  • 35. SEMANTICS conference 2021 Submitted to ACM Transactions on Internet Technologies
  • 36. Benefits • Transform / Query resources having heterogeneous formats • Low learning demands for KG practitioners — plain SPARQL 1.1 • A single+consistent abstraction for potentially any data format • “Free lunch” data exploration and querying (no cold start problem) • Open-ended extendibility: no changes to user-facing code required • FX can express ANY format representable as BNF (as well as relational data) • Generate RDF (and RDF-Star) but also Tabular Data • Sometimes the KG is the intermediate object. Applications requiring other formats — e.g. tabular data is the usual format for data science, KG can be a feature engineering medium
  • 37. Challenges / Directions Execute FX queries • FX view materialisation only method supported so far. • Study other execution strategies, e.g. query-rewriting, learn from Ontology Based Data Access (OBDA) Support users • Only SPARQL raw code so far: develop user interfaces for FX-based data integration / exploration? • Explore mapping design patterns Support developers: how to streamline developing adapters to new formats? Features: • More inputs: relational DB, MIDI, Image Annotations, … the sky is the limit! • More connectors: Apache Parquet, Thrift, Hadoop/HDFS, RDBMS/JDBC, MongoDB, … • Anything in … anything out! Equip SPARQL Anything with a full-fledged template engine
  • 38. Luigi Asprino University of Bologna Enrico Daga The Open University Aldo Gangemi ISTC-CNR Justin Dowdy Software Engineer Paul Warren The Open University Paul Mulholland The Open University This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement GA101004746. The communication re fl ects only the author’s view and the Research Executive Agency is not responsible for any use that may be made of the information it contains. Credits Marco Ratta The Open University Jason Carvalho The Open University
  • 39. Thank you for listening www.enridaga.net
  • 40. • Daga, E., Asprino, L., Mulholland, P., Gangemi, A.: Facade-x: an opinionated approach to sparql anything. In: SEMANTiCS 2021: 17th International Conference on Semantic Systems (2021) • Atkin, M., Deely, T., Scharffe, F.: Knowledge Graph Benchmarking Report 2021 (version 2.0). Zenodo, http://doi.org/10.5281/zenodo.4950097 (June 2021) • Lassila, O., Michael Schmidt, Brad Bebee, Dave Bechberger, Willem Broekema, Ankesh Khandelwal, Kelvin Lawrence, Ronak Sharda, and Bryan Thompson: Graph? Yes! Which one? Help!. 1st Squaring the circle on knowledge graphs workshop - Semantics (2021) • Daga, E., Meroño-Peñuela, A., Motta, E.: Sequential linked data: the state of affairs. Semantic Web (2021) • Warren, P., Mulholland, P.: Using sparql–the practitioners’ viewpoint. In: European Knowledge Acquisition Workshop. pp. 485–500. Springer (2018) • Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a new generation of ontology based data access. Semantic Web 11(1), 153–160 (2020) • Michel, F., Faron-Zucker, C., Corby, O., Gandon, F.: Enabling automatic discovery and querying of web apis at web scale using linked data standards. In: Companion Proceedings of The 2019 World Wide Web Conference. pp. 883–892 (2019) • Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: Rml: a generic language for integrated rdf mappings of heterogeneous data. In: 7th Workshop on Linked Data on the Web (2014) • García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.: Shexml: improving the usability of heterogeneous data mapping languages for firsttime users. PeerJ Computer Science 6, e318 (2020) • Paulheim, Heiko. "Knowledge graph refinement: A survey of approaches and evaluation methods." Semantic web 8, no. 3 (2017): 489-508. • Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., et al.: The state of the art in enduser software engineering. ACM Computing Surveys (CSUR) 43(3), 1–44 (2011) • Lefrançois, M., Zimmermann, A., Bakerally, N.: A sparql extension for generating rdf from heterogeneous formats. In: European Semantic Web Conference. pp. 35– 50. Springer (2017) • Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: An emerging paradigm. In: End user development, pp. 1–8. Springer (2006) • Cyganiak, Richard. Tarql (sparql for tables): Turn csv into rdf using sparql syntax. Technical Report, 2015. http://tarql. github. io, 2015. • Lenzerini, Maurizio. "Data integration: A theoretical perspective." In Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 233-246. 2002. References