Slides of the presentation at #ENDORSE2023
The SPARQL Anything project: http://sparql-anything.cc
Endorse Conference 2023, see
https://twitter.com/EULawDataPubs/status/1635663471349223425
--
Abstract:
What should a data integration framework for knowledge graph experts look like?
Existing approaches transform non-RDF data sources by applying ad-hoc transformations to existing ontologies (Any23), by using a mapping language (RML), or by extending existing standards with custom operators (SPARQL Generate). These solutions either result in code that is difficult to maintain and reuse or require KG experts to learn a variety of languages and custom tools. Recent research on Knowledge Graph construction proposes the design of a façade, a notion borrowed from object-oriented software engineering. This idea is applied in SPARQL Anything, a system that allows querying heterogeneous resources as if they were in RDF, in standard SPARQL 1.1.
The SPARQL Anything project supports a wide variety of file formats, from popular ones (CSV, JSON, XML, Spreadsheets) to others that are not supported by alternative solutions (Markdown, YAML, DOCX, BibTeX). Features include querying Web APIs with high flexibility, parametrized queries, and chaining multiple transformations into complex pipelines.
We describe the design rationale of the SPARQL Anything system and its application in two EU-funded projects and in industry. We provide references to an extensive set of reusable showcases. We report on the value to users of the founding assumptions of SPARQL Anything, compared to alternative solutions to knowledge graph construction.
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything project
1. Enrico Daga
The Open University www.enridaga.net
Enrico Daga, Luigi Asprino, Justin Dowdy, Paul Mulholland and Aldo Gangemi
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement GA101004746.
The communication reflects only the author’s view and the Research Executive Agency is not responsible for any use that may be made of the information it contains.
3. Playing the soundtrack of our history
Preserving musical heritage
through knowledge graphs
Managing musical heritage collections
through knowledge graphs
Studying musical heritage through
(interlinked) knowledge graphs
https://spice-h2020.eu/ https://polifonia-project.eu/
4. Approaches
• Targeting specific types/formats: Direct mapping, Tarql, Any23, JSON2RDF, CSV2RDF, COW, SPARQL Micro-services [Michel, 2019]. These lack generality.
• Specialised mapping languages of several types (R2RML, RML, ShExML): high learning demands, cold start problem, limited to a few popular formats, difficult to extend. [Dimou, 2014] [García-González, 2020]
• Extending SPARQL with custom features (SPARQL Generate): high learning demands, difficult to extend to other formats. [Lefrançois, 2017]
• These solutions transfer data source complexity to the user (e.g. need to know XPath for XML, JSONPath for JSON, …) or ask the user to learn a new language!
The SPARQL Anything project
5. “Source” can be … anything!
Data integration theory assumes that data sources are format-compatible, i.e. it
does not account for syntactic heterogeneity (abstracts from the actual formats)
CSV
JSON
Files
HTML Website
6. “Source” can be … anything!
BibTeX
Markdown
XML/MEI
Spreadsheets, Web APIs, …
7. A simple intuition
Data formats rely on a small set of primitives
8. Façade X
A simplified RDF meta-model, resembling a list-of-lists
Components: containers (typed), slots (string / int), values
Intuitive, abstract notions: key-value, sequence, type
Daga, Enrico, Luigi Asprino, Paul Mulholland, and Aldo Gangemi.
"Facade-X: an opinionated approach to SPARQL
anything." SEMANTICS - Studies on the Semantic Web 53 (2021)
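The container / slot / value idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: it assumes the input is an already parsed JSON object or array, uses the `xyz:` and `fx:` namespaces and the `rdf:_n` sequence properties described in the Facade-X paper, and the function name `to_facade_x` is hypothetical.

```python
from itertools import count

# Namespaces as used by Facade-X (assumed here for illustration).
XYZ = "http://sparql.xyz/facade-x/data/"
FX = "http://sparql.xyz/facade-x/ns/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def to_facade_x(document):
    """Return (subject, predicate, object) triples for a parsed JSON object or array."""
    triples = []
    ids = count(1)

    def new_container():
        # Containers are modelled as blank nodes with generated labels.
        return f"_:c{next(ids)}"

    def fill(container, value):
        # Objects give key-value slots, arrays give sequence slots (rdf:_1, rdf:_2, ...).
        if isinstance(value, dict):
            slots = [(XYZ + key, item) for key, item in value.items()]
        else:
            slots = [(f"{RDF}_{i}", item) for i, item in enumerate(value, start=1)]
        for slot, item in slots:
            if isinstance(item, (dict, list)):
                # Nested structures become nested containers.
                child = new_container()
                triples.append((container, slot, child))
                fill(child, item)
            else:
                triples.append((container, slot, item))

    root = new_container()
    triples.append((root, RDF + "type", FX + "root"))
    fill(root, document)
    return triples
```

Calling `to_facade_x({"title": "Aria", "notes": ["C", "D"]})` yields a typed root container, one key-value slot for `title`, and a nested container whose two sequence slots hold the notes.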
9. (1) Source schema
(2) Transform …
(3) Map on target schema
Mappings in SPARQL 1.1 !!!
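As a rough sketch of what such a mapping can look like, the Python helper below assembles a plain SPARQL 1.1 CONSTRUCT query that reads a source through a `SERVICE <x-sparql-anything:...>` clause and maps one Facade-X slot onto a target property. The helper name `build_fx_query`, the example location and the target property are assumptions made for illustration; the helper only builds the query text and does not execute it.

```python
def build_fx_query(location, source_key, target_property):
    """Build a CONSTRUCT query mapping one Facade-X slot to a target property.

    location        -- URL or path of the source (hypothetical in the example below)
    source_key      -- key in the source, exposed as xyz:<key> by Facade-X
    target_property -- full IRI of the property in the target schema
    """
    return f"""PREFIX xyz: <http://sparql.xyz/facade-x/data/>
CONSTRUCT {{
  ?item <{target_property}> ?value .
}} WHERE {{
  SERVICE <x-sparql-anything:location={location}> {{
    ?item xyz:{source_key} ?value .
  }}
}}"""


if __name__ == "__main__":
    # Hypothetical source and target, for illustration only.
    print(build_fx_query(
        "https://example.org/tracks.json",
        "title",
        "http://purl.org/dc/terms/title",
    ))
```

The resulting text could then be passed to any of the interfaces listed on the project site (command line, server, Java API, Python library).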
https://sparql-anything.cc/
Command Line
Server (Jena Fuseki)
Java API
Python Library (New!)
11. MusicXML
Feedback: @enridaga
Ratta, Marco, and Enrico Daga. "Knowledge Graph
Construction From MusicXML: An Empirical Investigation
With SPARQL Anything."
Exploring Music Computing tasks:
• Melody extraction
• N-grams Extraction
• N-grams Analysis
• Music Note Ontology Population
https://github.com/SPARQL-Anything/showcase-musicxml
13. Benefits
• Transform / Query resources having heterogeneous formats
• Low learning demands for KG practitioners — plain SPARQL 1.1
• A single + consistent abstraction for potentially any data format
• “Free lunch” data exploration (no cold start problem)
• Open-ended extensibility: no changes to user-facing code required
• FX can express ANY format representable as BNF (as well as relational data)
• Generate RDF (and RDF-Star) but also Tabular Data
• Sometimes the KG is an intermediate object: applications may require other formats (e.g. tabular data is the usual format for data science), and the KG can be a feature engineering medium
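The last point, a graph used as an intermediate step towards tabular data, can be illustrated with a small generic sketch. It is not how SPARQL Anything produces tables (there a SELECT query would typically do that); the function names and the short subject and predicate labels are hypothetical.

```python
import csv
import io


def triples_to_rows(triples):
    """Pivot (subject, predicate, object) triples into one row per subject.

    Simplification: if a subject has several values for one predicate,
    only the last value is kept.
    """
    rows = {}
    for subject, predicate, obj in triples:
        rows.setdefault(subject, {"subject": subject})[predicate] = obj
    return list(rows.values())


def rows_to_csv(rows):
    """Serialise rows as CSV text, leaving missing cells empty."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Each subject becomes a row and each predicate a column, which is the shape most data science tooling expects as input features.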
Asprino, Luigi, Enrico Daga, Aldo Gangemi, and Paul Mulholland. "Knowledge Graph
Construction with a façade: a unified method to access heterogeneous data
sources on the Web." ACM Transactions on Internet Technology (2022).
14. Challenges / Directions
Execute FX queries
• FX view materialisation is the only method supported so far.
• Study other execution strategies, e.g. query-rewriting, learn from Ontology Based Data Access (OBDA)
Support users in query design
• Only SPARQL raw code so far: develop user interfaces for FX-based data integration / exploration?
• Explore mapping design patterns
Support developers: how to streamline developing adapters to new formats?
Features:
• More inputs: relational DB, MIDI, Image Annotations, … the sky is the limit!
• More connectors: Apache Parquet, Thrift, Hadoop/HDFS, RDBMS/JDBC, MongoDB, …
• Anything in … anything out! Equip SPARQL Anything with a full-fledged template engine
15. Credits
Luigi Asprino, University of Bologna
Enrico Daga, The Open University
Aldo Gangemi, ISTC-CNR
Justin Dowdy, Software Engineer
Paul Warren, The Open University
Paul Mulholland, The Open University
Marco Ratta, The Open University
Jason Carvalho, The Open University
17. References
• Daga, E., Asprino, L., Mulholland, P., Gangemi, A.: Facade-X: an opinionated approach to SPARQL Anything. In: SEMANTiCS 2021: 17th International Conference on Semantic Systems (2021)
• Asprino, L., Daga, E., Gangemi, A., Mulholland, P.: Knowledge Graph Construction with a façade: a unified method to access heterogeneous data sources on the Web. ACM Transactions on Internet Technology (2022)
• Ratta, M., Daga, E.: Knowledge Graph Construction From MusicXML: An Empirical Investigation With SPARQL Anything.
• Atkin, M., Deely, T., Scharffe, F.: Knowledge Graph Benchmarking Report 2021 (version 2.0). Zenodo, http://doi.org/10.5281/zenodo.4950097 (June 2021)
• Lassila, O., Schmidt, M., Bebee, B., Bechberger, D., Broekema, W., Khandelwal, A., Lawrence, K., Sharda, R., Thompson, B.: Graph? Yes! Which one? Help!. 1st Squaring the circle on knowledge graphs workshop, Semantics (2021)
• Daga, E., Meroño-Peñuela, A., Motta, E.: Sequential linked data: the state of affairs. Semantic Web (2021)
• Warren, P., Mulholland, P.: Using SPARQL – the practitioners’ viewpoint. In: European Knowledge Acquisition Workshop. pp. 485–500. Springer (2018)
• Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a new generation of ontology based data access. Semantic Web 11(1), 153–160 (2020)
• Michel, F., Faron-Zucker, C., Corby, O., Gandon, F.: Enabling automatic discovery and querying of web APIs at web scale using linked data standards. In: Companion Proceedings of The 2019 World Wide Web Conference. pp. 883–892 (2019)
• Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML: a generic language for integrated RDF mappings of heterogeneous data. In: 7th Workshop on Linked Data on the Web (2014)
• García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.: ShExML: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science 6, e318 (2020)
• Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3), 489–508 (2017)
• Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., et al.: The state of the art in end-user software engineering. ACM Computing Surveys (CSUR) 43(3), 1–44 (2011)
• Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL extension for generating RDF from heterogeneous formats. In: European Semantic Web Conference. pp. 35–50. Springer (2017)
• Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: An emerging paradigm. In: End user development, pp. 1–8. Springer (2006)
• Cyganiak, R.: Tarql (SPARQL for tables): Turn CSV into RDF using SPARQL syntax. Technical Report, http://tarql.github.io (2015)
• Lenzerini, M.: Data integration: A theoretical perspective. In: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. pp. 233–246 (2002)