Trying Not to Die Benchmarking using LITMUS
Harsh Thakkar¹, Yashwant Keswani², Mohnish Dubey¹, Jens Lehmann¹,³, Sören Auer⁴
¹ University of Bonn, Bonn, Germany
² DA-IICT, Gandhinagar, India
³ Fraunhofer IAIS, St. Augustin, Germany
⁴ TIB, Hannover, Germany
Semantics 2017 - Amsterdam - Nederland - September 13
Outline
● Motivation
● Problem Statement
● State of the Art
● Approach - LITMUS Benchmark Suite
● Challenges
● Evaluation Plan
● Next Steps
Motivation
Ocean of Data + Sea of Tools
● Data: real and synthetic (LOD Cloud 2017, http://lod-cloud.net/versions/2017-02-20/lod.png)
● Tools: K-V stores, graph stores, doc-oriented stores, RDF stores (e.g. RDF-3X), wide column stores
Motivation
• Domain-specific applications, i.e. perspectives
• Choice overload!
• Vendors
• Researchers
• Users
Image: https://steemit.com/philosophy/@l0k1/subjectivity-and-truth-how-blockchains-model-consensus-building
Benchmarking
● Tedious!
● Needs domain-specific expertise
● Lack of standardization (single focus)
○ Open software, system configuration settings, etc.
● Near-zero Reusability
● Guaranteeing a fair benchmark is difficult!
● Choosing the right performance metrics is cumbersome and subjective
● Visualising benchmark results
[6] http://2.bp.blogspot.com/-TkUb0TPN7IA/VewUHm_jVaI/AAAAAAAABgM/vZILnZNJv5A/s1600/2012-10-16-subjective-objective.jpg
Problem Statement
“How can diverse cross-domain data management solutions (DMSs) be benchmarked in an automated, established, standard environment?”
State of the Art
Benchmark efforts, compared by supported DMS type (relational, RDF, graph) and grouped into single-domain and cross-domain benchmarks:
● TPC [H,C,E,DS] [13]
● XGDBench [6]
● HPC [7]
● Graph 500 [12]
● DBPSB [11]
● LUBM [9]
● IGUANA [19]
● WatDiv [1]
● SP2Bench [20]
● BSBM [4]
● Pandora*
● Graphium [8]
● LDBC [2]
● HOBBIT**
* http://pandora.ldc.usb.ve/
** https://project-hobbit.eu/
LITMUS Benchmark Suite
The LITMUS architecture
● Data Facet (F1): Dataset 1, Dataset 2, Dataset 3, ..., Dataset N → Data integration module
● Query Facet (F2): Queryset 1, Queryset 2, Queryset 3, ..., Queryset M → Query conversion module
● System Facet (F3): System configuration & integration module over RDF stores, graph stores, relational DBs, wide column stores, key-value stores
● User Interface (F4): User
● Benchmarking Core: Controller & Tester, Profiler, Analyzer
Source: Thakkar, Harsh. "Towards an Open Extensible Framework for Empirical Benchmarking of Data Management Solutions: LITMUS." ESWC, 2017.
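As a rough illustration of how a benchmarking core could tie the facets together, the sketch below runs every queryset against every registered DMS and records mean wall-clock times. All names here (`DMS`, `run_benchmark`, the toy `execute` callable) are hypothetical and are not taken from the LITMUS code base.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DMS:
    """Hypothetical wrapper around one data management solution."""
    name: str
    execute: Callable[[str], list]  # runs a query string, returns result rows


def run_benchmark(systems: List[DMS],
                  querysets: Dict[str, List[str]],
                  runs: int = 3) -> List[dict]:
    """Time each query of each queryset on each DMS, `runs` times."""
    results = []
    for dms in systems:
        for queryset_name, queries in querysets.items():
            for query in queries:
                timings = []
                for _ in range(runs):
                    start = time.perf_counter()
                    dms.execute(query)
                    timings.append(time.perf_counter() - start)
                results.append({
                    "dms": dms.name,
                    "queryset": queryset_name,
                    "query": query,
                    "mean_s": sum(timings) / len(timings),
                })
    return results


# Toy stand-in for a real store: it just echoes the query back.
toy_store = DMS("toy-store", execute=lambda q: [q.upper()])
report = run_benchmark([toy_store], {"queryset-1": ["q1", "q2"]})
for row in report:
    print(row)
```

A real controller would also need warm-up runs and per-DMS loading and configuration steps, which this sketch leaves out.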
Challenges
● Core challenges in developing such an open, extensible, FAIR framework?
○ C1 - Data Conversion
○ C2 - Query Translation
○ C3 - Key Performance Indicators (KPIs)
Image: http://media.thinkadvisor.com/lifehealthpro/article/2015/02/24/challenge.jpg
C1 - Data Conversion
● Different data models
○ RDF Graph
○ Property Graph
● A fair benchmark requires converting the data between these models
● Each DMS performs best on its natively supported data model
Figure: lots of data (real, synthetic), as RDF graph and as property graph
RQ1 - What are the methods to convert RDF into the Property Graph data model?
RDF Data Model
● RDF is a triple-based graph model, where:
○ Subject: URI, blank node
○ Predicate: URI (property)
○ Object: URI, literal, blank node
● URI (IRIs*) = Uniform Resource Identifier, analogous to an ISBN for books
● Literals = data values
● Blank nodes = descriptions of entities that don't need to be named
Representation:
@prefix ex: <http://example.org/> .
ex:Person ex:speaker ex:Event .
ex:Person ex:name "Harsh" .
ex:Person ex:place ex:Bonn .
ex:Person ex:age "27" .
ex:Event ex:name "Graph Day" .
ex:Event ex:year "2017" .
Interpretation (figure): ex:Person (ex:name "Harsh", ex:age "27", ex:place ex:Bonn) is linked by ex:speaker to ex:Event (ex:name "Semantics", ex:year "2017", ex:place ex:AMS); the figure also shows ex:stime "30".
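To make the triple model concrete, here is a minimal sketch that holds the example triples as plain Python tuples and derives the node set of the resulting graph. The trailing slash in the `EX` namespace and the `("literal", value)` encoding are assumptions made for this illustration only.

```python
from collections import defaultdict

EX = "http://example.org/"  # assumed namespace for the ex: prefix

# Each triple is (subject, predicate, object).
# Literals are encoded as ("literal", value) to tell them apart from URIs.
triples = [
    (EX + "Person", EX + "speaker", EX + "Event"),
    (EX + "Person", EX + "name", ("literal", "Harsh")),
    (EX + "Person", EX + "place", EX + "Bonn"),
    (EX + "Person", EX + "age", ("literal", "27")),
    (EX + "Event", EX + "name", ("literal", "Graph Day")),
    (EX + "Event", EX + "year", ("literal", "2017")),
]


def is_literal(term) -> bool:
    """True if the term uses the literal encoding chosen above."""
    return isinstance(term, tuple) and term[0] == "literal"


# Index outgoing edges by subject: every triple is one labelled edge.
outgoing = defaultdict(list)
for s, p, o in triples:
    outgoing[s].append((p, o))

# In an RDF graph, literals are nodes too.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
literal_count = sum(1 for _, _, o in triples if is_literal(o))

print(len(triples), "triples,", len(nodes), "nodes,", literal_count, "literal objects")
```

Note that every attribute value becomes its own node and edge here, which is where the size overhead of RDF graphs comes from.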
RDF Graphs (RDFGs)
● Edge-labelled, directed multi-graphs (with entity URIs, blank nodes, literals)
● Going from information to knowledge using OWL (DLs) and ontologies (RDFS, RDFa, etc.)
● Bulky
○ Everything is node-edge-node (edges don't have properties)
○ More relationships per node → more triples in total!
Property Graph Data Model
● Edge-labelled, directed, attributed, multi-graph
● Vertices and edges both have properties
● Main components:
○ Vertices, edges (source, destination), properties (key-value pairs), labels (strings)
● Super neat (compact), super cute
● Easier to add weighted, reified edges
● Query languages - Cypher, Gremlin, PGQL, etc.
Example (figure): Person {Name: Harsh, Age: 27, Place: Bonn} -[Role: speaker, Time: 30]-> Event {Name: Semantics, Year: 2017, Place: AMS}
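The same Person/Event example can be written down as a property graph with two small record types. This is only a sketch of the data model: the class names, field names, and the choice to use the role as the edge label are assumptions, not the schema of any particular graph store.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Vertex:
    id: int
    label: str
    properties: Dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    src: int    # id of the source vertex
    dst: int    # id of the destination vertex
    label: str
    properties: Dict[str, object] = field(default_factory=dict)


person = Vertex(1, "Person", {"name": "Harsh", "age": 27, "place": "Bonn"})
event = Vertex(2, "Event", {"name": "Semantics", "year": 2017, "place": "AMS"})

# The edge carries its own property, which plain RDF triples cannot do
# without reification or a similar workaround.
speaks_at = Edge(person.id, event.id, "speaker", {"time": 30})

print(person)
print(event)
print(speaks_at)
```

Two vertices and one edge are enough here because attribute values live inside the elements instead of becoming separate nodes.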
Mapping RDF → PG
● Initial Results:
○ Intra-conversion of graph data models (mapping problem)
○ PoC implementation ready (see GitHub)
● Work in progress:
○ Conversion of properties, blank nodes, etc.
○ Using e.g. Reification, Singleton Property, Hypergraphs, etc.
○ Use case: DBpedia 2016-10 (mapping from .owl & data)
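One naive way to picture the mapping problem is sketched below: triples with a literal object become vertex properties, and triples with a URI object become edges. This is an illustrative baseline only, not the LITMUS proof-of-concept; the `Literal` type and `rdf_to_pg` function are hypothetical, and the sketch ignores blank nodes, multi-valued properties, and edge properties.

```python
from typing import Dict, List, NamedTuple, Tuple


class Literal(NamedTuple):
    """Marks an object term as a data value rather than a URI."""
    value: str


def rdf_to_pg(triples) -> Tuple[Dict[str, dict], List[tuple]]:
    """Naive RDF -> property graph mapping.

    Literal objects become properties of the subject vertex;
    URI objects become labelled edges between two vertices.
    A repeated (subject, predicate) pair overwrites the earlier value.
    """
    vertices: Dict[str, dict] = {}
    edges: List[tuple] = []
    for s, p, o in triples:
        vertices.setdefault(s, {})
        if isinstance(o, Literal):
            vertices[s][p] = o.value
        else:
            vertices.setdefault(o, {})
            edges.append((s, p, o))
    return vertices, edges


triples = [
    ("ex:Person", "ex:speaker", "ex:Event"),
    ("ex:Person", "ex:name", Literal("Harsh")),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", Literal("27")),
    ("ex:Event", "ex:name", Literal("Graph Day")),
    ("ex:Event", "ex:year", Literal("2017")),
]

vertices, edges = rdf_to_pg(triples)
print(len(triples), "triples ->", len(vertices), "vertices,", len(edges), "edges")
```

The cases this baseline skips are the ones listed as work in progress: they need a strategy such as reification or singleton properties.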
C2 - Query Translation
● Yes, we are linguistically diverse, and so are DMSs!
● And that with different dialects:
○ SPARQL, Cypher, Gremlin, etc.
● RDF - SPARQL (W3C ‘08)
● Graph - ??
http://cdn2.wpbeginner.com/wp-content/uploads/2015/02/multilingual-wordpress.jpg
Gremlin Traversal Language
http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png
http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png
Gremlin’s Multi-Graph Query Language (GQL) support
Contd…
Multi-DMS & platform support
https://tinkerpop.apache.org/images/oltp-and-olap.png
RQ2 - What are the semantics-preserving methods/approaches for translating SPARQL queries to a graph query language such as Gremlin?
Image (labels: Gremlinator, Me): https://opinionessoftheworld.files.wordpress.com/2013/04/game-of-thrones-daenerys-dragon.jpg
SPARQL → Gremlin
● C2: Gremlinator - the SPARQL-Gremlin translator
○ Formalizing Gremlin traversals in graph algebra [DEXA ‘17]
○ A novel translation mechanism that maps SPARQL queries to Gremlin pattern-matching traversals [planned submission - EDBT ‘18]
○ Nested queries are still a challenge (e.g. UNION)
Addressing RQ2
Talk@Graph Day 2017
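To give a feel for what mapping a SPARQL basic graph pattern to a Gremlin pattern-matching traversal can look like, the sketch below turns a list of triple patterns into a `match()` traversal string. It is a toy illustration, not Gremlinator's algorithm: the function name is hypothetical, and the `e:` (edge) and `v:` (property value) predicate prefixes are a convention assumed for this example.

```python
def bgp_to_gremlin(patterns):
    """Build a Gremlin match() traversal string from triple patterns.

    Each pattern is (subject_var, predicate, object_var), where the
    predicate is 'e:<label>' for an edge or 'v:<key>' for a property value.
    Variables are given without the leading '?'.
    """
    steps = []
    variables = []
    for subj, pred, obj in patterns:
        kind, _, name = pred.partition(":")
        if kind == "e":
            # Edge pattern: traverse an outgoing edge with this label.
            steps.append(f"__.as('{subj}').out('{name}').as('{obj}')")
        elif kind == "v":
            # Value pattern: read a property value of the vertex.
            steps.append(f"__.as('{subj}').values('{name}').as('{obj}')")
        else:
            raise ValueError(f"unsupported predicate: {pred!r}")
        for var in (subj, obj):
            if var not in variables:
                variables.append(var)
    selected = ", ".join(f"'{v}'" for v in variables)
    return f"g.V().match({', '.join(steps)}).select({selected})"


# Roughly: SELECT * WHERE { ?p e:speaker ?ev . ?p v:name ?n }
query = bgp_to_gremlin([("p", "e:speaker", "ev"), ("p", "v:name", "n")])
print(query)
```

Handling OPTIONAL, FILTER, or UNION would need more than this one-to-one mapping of patterns to match clauses.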
C3 - Metrics/KPIs
RQ3 - What are the strengths and the limitations of the existing KPIs, and to what extent do they reflect the performance of a DMS?
Factors shown in the figure:
● Type of data: RDF graph, property graph; real, synthetic
● Type of query: linear, star-shaped, snowflake
● Query response time; precision, recall
● DMS index size; DMS configuration
Image credits:
[11] https://www.tutorialspoint.com/computer_fundamentals/images/primary_memory.jpg
[12] http://s.hswstatic.com/gif/microprocessor-250x150.jpg
Selection of KPIs
● CPU and Memory specific metrics:
○ Perf-tool - LITMUS v0.1 (supported)
■ TLB, LLC, instructions, L1 cache, page faults, etc. (18 supported currently)
● Dataset specific metrics:
○ |V|, |E|, Eccentricity, Clustering coefficient, Centrality, etc (in progress)
● Query specific metrics:
○ Type, Length, Response time, Precision, Recall, F1, etc (planned)
● DMS specific:
○ Load time, index time, index size (supported)
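The query-specific quality metrics named above follow the standard set-based definitions. Here is a minimal sketch that scores one query's result set against an expected result set; the function name and the example sets are made up for illustration.

```python
def precision_recall_f1(returned, expected):
    """Set-based precision, recall and F1 for one query result."""
    returned, expected = set(returned), set(expected)
    true_positives = len(returned & expected)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the DMS returns four rows, two of which are expected.
returned_rows = {"a", "b", "c", "d"}
expected_rows = {"a", "b", "e"}

precision, recall, f1 = precision_recall_f1(returned_rows, expected_rows)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

These scores matter mainly once data has been converted or queries translated, since a fast answer from the wrong result set says little about the DMS.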
Back in the bigger picture
Figure: the LITMUS architecture again (datasets, querysets, DMSs, Benchmarking Core, facets F1-F4), annotated with the challenges C1, C2 and C3.
The LITMUS Test*
PLOTS
FILES
*Please visit our Poster & Demo for hands-on experience; more details are in the paper!
Evaluation
● RQs: Publications
● Framework: Continuous integration (v0.1 released, v0.2 planned for Dec ‘17)
○ Reproducing third-party benchmarks
○ Gathering user and expert feedback
○ Going live @Industry:
■ Gremlinator - Apache TinkerPop
■ Further collaboration… Adoption by other projects - LDBC, HOBBIT! :-)
Next Steps
● Framework - LITMUS v0.2 launch (Dec ‘17 - planned)
● DMS module - Adding two more DMSs each
● Dataset module - RDF → PG (Dec ‘17)
● Query module - Integrating Gremlinator
● GUI: Aesthetic GUI (maybe?)
Acknowledgements
Funding: H2020 WDAqua ITN (GA: 642795)
Supervisors & Mentors:
● Prof. Dr. Sören Auer, TIB, DE
● Prof. Dr. Jens Lehmann, UBO, DE
● Prof. Dr. Maria-Esther Vidal, TIB, DE
● Dr. Marko Rodriguez, DataStax & Apache, USA
Resources
http://wdaqua.eu/
https://github.com/LITMUS-Benchmark-Suite/sparql-to-gremlin
Code : https://github.com/LITMUS-Benchmark-Suite/
Web : https://litmus-benchmark-suite.github.io
Docker : https://hub.docker.com/r/litmusbenchmarksuite/litmus/
LITMUS Benchmark Suite
THANK YOU !
Harsh Thakkar
University of Bonn
Twitter: @harsh9t
LinkedIn: thakkarharsh
E-mail: harsh9t@gmail.com
Questions? Comments?
Insults? Injuries?
EXTRA STUFF
Experiments*
Northwind dataset
● PG - Vertices: 3209, Edges: 6177
● RDF - Triples: 33033
BSBM 1M dataset
● PG - Vertices: 92737, Edges: 238309
● RDF - Triples: 1000313
CPU: Intel® Xeon® CPU E5-2660 v3 (20 cores @ 2.60 GHz),
RAM: 128 GB DDR3, Storage: 512 GB SSD, OS: Linux 4.2-generic (x86_64)
DMSs: OpenLink Virtuoso v7.2.4, Apache TinkerGraph-Gremlin v3.2.3
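The dataset sizes above allow a quick back-of-the-envelope comparison of the two representations. The sketch below computes how many RDF triples correspond to one property-graph element (vertex or edge) for each dataset; the ratio itself is an illustrative measure chosen here, not a KPI reported in the slides.

```python
# Sizes as listed for the two experiment datasets.
datasets = {
    "Northwind": {"vertices": 3209, "edges": 6177, "triples": 33033},
    "BSBM 1M": {"vertices": 92737, "edges": 238309, "triples": 1000313},
}

# Triples per property-graph element (vertices + edges).
ratios = {
    name: d["triples"] / (d["vertices"] + d["edges"])
    for name, d in datasets.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} triples per PG element")
```

For both datasets the RDF form needs roughly three or more triples per vertex or edge, in line with the earlier remark that RDF graphs are bulky.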

Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Recently uploaded (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Semantics 2017 - Trying Not to Die Benchmarking using LITMUS

  • 1. Trying Not to Die Benchmarking using LITMUS
      Harsh Thakkar (1), Yashwant Keswani (2), Mohnish Dubey (1), Jens Lehmann (1,3), Sören Auer (4)
      (1) University of Bonn, Bonn, Germany; (2) DA-IICT, Gandhinagar, India; (3) Fraunhofer IAIS, St. Augustin, Germany; (4) TIB, Hannover, Germany
      Semantics 2017 - Amsterdam - Nederland - September 13
  • 2. Outline
      ● Motivation
      ● Problem Statement
      ● State of the Art
      ● Approach - LITMUS Benchmark Suite
      ● Challenges
      ● Evaluation Plan
      ● Next Steps
  • 3. Motivation
      ● Ocean of Data: real and synthetic datasets (figure: LOD Cloud 2017, http://lod-cloud.net/versions/2017-02-20/lod.png)
      ● Sea of Tools: K-V stores, graph stores, doc-oriented stores, RDF stores (e.g. RDF-3X), wide column stores
  • 4. Motivation
      ● Domain-specific applications, i.e. perspectives
      ● Choice overload!
      ● Vendors
      ● Researchers
      ● Users
      Image: https://steemit.com/philosophy/@l0k1/subjectivity-and-truth-how-blockchains-model-consensus-building
  • 5. Benchmarking
      ● Tedious!
      ● Needs domain-specific expertise
      ● Lack of standardization (single focus)
        ○ Open software, system configuration settings, etc.
      ● Near-zero reusability
      ● Guaranteeing a fair benchmark is difficult!
      ● Choosing the right performance metrics is cumbersome and subjective
      ● Visualising benchmark results
      [6] http://2.bp.blogspot.com/-TkUb0TPN7IA/VewUHm_jVaI/AAAAAAAABgM/vZILnZNJv5A/s1600/2012-10-16-subjective-objective.jpg
  • 6. Problem Statement
      “How can diverse cross-domain DMSs be benchmarked in an automated established * standard # environment?”
  • 7. State of the Art
      [Table: Benchmark Effort | Relational DMSs | RDF DMSs | Graph DMSs]
      Benchmark efforts listed: TPC [H,C,E,DS] [13], XGDBench [6], HPC [7], Graph 500 [12], DBPSB [11], LUBM [9], IGUANA [19], WatDiv [1], SP2Bench [20], BSBM [4], Pandora*, Graphium [8], LDBC [2], HOBBIT**
      Row groups: Single domain Benchmarks, Cross domain Benchmarks
      * http://pandora.ldc.usb.ve/
      ** https://project-hobbit.eu/
  • 8. LITMUS Benchmark Suite
  • 9. The LITMUS architecture
      ● Data Facet (F1): Dataset 1, Dataset 2, Dataset 3, Dataset N; Data integration module
      ● Query Facet (F2): Queryset 1, Queryset 2, Queryset 3, Queryset M; Query conversion module
      ● System Facet (F3): System configuration & integration module; RDF stores, graph stores, relational DBs, wide column stores, key value stores
      ● Benchmarking Core: Controller & Tester, Profiler, Analyzer
      ● User Interface (F4): User
      Thakkar, Harsh. "Towards an Open Extensible Framework for Empirical Benchmarking of Data Management Solutions: LITMUS." ESWC, 2017.
  • 10. Challenges
      ● Core challenges in developing such an open, extensible, FAIR framework?
        ○ C1 - Data Conversion
        ○ C2 - Query Translation
        ○ C3 - Key Performance Indicators (KPIs)
      Image: http://media.thinkadvisor.com/lifehealthpro/article/2015/02/24/challenge.jpg
  • 11. C1 - Data Conversion
      ● Different data models
        ○ RDF graph
        ○ Property graph
      ● A fair benchmark requires data conversion
      ● Each DMS works best with its natively supported data model
      Figure: lots of data (real, synthetic) as RDF graph and as property graph
      RQ1 - What are the methods to convert RDF into the Property Graph data model?
  • 12. RDF Data Model
      ● RDF is a triple-based graph model, where:
        ○ Subject: URI, Blank node
        ○ Predicate: URIs -> property
        ○ Object: URI, Literal, Blank node
      URI = Uniform Resource Identifier (IRIs*), analogous to ISBN for books
      Literals = data values
      Blank nodes = descriptions of entities that don't need to be named
      Representation:
        @prefix ex: <http://example.org>
        ex:Person ex:speaker ex:Event
        ex:Person ex:name “Harsh”
        ex:Person ex:place ex:Bonn
        ex:Person ex:age “27”
        ex:Event ex:name “Graph Day”
        ex:Event ex:Year “2017”
      Interpretation (graph figure): ex:Person (ex:name “Harsh”, ex:place ex:Bonn, ex:age “27”) linked by ex:speaker to ex:Event (ex:name “Semantics”, ex:year “2017”, ex:place ex:AMS); ex:stime “30”
  • 13. RDF Graphs (RDFGs)
      ● Edge-labelled, directed multi-graphs (with entity URIs, blank nodes, literals)
      ● Going from information to knowledge using OWL (DLs) and ontologies (RDFS, RDFa, etc.)
      ● Bulky
        ○ Everything is a node-edge-node (edges don't have properties)
        ○ More relationships per node → more triples in total!
  • 14. Property Graph Data Model
      ● Edge-labelled, directed, attributed multi-graph
      ● Vertices and edges both have properties
      ● Main components:
        ○ Vertices, edges (source, destination), properties (key-value pairs), labels (strings)
      ● Super neat (compact), super cute
      ● Easier to add weighted, reified edges
      ● Query languages - Cypher, Gremlin, PGQL, etc.
      Example (figure): Person vertex (Name: Harsh, Age: 27, Place: Bonn), Event vertex (Name: Semantics, Year: 2017, Place: AMS), connecting edge (Role: speaker, Time: 30)
  • 15. Mapping RDF → PG
      ● Initial results:
        ○ Intra-conversion of graph data models (mapping problem)
        ○ PoC implementation ready (see GitHub)
      ● Work in progress:
        ○ Conversion of properties, blank nodes, etc.
        ○ Using e.g. Reification, Singleton Property, Hypergraphs, etc.
        ○ Use case: DBpedia 2016-10 (mapping from .owl & data)
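The mapping problem can be illustrated with a minimal sketch. This is not the LITMUS PoC implementation; it assumes a naive rule (literal-valued triples become vertex properties, resource-valued triples become labelled edges) and reuses the example triples from the RDF Data Model slide. The function name `rdf_to_property_graph` and the `is_literal` flag are hypothetical, introduced only for illustration.

```python
def rdf_to_property_graph(triples):
    """Naive RDF -> property graph mapping (illustrative assumption only).

    Each triple is (subject, predicate, object, is_literal).
    Literal objects become properties of the subject vertex;
    resource objects become labelled edges between vertices.
    """
    vertices = {}  # vertex id -> dict of properties
    edges = []     # (source id, edge label, target id)
    for subject, predicate, obj, is_literal in triples:
        vertices.setdefault(subject, {})
        if is_literal:
            # A repeated predicate would overwrite the earlier value here;
            # multi-valued properties need a richer mapping.
            vertices[subject][predicate] = obj
        else:
            vertices.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return vertices, edges


# Triples from the RDF Data Model slide.
TRIPLES = [
    ("ex:Person", "ex:speaker", "ex:Event", False),
    ("ex:Person", "ex:name", "Harsh", True),
    ("ex:Person", "ex:place", "ex:Bonn", False),
    ("ex:Person", "ex:age", "27", True),
    ("ex:Event", "ex:name", "Graph Day", True),
    ("ex:Event", "ex:Year", "2017", True),
]

vertices, edges = rdf_to_property_graph(TRIPLES)
print(len(vertices), "vertices,", len(edges), "edges")
```

Blank nodes and edge properties are deliberately left out; those are the cases the slide lists as work in progress (reification, singleton property, hypergraphs).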
  • 16. C2 - Query Translation
      ● Yes, we are linguistically diverse, and so are DMSs!
      ● That too with different dialects:
        ○ SPARQL, Cypher, Gremlin, etc.
      ● RDF - SPARQL (W3C ‘08)
      ● Graph - ??
      Image: http://cdn2.wpbeginner.com/wp-content/uploads/2015/02/multilingual-wordpress.jpg
  • 17. Gremlin Traversal Language
      Gremlin’s Multi-Graph Query Language (GQL) support
      Images: http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png, http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png
  • 18. Contd… Multi-DMS & platform support
      Image: https://tinkerpop.apache.org/images/oltp-and-olap.png
      RQ2 - What are the semantics-preserving methods/approaches for translating SPARQL queries to a graph query language such as Gremlin?
  • 19. Image: https://opinionessoftheworld.files.wordpress.com/2013/04/game-of-thrones-daenerys-dragon.jpg (labels: Gremlinator, Me)
  • 20. SPARQL → Gremlin (addressing RQ2)
      ● C2: Gremlinator - the SPARQL-Gremlin translator
        ○ Formalizing Gremlin traversals in graph algebra [DEXA ‘17]
        ○ A novel translation mechanism that maps SPARQL queries to Gremlin pattern matching traversals [planned submission - EDBT ’18]
        ○ Nested queries are still a challenge (i.e. UNION)
      Talk @ Graph Day 2017
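To give a feel for what mapping SPARQL to Gremlin pattern matching traversals involves, here is a toy sketch. It is not Gremlinator's translation algorithm: it assumes a basic graph pattern whose subjects and objects are all variables and whose predicates all correspond to edge labels, and it only builds the Gremlin traversal as a string. The helper names `triple_pattern_to_step` and `bgp_to_gremlin` are hypothetical.

```python
def triple_pattern_to_step(subject, predicate, obj):
    """Turn one triple pattern (?s predicate ?o) into a Gremlin match() clause.

    Assumption: both subject and object are SPARQL variables and the
    predicate names an edge label in the property graph.
    """
    for term in (subject, obj):
        if not term.startswith("?"):
            raise ValueError("only variable subjects/objects are handled: " + term)
    return "__.as('{}').out('{}').as('{}')".format(subject[1:], predicate, obj[1:])


def bgp_to_gremlin(patterns, projection):
    """Combine triple patterns into a single match() traversal string."""
    clauses = ", ".join(triple_pattern_to_step(*p) for p in patterns)
    selected = ", ".join("'{}'".format(v.lstrip("?")) for v in projection)
    return "g.V().match({}).select({})".format(clauses, selected)


# SELECT ?person ?city WHERE { ?person speaker ?event . ?event place ?city }
query = bgp_to_gremlin(
    [("?person", "speaker", "?event"), ("?event", "place", "?city")],
    ["?person", "?city"],
)
print(query)
```

Each triple pattern becomes one clause of Gremlin's `match()` step, and shared variable names play the role of SPARQL joins. Literals, filters, and UNION are out of scope for this sketch.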
  • 21. C3 - Metrics/KPIs
      RQ3 - What are the strengths and the limitations of the existing KPIs, and to what extent do they reflect the performance of a DMS?
      ● Type of data: RDF graph, property graph; real, synthetic
      ● Type of query: linear, star shaped, snowflake
      ● KPIs shown: query response time; precision, recall; DMS index size; DMS configuration
      [11] https://www.tutorialspoint.com/computer_fundamentals/images/primary_memory.jpg
      [12] http://s.hswstatic.com/gif/microprocessor-250x150.jpg
  • 22. Selection of KPIs
      ● CPU and memory specific metrics:
        ○ Perf-tool - LITMUS v0.1 (supported)
          ■ TLB, LLC, instructions, L1 cache, page faults, etc. (18 supported currently)
      ● Dataset specific metrics:
        ○ |V|, |E|, eccentricity, clustering coefficient, centrality, etc. (in progress)
      ● Query specific metrics:
        ○ Type, length, response time, precision, recall, F1, etc. (planned)
      ● DMS specific:
        ○ Load time, index time, index size (supported)
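The query-specific metrics can be sketched in a few lines. This is an illustrative assumption of how response time, precision, recall, and F1 might be computed for one query, not LITMUS code; the helper names `timed` and `precision_recall_f1` and the sample result identifiers are hypothetical.

```python
import time


def timed(run_query):
    """Run a zero-argument callable and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = run_query()
    return result, time.perf_counter() - start


def precision_recall_f1(returned, expected):
    """Compare a returned result set with the expected (gold) result set."""
    returned, expected = set(returned), set(expected)
    true_positives = len(returned & expected)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Stand-in for a real DMS call; the identifiers are made up for illustration.
results, seconds = timed(lambda: ["ex:Harsh", "ex:Bonn", "ex:AMS"])
p, r, f1 = precision_recall_f1(
    results, ["ex:Harsh", "ex:Bonn", "ex:Berlin", "ex:Paris"]
)
print("response time: {:.6f}s".format(seconds))
print("precision={:.3f} recall={:.3f} F1={:.3f}".format(p, r, f1))
```

Precision and recall matter for cross-DMS benchmarking because a translated query or converted dataset may return a different result set from the original, so timing alone would not show whether the comparison is fair.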
  • 23. Back in the bigger picture
      The LITMUS architecture (slide 9), annotated with the challenges C1, C2, and C3
  • 24. The LITMUS Test*
      Outputs shown: plots, files
      * Please visit our Poster & Demo for hands-on experience, and see the paper for more details!
  • 25. Evaluation
      ● RQs: Publications
      ● Framework: Continuous integration (v0.1 released, v0.2 planned Dec ‘17)
        ○ Reproducing third-party benchmarks
        ○ Gathering feedback from users and experts
        ○ Going live @Industry:
          ■ Gremlinator - Apache TinkerPop
          ■ Further collaboration…
      Adoption by other projects - LDBC, HOBBIT! :-)
  • 26. Next Steps
      ● Framework - LITMUS v0.2 launch (Dec ‘17 - planned)
      ● DMS module - Adding two more DMSs each
      ● Dataset module - RDF → PG (Dec ‘17)
      ● Query module - Integrating Gremlinator
      ● GUI: Aesthetic GUI (maybe?)
  • 27. Acknowledgements
      Funding: H2020 WDAqua ITN (GA: 642795)
      Supervisors & Mentors:
        Prof. Dr. Sören Auer, TIB, DE
        Prof. Dr. Jens Lehmann, UBO, DE
        Prof. Dr. Maria-Esther Vidal, TIB, DE
        Dr. Marko Rodriguez, DataStax & Apache, USA
  • 28. Resources
      LITMUS Benchmark Suite
        Code: https://github.com/LITMUS-Benchmark-Suite/
        Web: https://litmus-benchmark-suite.github.io
        Docker: https://hub.docker.com/r/litmusbenchmarksuite/litmus/
      https://github.com/LITMUS-Benchmark-Suite/sparql-to-gremlin
      http://wdaqua.eu/
  • 29. THANK YOU ! Harsh Thakkar University of Bonn Twitter: @harsh9t LinkedIn: thakkarharsh E-mail: harsh9t@gmail.com Questions? Comments? Insults? Injuries?
  • 30. EXTRA STUFF
  • 31. Experiments*
      Northwind dataset
        ● PG - Vertices: 3209, Edges: 6177
        ● RDF - Triples: 33033
      BSBM 1M dataset
        ● PG - Vertices: 92737, Edges: 238309
        ● RDF - Triples: 1000313
      Hardware: CPU: Intel® Xeon® CPU E5-2660 v3 (20 cores @2.60GHz), RAM: 128 GB DDR3, Storage: 512 GB SSD, OS: Linux 4.2-generic (x86_64)
      Systems: OpenLink Virtuoso v7.2.4, Apache TinkerGraph-Gremlin v3.2.3
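As a quick worked check of the "bulky" remark about RDF graphs, the dataset sizes above can be compared directly. The ratio used here (RDF triples per property graph vertex or edge) is an illustrative measure assumed for this sketch, not a KPI defined by LITMUS, and the function name `triples_per_element` is hypothetical.

```python
# Dataset sizes as reported on the Experiments slide.
DATASETS = {
    "Northwind": {"vertices": 3209, "edges": 6177, "triples": 33033},
    "BSBM 1M": {"vertices": 92737, "edges": 238309, "triples": 1000313},
}


def triples_per_element(stats):
    """RDF triples divided by the number of PG elements (vertices + edges)."""
    return stats["triples"] / (stats["vertices"] + stats["edges"])


for name, stats in DATASETS.items():
    ratio = triples_per_element(stats)
    print("{}: {:.2f} triples per PG vertex or edge".format(name, ratio))
```

For both datasets the RDF representation needs roughly three or more triples for every vertex or edge of the property graph version, which is consistent with properties being folded into vertices and edges on the PG side.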