Semantics 2017 - Trying Not to Die Benchmarking using LITMUS

Trying Not to Die Benchmarking using LITMUS
Harsh Thakkar¹, Yashwant Keswani², Mohnish Dubey¹, Jens Lehmann¹,³, Sören Auer⁴
¹ University of Bonn, Bonn, Germany
² DA-IICT, Gandhinagar, India
³ Fraunhofer IAIS, St. Augustin, Germany
⁴ TIB, Hannover, Germany
Semantics 2017, Amsterdam, Netherlands, September 13

Outline
● Motivation
● Problem Statement
● State of the Art
● Approach - LITMUS Benchmark Suite
● Challenges
● Evaluation Plan
● Next Steps

Motivation
● Ocean of data: real and synthetic datasets, e.g. the LOD Cloud 2017 (http://lod-cloud.net/versions/2017-02-20/lod.png)
● Sea of tools: RDF stores (e.g. RDF-3X), graph stores, key-value (K-V) stores, document-oriented stores, wide-column stores

Motivation (contd.)
● Domain-specific applications, i.e. perspectives:
○ Vendors
○ Researchers
○ Users
● Choice overload!
Image: https://steemit.com/philosophy/@l0k1/subjectivity-and-truth-how-blockchains-model-consensus-building

Benchmarking
● Tedious!
● Needs domain-specific expertise
● Lack of standardization (single focus)
○ Open software, system configuration settings, etc.
● Near-zero reusability
● Guaranteeing a fair benchmark is difficult!
● Choosing the right performance metrics is cumbersome and subjective
● Visualising benchmark results
[6] http://2.bp.blogspot.com/-TkUb0TPN7IA/VewUHm_jVaI/AAAAAAAABgM/vZILnZNJv5A/s1600/2012-10-16-subjective-objective.jpg

Problem Statement
“How can diverse cross-domain DMSs be benchmarked in an automated, established* and standard# environment?”

State of the Art
Benchmark efforts covering relational, RDF and graph DMSs, grouped into single-domain and cross-domain benchmarks:
● TPC [H,C,E,DS] [13]
● XGDBench [6]
● HPC [7]
● Graph 500 [12]
● DBPSB [11]
● LUBM [9]
● IGUANA [19]
● WatDiv [1]
● SP2Bench [20]
● BSBM [4]
● Pandora*
● Graphium [8]
● LDBC [2]
● HOBBIT**
*http://pandora.ldc.usb.ve/
**https://project-hobbit.eu/

LITMUS Benchmark Suite

The LITMUS architecture (figure):
● Data Facet (F1): Datasets 1 to N → Data integration module
● Query Facet (F2): Querysets 1 to M → Query conversion module
● System Facet (F3): System configuration & integration module → RDF stores, graph stores, relational DBs, wide-column stores, key-value stores
● Benchmarking Core: Controller & Tester, Profiler, Analyzer
● User Interface (F4): the User's entry point
Thakkar, Harsh. "Towards an Open Extensible Framework for Empirical Benchmarking of Data Management Solutions: LITMUS." ESWC, 2017.
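To make the role of the Benchmarking Core more concrete, here is a minimal sketch of what a controller & tester loop could look like. It is not LITMUS code: the adapter type `QueryRunner`, the function `benchmark` and the per-DMS query dictionary (standing in for the output of a query conversion step) are hypothetical names chosen for illustration, and the only metric recorded is the median wall-clock response time.

```python
import statistics
import time
from typing import Callable, Dict, List

# Hypothetical adapter: executes one query string on a DMS and returns rows.
# This is an illustrative assumption, not the actual LITMUS adapter API.
QueryRunner = Callable[[str], List[tuple]]


def benchmark(runners: Dict[str, QueryRunner],
              queries: Dict[str, Dict[str, str]],
              warmup: int = 1,
              runs: int = 3) -> Dict[str, Dict[str, float]]:
    """Controller & tester sketch.

    runners: DMS name -> adapter function.
    queries: query id -> {DMS name -> query text in that DMS's language},
             i.e. what a query conversion step might produce.
    Returns DMS name -> query id -> median response time in milliseconds.
    """
    report: Dict[str, Dict[str, float]] = {}
    for dms, run_query in runners.items():
        report[dms] = {}
        for query_id, per_dms in queries.items():
            if dms not in per_dms:      # no translation for this DMS: skip it
                continue
            text = per_dms[dms]
            for _ in range(warmup):     # warm-up runs, timings discarded
                run_query(text)
            timings = []
            for _ in range(runs):
                start = time.perf_counter()
                run_query(text)
                timings.append((time.perf_counter() - start) * 1000.0)
            report[dms][query_id] = statistics.median(timings)
    return report
```

For example, `benchmark({"toy": lambda q: []}, {"Q1": {"toy": "SELECT * WHERE { ?s ?p ?o }"}})` returns a nested dictionary with one small, non-negative timing for `Q1`. Keeping the translated query texts per DMS mirrors the separation between the Query Facet and the System Facet; a real profiler would record far more than response time.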

Challenges
● What are the core challenges in developing such an open, extensible, FAIR framework?
○ C1 - Data Conversion
○ C2 - Query Translation
○ C3 - Key Performance Indicators (KPIs)
http://media.thinkadvisor.com/lifehealthpro/article/2015/02/24/challenge.jpg

C1 - Data Conversion
● Different data models
○ RDF Graph
○ Property Graph
● A fair benchmark requires converting data between these models
● Each DMS is best evaluated on its natively supported data model
Figure: lots of data, real and synthetic, in RDF graph and property graph form
RQ1 - What are the methods to convert RDF into the Property Graph data model?

RDF Data Model
● RDF is a triple-based graph model, where:
○ Subject: URI, blank node
○ Predicate: URI (the property)
○ Object: URI, literal, blank node
URI = Uniform Resource Identifier (RDF 1.1 generalizes these to IRIs), analogous to an ISBN for books
Literals = data values
Blank nodes = descriptions of entities that don't need to be named
Interpretation (figure): ex:Person -ex:speaker-> ex:Event; ex:Person has ex:name "Harsh", ex:place ex:Bonn and ex:age "27"; ex:Event has ex:name "Semantics", ex:year "2017" and ex:place ex:AMS; plus ex:stime "30"
Representation (Turtle):
@prefix ex: <http://example.org/> .
ex:Person ex:speaker ex:Event .
ex:Person ex:name "Harsh" .
ex:Person ex:place ex:Bonn .
ex:Person ex:age "27" .
ex:Event ex:name "Graph Day" .
ex:Event ex:year "2017" .
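The Turtle example can also be held in memory as a plain set of (subject, predicate, object) tuples, which makes the "edge-labelled, directed multigraph" view explicit: every tuple is one labelled edge. This is a minimal sketch without an RDF library; the `Literal` wrapper and the `describe` helper are hypothetical names introduced here, and IRIs are kept as prefixed-name strings such as "ex:Bonn".

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple, Union


class Literal(NamedTuple):
    """A data value such as "27", kept distinct from IRIs."""
    value: str


# IRIs are abbreviated as prefixed-name strings, e.g. "ex:Bonn".
Term = Union[str, Literal]

# The Turtle example, one tuple per triple.
TRIPLES = {
    ("ex:Person", "ex:speaker", "ex:Event"),
    ("ex:Person", "ex:name", Literal("Harsh")),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", Literal("27")),
    ("ex:Event", "ex:name", Literal("Graph Day")),
    ("ex:Event", "ex:year", Literal("2017")),
}


def describe(triples) -> Dict[str, List[Tuple[str, Term]]]:
    """Group triples by subject: subject -> [(predicate, object), ...]."""
    by_subject: Dict[str, List[Tuple[str, Term]]] = defaultdict(list)
    # key=str gives a deterministic order; str and Literal are not comparable.
    for s, p, o in sorted(triples, key=str):
        by_subject[s].append((p, o))
    return dict(by_subject)
```

Here `describe(TRIPLES)["ex:Person"]` lists four (predicate, object) pairs: even a simple entity needs one triple per attribute or relationship.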

RDF Graphs (RDFGs)
● Edge-labelled, directed, multi-graphs (w. Ent. URIs, Blank nodes, Literals)
● Going from information to knowledge using OWL (description logics) and ontologies (RDFS, RDFa, etc.)
● Bulky
○ Everything is a node-edge-node triple (edges don't have properties)
○ More relationships per node → more triples in total!
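Since RDF edges cannot carry properties, one standard workaround is RDF reification (rdf:Statement with rdf:subject, rdf:predicate, rdf:object). The sketch below attaches the figure's ex:stime "30" to the ex:speaker triple and counts the triples this costs. The `reify` helper and the blank node label `_:talk` are hypothetical; reification is one option among several (e.g. singleton properties), not necessarily the one LITMUS uses.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def reify(triple: Triple, statement: str, annotations: Dict[str, str]) -> List[Triple]:
    """Describe `triple` with the standard RDF reification vocabulary so that
    extra properties (annotations) can be attached to it."""
    s, p, o = triple
    out = [
        (statement, "rdf:type", "rdf:Statement"),
        (statement, "rdf:subject", s),
        (statement, "rdf:predicate", p),
        (statement, "rdf:object", o),
    ]
    # The would-be "edge properties" hang off the statement node instead.
    out.extend((statement, key, value) for key, value in annotations.items())
    return out


speaker = ("ex:Person", "ex:speaker", "ex:Event")
extra = reify(speaker, "_:talk", {"ex:stime": '"30"'})
total = 1 + len(extra)  # the asserted triple plus its reified description
```

One edge property therefore turns one triple into six, which is exactly the "bulky" effect described above.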

Property Graph Data Model
● Edge-labelled, directed, attributed multi-graph
● Vertices and edges both have properties
● Main components:
○ Vertices, edges (source, destination), properties (key-value pairs), labels (strings)
● Super neat (compact), super cute
● Easier to add weighted, reified edges
● Query languages: CYPHER, Gremlin, PGQL, etc.
Figure: the same example as a property graph: a Person vertex {Name: Harsh, Age: 27, Place: Bonn} connected by an edge {Role: speaker, Time: 30} to an Event vertex {Name: Semantics, Year: 2017, Place: AMS}
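A property graph can be sketched with a few dataclasses: vertices and edges both carry a label and a key-value property map, and edges are kept in a list so parallel edges (a multi-graph) are allowed. The class and method names below are hypothetical, chosen for illustration; reading the figure's Role: speaker as the edge label is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str
    src: int  # id of the source vertex
    dst: int  # id of the destination vertex
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)  # a list: parallel edges allowed

    def add_vertex(self, label: str, **props: Any) -> Vertex:
        v = Vertex(len(self.vertices), label, dict(props))
        self.vertices[v.id] = v
        return v

    def add_edge(self, label: str, src: Vertex, dst: Vertex, **props: Any) -> Edge:
        e = Edge(len(self.edges), label, src.id, dst.id, dict(props))
        self.edges.append(e)
        return e


g = PropertyGraph()
person = g.add_vertex("Person", name="Harsh", age=27, place="Bonn")
event = g.add_vertex("Event", name="Semantics", year=2017, place="AMS")
talk = g.add_edge("speaker", person, event, time=30)  # property lives on the edge
```

Compared with the reified RDF version, the talk's time is a single key-value pair on one edge, which is what makes the model compact.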

Mapping RDF → PG
● Initial Results:
○ Intra-conversion of graph data models (mapping problem)
○ PoC implementation ready (see GitHub)
● Work in progress:
○ Conversion of properties, blank nodes, etc.
○ Using e.g. Reification, Singleton Property, Hypergraphs, etc.
○ Use case: DBpedia 2016-10 (mapping from .owl & data)
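As a baseline for the mapping problem, a naive direct mapping might turn literal-valued triples into vertex properties and all other triples into labelled edges. The sketch below is that baseline only, not the authors' PoC: the functions `is_literal` and `rdf_to_pg` are hypothetical, literals are assumed to be written in double quotes, and blank nodes are simply treated as vertices. Edge properties (reification, singleton properties) are deliberately not handled, which is the open work listed above.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def is_literal(term: str) -> bool:
    # Assumption of this sketch: literals are written in double quotes, e.g. '"27"';
    # everything else (IRIs, blank nodes) becomes a vertex.
    return len(term) >= 2 and term.startswith('"') and term.endswith('"')


def rdf_to_pg(triples: List[Triple]):
    """Naive direct mapping: literal-valued triples become vertex properties,
    all other triples become labelled edges between vertices."""
    vertices: Dict[str, Dict[str, list]] = {}
    edges: List[Triple] = []
    for s, p, o in triples:
        props = vertices.setdefault(s, {})
        if is_literal(o):
            # Lists keep multi-valued RDF properties instead of overwriting them.
            props.setdefault(p, []).append(o[1:-1])
        else:
            vertices.setdefault(o, {})
            edges.append((s, p, o))
    return vertices, edges


EXAMPLE = [
    ("ex:Person", "ex:speaker", "ex:Event"),
    ("ex:Person", "ex:name", '"Harsh"'),
    ("ex:Person", "ex:place", "ex:Bonn"),
    ("ex:Person", "ex:age", '"27"'),
    ("ex:Event", "ex:name", '"Graph Day"'),
    ("ex:Event", "ex:year", '"2017"'),
]
vertices, edges = rdf_to_pg(EXAMPLE)
```

On the running example this yields three vertices (ex:Person, ex:Event, ex:Bonn) and two edges, with the four literal triples folded into vertex properties.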

C2 - Query Translation
● Yes, we are linguistically diverse, and so are DMSs!
● And with different dialects:
○ SPARQL, CYPHER, Gremlin, etc.
● RDF - SPARQL (W3C '08)
● Graph - ??
http://cdn2.wpbeginner.com/wp-content/uploads/2015/02/multilingual-wordpress.jpg

Gremlin Traversal Language
http://www.datastax.com/wp-content/uploads/2015/09/many-to-many-mapping.png
http://www.datastax.com/wp-content/uploads/2015/09/gtm-dataflow.png
Gremlin’s Multi-Graph Query Language (GQL) support

Gremlin Traversal Language (contd.)
Multi-DMS & platform support
https://tinkerpop.apache.org/images/oltp-and-olap.png
RQ2 - What are the semantics-preserving methods/approaches for translating SPARQL queries to a graph query language such as Gremlin?


SPARQL → Gremlin
● C2: Gremlinator - the SPARQL-Gremlin translator
○ Formalizing Gremlin traversals in graph algebra [DEXA '17]
○ A novel translation mechanism that maps SPARQL queries to Gremlin pattern-matching traversals [planned submission: EDBT '18]
○ Nested queries are still a challenge (e.g. UNION)
Addressing RQ2 (talk @ Graph Day 2017)
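To illustrate the idea of mapping SPARQL to Gremlin pattern-matching traversals, here is a toy translator for a single basic graph pattern that emits a Gremlin `match()` traversal as a string. It is a simplified sketch, not Gremlinator's algorithm: the function names are hypothetical, subjects must be variables, predicates are assumed to equal edge labels or property keys in the property graph, and every constant object is treated as a property value. UNION, OPTIONAL and FILTER are out of scope.

```python
from typing import List, Tuple

Pattern = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def pattern_to_step(s: str, p: str, o: str) -> str:
    """Map one SPARQL triple pattern to one Gremlin match() traversal."""
    if not is_var(s):
        raise ValueError("this sketch only supports variable subjects")
    if is_var(o):
        # ?s p ?o  ->  an edge labelled p between the two bound vertices
        return f"__.as('{s[1:]}').out('{p}').as('{o[1:]}')"
    # ?s p "value"  ->  a property filter on the subject vertex
    return f"__.as('{s[1:]}').has('{p}', '{o}')"


def bgp_to_gremlin(patterns: List[Pattern], projection: List[str]) -> str:
    """Translate SELECT <projection> WHERE { <patterns> } into Gremlin."""
    steps = ", ".join(pattern_to_step(*tp) for tp in patterns)
    columns = ", ".join(f"'{v[1:]}'" for v in projection)
    return f"g.V().match({steps}).select({columns})"


query = bgp_to_gremlin([("?p", "speaker", "?e"), ("?p", "name", "Harsh")], ["?p", "?e"])
```

Here `query` is `g.V().match(__.as('p').out('speaker').as('e'), __.as('p').has('name', 'Harsh')).select('p', 'e')`: each triple pattern becomes one match traversal, and shared variable names become shared step labels, which is what makes the join explicit.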

C3 - Metrics/KPIs
RQ3 - What are the strengths and the limitations of the existing KPIs, and to what extent do they reflect the performance of a DMS?
● Type of data: RDF graph, property graph; real, synthetic
● Type of query: linear, star-shaped, snowflake
● KPIs: query response time, precision, recall, DMS index size, DMS configuration
[11] https://www.tutorialspoint.com/computer_fundamentals/images/primary_memory.jpg
[12] http://s.hswstatic.com/gif/microprocessor-250x150.jpg
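Query shape is one of the dimensions above. A rough heuristic for labelling a set of SPARQL triple patterns as star, linear or snowflake might look as follows. The function names and the exact rules are assumptions of this sketch: star means every pattern shares one subject; linear means the patterns form a single chain in which each object is the next subject; snowflake here simply means at least two "star centres" (subjects with two or more patterns), without checking that they are connected.

```python
from collections import Counter
from typing import List, Tuple

Pattern = Tuple[str, str, str]


def is_path(patterns: List[Pattern]) -> bool:
    """True if the patterns form one chain s1 -> o1 = s2 -> o2 = s3 ..."""
    out_deg = Counter(s for s, _, _ in patterns)
    in_deg = Counter(o for _, _, o in patterns)
    if any(c > 1 for c in out_deg.values()) or any(c > 1 for c in in_deg.values()):
        return False
    starts = [s for s in out_deg if in_deg[s] == 0]
    if len(starts) != 1:
        return False
    nxt = {s: o for s, _, o in patterns}
    node, steps = starts[0], 0
    while node in nxt:  # terminates: in-degree <= 1 rules out entering a cycle
        node = nxt[node]
        steps += 1
    return steps == len(patterns)  # every pattern lies on the chain


def query_shape(patterns: List[Pattern]) -> str:
    if len({s for s, _, _ in patterns}) == 1:
        return "star"
    if is_path(patterns):
        return "linear"
    centres = [s for s, n in Counter(s for s, _, _ in patterns).items() if n >= 2]
    return "snowflake" if len(centres) >= 2 else "other"
```

For instance, three patterns on `?p` form a star, `?a knows ?b . ?b knows ?c . ?c knows ?d` is linear, and two stars on `?p` and `?e` joined by `?p speaker ?e` are classified as a snowflake. A single triple pattern counts as a star under these rules.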

Selection of KPIs
● CPU- and memory-specific metrics:
○ Perf tool - LITMUS v0.1 (supported)
■ TLB, LLC, instructions, L1 cache, page faults, etc. (18 currently supported)
● Dataset-specific metrics:
○ |V|, |E|, eccentricity, clustering coefficient, centrality, etc. (in progress)
● Query-specific metrics:
○ Type, length, response time, precision, recall, F1, etc. (planned)
● DMS-specific metrics:
○ Load time, index time, index size (supported)
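The planned query-specific metrics precision, recall and F1 compare the result set a DMS returns with a reference result set, which matters when a translated query (e.g. SPARQL to Gremlin) might not return exactly the same answers. A minimal sketch follows; the function name is hypothetical, and returning 0.0 for empty sets is a convention chosen here (others score two empty sets as 1.0).

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(returned: Iterable[Hashable],
                        expected: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 of a query's answers."""
    got, gold = set(returned), set(expected)
    tp = len(got & gold)  # answers that are both returned and expected
    precision = tp / len(got) if got else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "e"})
```

Here two of four returned answers are correct and two of three expected answers are found, so precision is 0.5, recall is 2/3 and F1 is 4/7 (about 0.571).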

Back in the bigger picture
Figure: the LITMUS architecture with challenges C1 (data conversion), C2 (query translation) and C3 (KPIs) placed on it

The LITMUS Test*
Figure: LITMUS output (plots and files)
*Please visit our poster & demo for hands-on experience, and see the paper for more details!

Evaluation
● RQs: publications
● Framework: continuous integration (v0.1 released, v0.2 planned for Dec '17)
○ Reproducing third-party benchmarks
○ Gathering feedback from users and experts
○ Going live @ industry:
■ Gremlinator - Apache TinkerPop
■ Further collaboration… adoption by other projects: LDBC, HOBBIT! :-)

Next Steps
● Framework - LITMUS v0.2 launch (Dec ‘17 - planned)
● DMS module - Adding two more DMSs each
● Dataset module - RDF → PG (Dec ‘17)
● Query module - Integrating Gremlinator
● GUI: Aesthetic GUI (maybe?)

Acknowledgements
Funding: H2020 WDAqua ITN (GA: 642795)
Supervisors & Mentors:
● Prof. Dr. Sören Auer, TIB, DE
● Prof. Dr. Jens Lehmann, UBO, DE
● Prof. Dr. Maria-Esther Vidal, TIB, DE
● Dr. Marko Rodriguez, DataStax & Apache, USA

Resources
http://wdaqua.eu/
https://github.com/LITMUS-Benchmark-Suite/sparql-to-gremlin
Code : https://github.com/LITMUS-Benchmark-Suite/
Web : https://litmus-benchmark-suite.github.io
Docker : https://hub.docker.com/r/litmusbenchmarksuite/litmus/
LITMUS Benchmark Suite

THANK YOU !
Harsh Thakkar
University of Bonn
Twitter: @harsh9t
LinkedIn: thakkarharsh
E-mail: harsh9t@gmail.com
Questions? Comments?
Insults? Injuries?

EXTRA STUFF

Experiments*
Northwind dataset
● PG - Vertices: 3209, Edges: 6177
● RDF - Triples: 33033
BSBM 1M dataset
● PG - Vertices: 92737, Edges: 238309
● RDF - Triples: 1000313
CPU: Intel® Xeon® CPU E5-2660 v3 (20 cores @ 2.60 GHz), RAM: 128 GB DDR3, Disk: 512 GB SSD, OS: Linux 4.2-generic (x86_64)
OpenLink Virtuoso v7.2.4, Apache TinkerGraph-Gremlin v3.2.3
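The dataset sizes above give a rough feel for how much "bulkier" RDF is. The sketch below only does arithmetic on the reported counts; note the caveat that vertices and edges exclude key-value properties, which in RDF appear as separate triples, so the ratio indicates overhead rather than a like-for-like size comparison. The helper name is hypothetical.

```python
# name: (PG vertices, PG edges, RDF triples), as reported on the slide
DATASETS = {
    "Northwind": (3209, 6177, 33033),
    "BSBM 1M": (92737, 238309, 1000313),
}


def triples_per_pg_element(vertices: int, edges: int, triples: int) -> float:
    """RDF triples per property-graph element (vertex or edge)."""
    return triples / (vertices + edges)


for name, (v, e, t) in DATASETS.items():
    print(f"{name}: {triples_per_pg_element(v, e, t):.2f} triples per vertex/edge")
```

This prints roughly 3.52 for Northwind and 3.02 for BSBM 1M: in both cases the RDF form has about three triples for every vertex or edge of the property graph.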
