ONDEX & GRAPH DBS
MARCO BRANDIZI, 16/10/2017
GOALS
• Evaluate graph databases (GDBs), frameworks, etc. in relation to ONDEX needs
• Assess GDBs as kNetMiner/ONDEX backends
• Evaluate a new architecture where raw data access is entirely based on a GDB
• Evaluate a new data exchange format, possibly integrated with one of the GDBs
• and hence, evaluate the data models too
• Assess data query/manipulation languages (expressivity, ease of use, speed)
• Assess whether performance meets ONDEX needs
TEST DATA
Trait Ontology (TO): 1500 nodes, is-a and part-of relations (i.e., mostly a tree)
Gene Ontology (GO): tree with 46k nodes
AraCyc/BioPAX: heterogeneous net, 23k nodes, 40k relations
Ara-kNet: heterogeneous net, 350k nodes, 1.150M relations
TEST SETTINGS (RDF)
TEST SETTINGS (NEO4J)
RDF
RDF/LINKED DATA ESSENTIALS
• Simple, Fine-Grained Data Model: Property/Value Pairs & Typed Links
• Designed for Data Integration:
• Universal Identifiers, W3C Standards
• Strong (even too much) emphasis on knowledge modelling via schemas/ontologies
• Designed for the Web: Resolvable URIs, Web APIs
RDF/LINKED DATA ESSENTIALS
Integration as a first-class citizen; strong emphasis on knowledge modelling, schemas, ontologies
DATA MODEL: ONDEX IN RDF
EXAMPLE QUERIES
Count concepts (classes) in Trait Ontology:
select (count(distinct ?c) as ?n) WHERE {
  ?c a odxcc:TO_TERM.
}
Parts of membrane (transitively):
select distinct ?csup ?supName ?c ?name
WHERE {
  ?csup odx:conceptName ?supName.
  FILTER ( ?supName = "cellular membrane" )
  ?c odxrt:part_of* ?csup.
  ?c odx:conceptName ?name.
}
LIMIT 1000
Proteins related to pathways:
select distinct ?prot ?pway {
  ?prot odxrt:pd_by|odxrt:cs_by ?react;
    a odxcc:Protein.
  ?react a odxcc:Reaction.
  ?react odxrt:part_of ?pway.
  ?pway a odxcc:Path.
}
LIMIT 1000
optimised order
‘|’ for property paths
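To make the transitive query concrete, here is a minimal Python sketch of what a zero-or-more property path such as odxrt:part_of* computes. It is only an illustration under stated assumptions: the toy triples and the function name part_of_star are made up, and a real triple store would evaluate the path with its own engine.

```python
from collections import deque

# Toy triples (child, "part_of", parent); the names are invented for illustration.
TRIPLES = [
    ("lipid bilayer", "part_of", "cellular membrane"),
    ("membrane raft", "part_of", "lipid bilayer"),
    ("nucleus", "part_of", "cell"),
]

def part_of_star(target, triples):
    """Return every node that reaches `target` via zero or more part_of edges."""
    # Index parent -> children so the relation can be walked backwards.
    children = {}
    for subj, pred, obj in triples:
        if pred == "part_of":
            children.setdefault(obj, []).append(subj)
    seen = {target}  # zero-length path: the target matches itself
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

parts = part_of_star("cellular membrane", TRIPLES)
print(sorted(parts))
```

Note that the start node itself is part of the result, which mirrors the zero-length match of the * operator; a + path would exclude it unless a cycle leads back to it.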
EXAMPLE QUERIES
Proteins related to pathways:
prefix odx: <http://ondex.sourceforge.net/ondex-core#>
prefix odxcc: <http://www.ondex.org/ex/conceptClass/>
prefix odxc: <http://www.ondex.org/ex/concept/>
prefix odxrt: <http://www.ondex.org/ex/relationType/>
prefix odxr: <http://www.ondex.org/ex/relation/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?prot ?pway
where {
  {
    # Branch 1
    ?prot odxrt:pd_by|odxrt:cs_by ?react.
    ?prot a odxcc:Protein.
    ?react a odxcc:Reaction.
    ?react odxrt:part_of ?pway.
    ?pway a odxcc:Path.
  }
  union {
    # Branch 2
    ?prot ^odxrt:ac_by|odxrt:is_a ?enz.
    ?prot a odxcc:Protein.
    ?enz a odxcc:Enzyme.
    {
      # Branch 2.1
      ?enz odxrt:ac_by|odxrt:in_by ?comp.
      ?comp a odxcc:Compound.
      ?comp odxrt:cs_by|odxrt:pd_by ?trns.
      ?trns a odxcc:Transport.
    }
    union {
      # Branch 2.2
      ?enz ^odxrt:ca_by ?trns.
      ?trns a odxcc:Transport.
    }
    ?trns odxrt:part_of ?pway.
    ?pway a odxcc:Path.
  }
} LIMIT 1000
RDF PERFORMANCE
Simple, common queries (Fuseki)
RDF PERFORMANCE
Queries over ONDEX paths (Fuseki)
RDF PERFORMANCE
Queries over ONDEX paths (Virtuoso)
NEO4J
NEO4J ESSENTIALS
• Designed to back applications
• Much less about standards or Web-based sharing
• Very little to manage schemas (more later)
• No native data format (except Cypher; support for GraphML, RDF)
• Initially based on an API only, now Cypher is available
• Compact, easy, no URIs (though they can be used as strings)
• Very performant
• Doesn't offer much for clustering/federation, but Cypher can be used in TinkerPop
• More commercial (not necessarily good)
• Cool management interface
• Probably easier to use for the average Java developer
Image credits: https://goo.gl/YLhCXG
NEO4J DATA MODEL
Both nodes and relations can have attributes
Nodes & relations have labels (i.e., string-based types)
Cool management interface
(SPARQL version might be a student project)
CYPHER QUERY/DM LANGUAGE
Proteins->Reactions->Pathways:
// chain of paths, node selection via property (exploits indices)
MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: 'apoptosis' })
// further conditions, but often not performant
WHERE prot.name =~ '(?i)^DNA.+'
// Usual projection and post-selection operators
RETURN prot.name, pway
// Relations can have properties
ORDER BY csby.pvalue
LIMIT 1000
Single-path (or same-direction branching) queries are easy to write:
MATCH (prot:Protein) - [:pd_by|cs_by] -> (:Reaction) - [:part_of*1..3] -> (pway:Path)
RETURN ID(prot), ID(pway) LIMIT 1000
// Very compact forms available, depending on the data
MATCH (prot:Protein) -- (pway:Path) RETURN pway
CYPHER QUERY/DM LANGUAGE
DML features:
MATCH (prot:Protein{ name: 'P53' }), (pway:Path{ title: 'apoptosis' })
CREATE (prot) - [:participates_in] -> (pway)
DML features, embeddable in Java/Python/etc:
UNWIND $rows AS row // $rows set by the invoker, programmatically
MATCH (prot:Protein{ id: row.protId }), (pway:Path{ id: row.pathId })
CREATE (prot) - [relation:participates_in] -> (pway)
SET relation = row.relationAttributes
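As a rough sketch of how an invoker might prepare $rows for the UNWIND statement above, the Python below splits the input into fixed-size chunks, one per statement execution. The helper names (batches, make_row), the sample IDs and the chunk size are assumptions made for illustration; the database call itself is left out so that the sketch runs without a server.

```python
from itertools import islice

# Same statement as in the slide; $rows is supplied by the caller.
BULK_QUERY = """
UNWIND $rows AS row
MATCH (prot:Protein { id: row.protId }), (pway:Path { id: row.pathId })
CREATE (prot) - [relation:participates_in] -> (pway)
SET relation = row.relationAttributes
"""

def batches(rows, size):
    """Yield lists of at most `size` rows, preserving order."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def make_row(prot_id, path_id, **attrs):
    """Shape one row the way the statement expects it (hypothetical helper)."""
    return {"protId": prot_id, "pathId": path_id, "relationAttributes": attrs}

# Hypothetical input: five protein/pathway pairs with one relation attribute.
rows = [make_row(f"P{i}", "PW1", score=i) for i in range(5)]
chunks = list(batches(rows, 2))

# Each chunk would be sent as the $rows parameter of one BULK_QUERY execution.
print([len(c) for c in chunks])
```

With a driver, each chunk would typically be passed as the rows parameter when running BULK_QUERY, so the statement text stays constant and only the data changes.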
CYPHER/NEO4J PERFORMANCE
Simple, common queries
CYPHER/NEO4J PERFORMANCE
Path queries
SOUNDS GOOD, BUT…
select distinct ?prot ?pway
where {
  {
    # Branch 1
    …
  }
  union {
    # Branch 2
    …
    {
      # Branch 2.1
    }
    union {
      # Branch 2.2
    }
    …
  }
}
• In Cypher?!
• I couldn’t find a decent way, although it might be possible (https://goo.gl/Rpa9SM)
• Partially possible in a straightforward way, but redundantly, e.g., Branch 2:
MATCH (prot:Protein) <- [:ac_by] - (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
RETURN prot, pway LIMIT 100
UNION
MATCH (prot:Protein) - [:is_a] -> (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
RETURN prot, pway LIMIT 100
ADDENDUM
Unions + branches are partially possible by means of paths in WHERE:
// Branch 2
MATCH (prot:Protein), (enz:Enzyme), (tns:Transport) - [:part_of] -> (path:Path)
WHERE ( (enz) - [:ac_by|:in_by] -> (:Comp) - [:pd_by|:cs_by] -> (tns) // Branch 2.1
OR (tns) - [:ca_by] -> (enz) ) // Branch 2.2 (pt1)
AND ( (prot) - [:is_a] -> (enz) OR (prot) <- [:ac_by] - (enz) ) // Branch 2.2 (pt2)
RETURN prot, path LIMIT 30
UNION
// Branch 1
MATCH (prot:Protein) - [:pd_by|:cs_by] -> (:Reaction) - [:part_of] -> (path:Path)
RETURN prot, path LIMIT 30
• However:
• It took 41249 ms to execute against the wheat net
• It generates Cartesian products and can easily explode
SOUNDS GOOD, BUT…
• What about schemas/metadata/ontologies?
• Nodes and relations can only have labels attached (several per node), which are just strings. Rich schema operations are not so easy:
• Select any kind of protein, including enzymes, cytokines
• Select any type of ‘interacts with’, including ‘catalysed by’, ‘consumed by’, ‘produced by’ (might require ‘inverse of’)
• Basically, Neo4j has a relational-oriented view of schemas
SOUNDS GOOD, BUT…
• Basically, it’s relational-oriented about schemas
• We might still be OK with metadata modelled as graphs, however:
• MATCH (molecule:Molecule),
(molType:Class) - [:is_a*] -> (:Class{ name: 'Protein' })
WHERE molType.label IN LABELS(molecule)
• It’s expensive to compute (doesn’t exploit indexes)
• MATCH (molecule:Molecule:$additionalLabel) CREATE …
• Parameterising on labels is not possible
• Requires a non-parametric Cypher string => UNWIND-based bulk loading impossible
• => bad performance
• Programmatic approach possible, but with a lot of problems from things like Lucene version mismatches (one reason being that ONDEX would require a review and a proper plug-in architecture)
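The label limitation can be illustrated with a short Python sketch: because a label cannot be passed as a parameter, it has to be spliced into the statement text, so a single parameterised statement cannot cover rows with different labels. Grouping rows by label, as done here, is one possible mitigation that the slides did not evaluate; the function names, the Molecule/props layout and the sample rows are assumptions.

```python
import re
from collections import defaultdict

# Labels end up inside the query text, so only plain identifiers are accepted.
LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def create_statement(label):
    """Build a CREATE statement with the label spliced into the text."""
    if not LABEL_RE.match(label):
        raise ValueError(f"unsafe label: {label!r}")
    return f"UNWIND $rows AS row CREATE (m:Molecule:{label}) SET m = row.props"

def group_by_label(rows):
    """Collect the rows that can share one statement, keyed by their label."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["label"]].append({"props": row["props"]})
    return dict(grouped)

# Hypothetical input rows, each carrying its own additional label.
rows = [
    {"label": "Protein", "props": {"name": "P53"}},
    {"label": "Enzyme", "props": {"name": "E1"}},
    {"label": "Protein", "props": {"name": "P21"}},
]
plan = {create_statement(label): batch for label, batch in group_by_label(rows).items()}
print(len(plan))
```

The number of distinct statement strings grows with the number of distinct labels, which is where the loss of a single reusable, parameterised statement shows up.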
FLAT, RDF-LIKE MODEL
Code for both converters:
github:/marco-brandizi/odx_neo4j_converter_test
FLAT MODEL IMPACT ON CYPHER
Structured model:
MATCH (prot:Protein{ id: '250169' }) - [:cs_by] -> (react:Reaction) - [:part_of] -> (pway:Path)
RETURN * LIMIT 100
Flat model:
MATCH (prot:Concept { id: '250169', ccName: 'Protein' })
<- [:from] - (csby:Relation { name: 'cs_by' })
- [:to] -> (react:Concept { ccName: 'Reaction' })
<- [:from] - (partof:Relation { name: 'part_of' })
- [:to] -> (pway:Concept { ccName: 'Path' })
RETURN * LIMIT 100
Rich schema-based queries
MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
(cc) <- [:specializationOf*] - (:ConceptClass{ name: 'Protein' })
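The difference between the two models can be sketched in Python: each typed relation of the structured model becomes a Relation node with from/to links in the flat one. This is only an illustration of the shape shown in the queries above, not the converter's actual code; the function name to_flat, the generated relation IDs and the sample IDs r1 and p1 are assumptions.

```python
def to_flat(concepts, relations):
    """Reify each typed relation as a Relation node with from/to links."""
    # concepts: concept id -> concept class name
    nodes = [
        {"label": "Concept", "id": cid, "ccName": cc}
        for cid, cc in concepts.items()
    ]
    links = []
    for n, (src, name, dst) in enumerate(relations):
        rel_id = f"rel{n}"  # made-up identifier for the reified relation
        nodes.append({"label": "Relation", "id": rel_id, "name": name})
        links.append((rel_id, "from", src))
        links.append((rel_id, "to", dst))
    return nodes, links

# Structured input: protein 250169 is consumed by a reaction that is part of a pathway.
concepts = {"250169": "Protein", "r1": "Reaction", "p1": "Path"}
relations = [("250169", "cs_by", "r1"), ("r1", "part_of", "p1")]

nodes, links = to_flat(concepts, relations)
print(len(nodes), len(links))
```

Every logical edge turns into one extra node and two links, which is why the flat Cypher pattern needs two hops where the structured one needs a single hop.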
FLAT MODEL PERFORMANCE
Simple, common queries
FLAT MODEL PERFORMANCE
Typical ONDEX Graph Queries
IMPACT ON CYPHER
Rich schema-based queries
From:
MATCH (molecule:Molecule), (molType:Class) - [:is_a*] -> (:Class{ name: 'Protein' })
WHERE molType.label IN LABELS(molecule)
To:
MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
(cc) <- [:specializationOf*] - (:ConceptClass{ name: 'Protein' })
Now it’s efficient enough (especially with length restrictions)
However…
From: MATCH (react:Reaction) - [:part_of] -> (pway:Path)
To: MATCH (react:Concept { ccName: 'Reaction' })
<- [:from] - (partof:Relation { name: 'part_of' })
- [:to] -> (pway:Concept { ccName: 'Path' })
What if we want variable-length part_of?
Not currently possible in Cypher (nor in SPARQL),
maybe in future (https://github.com/neo4j/neo4j/issues/88)
=> Keeping both models, redundantly, would probably be worthwhile
=> which makes it not so different from RDF
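For comparison, a variable-length part_of over the flat model can be computed programmatically, outside the query language. The Python sketch below collapses each reified part_of Relation back into a direct edge and then walks the edges transitively. It is an illustration under assumptions (the function name, the link tuples and the sample IDs are invented), not something evaluated in the slides.

```python
def part_of_closure(start, links, rel_names):
    """Follow reified part_of relations transitively from `start`."""
    # links: (relation_id, "from" | "to", concept_id); rel_names: relation_id -> name
    src, dst = {}, {}
    for rel_id, role, concept in links:
        (src if role == "from" else dst)[rel_id] = concept
    # Collapse each part_of Relation node into a direct concept -> concept edge.
    edges = {}
    for rel_id, name in rel_names.items():
        if name == "part_of":
            edges.setdefault(src[rel_id], []).append(dst[rel_id])
    reached, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached

# Hypothetical flat data: react1 part_of pway1 part_of super1; prot1 cs_by react1.
links = [
    ("rel0", "from", "react1"), ("rel0", "to", "pway1"),
    ("rel1", "from", "pway1"), ("rel1", "to", "super1"),
    ("rel2", "from", "prot1"), ("rel2", "to", "react1"),
]
rel_names = {"rel0": "part_of", "rel1": "part_of", "rel2": "cs_by"}

closure = part_of_closure("react1", links, rel_names)
print(sorted(closure))
```

The collapsing step is effectively a rebuild of the structured model in memory, which is in line with the point that keeping both models side by side may be worthwhile.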
OTHER ISSUES
• Data exchange format?
• None, except Cypher
• DML not so performant
• In particular, no standard data exchange format
• Could be combined with RDF
• Is Neo4j Open Source?
• Produced by a company; only the Community Edition is OSS
• OpenCypher is available
• Cypher backed by Gremlin/TinkerPop
• Apache project, more reliable OSS-wise
• Performance comparable with Neo4j (https://goo.gl/NK1tn2)
• More choice of implementations
• Alternative QL, but more complicated IMHO (Cypher supported)
Image credits: https://goo.gl/ysBFF2
CONCLUSIONS
Criterion | Neo4J/GraphDBs | Virtuoso/Triple Stores
Data exchange format | - | +
Data model | + Relations with properties; - Metadata management | - Relations cannot have properties (requires reification); + Metadata as first citizen
Performance | + | - (comparable)
QL | + Easier (e.g., compact, omissions)?; - Expressivity for some patterns (unions, DML) | - Harder? (URIs, namespaces, verbosity); + More expressive
Standardisation, openness | - | +
Scalability, big data | - TinkerPop probably better | LB/Cluster solutions; over TinkerPop (via SAIL implementation)
WHY?
• Graph + APIs
• Clearer architecture, open to more applications, not only kNetMiner
• QL makes it easier to develop further components/analyses/applications
• Standard Data model and format
• Don’t reinvent the wheel
• Data sharing
• Data and app integration

DNT_Corporate presentation know about usDynamic Netsoft
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 

Recently uploaded (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

A Preliminary survey of RDF/Neo4j as backends for KnetMiner

  • 1. ONDEX & GRAPH DBS. Marco Brandizi, 16/10/2017
  • 2. GOALS
      • Evaluate graph databases (GDBs), frameworks, etc. in relation to ONDEX needs
      • Assess GDBs as kNetMiner/ONDEX backends
      • Evaluate a new architecture where raw data access is entirely based on a GDB
      • Evaluate a new data exchange format, possibly integrated with one of the GDBs
      • and hence, evaluate the data models too
      • Assess data query/manipulation languages (expressivity, ease of use, speed)
      • Assess whether performance fits ONDEX needs
  • 3. TEST DATA
      Trait Ontology (TO): 1500 nodes, is-a and part-of relations (i.e., mostly a tree)
      Gene Ontology (GO): tree with 46k nodes
      AraCyc/BioPAX: heterogeneous net, 23k nodes, 40k relations
      Ara-kNet: heterogeneous net, 350k nodes, 1.150M relations
  • 4. TEST SETTINGS (RDF)
  • 5. TEST SETTINGS (NEO4J)
  • 7. RDF/LINKED DATA ESSENTIALS
      • Simple, fine-grained data model: property/value pairs and typed links
      • Designed for data integration:
      • Universal identifiers, W3C standards
      • Strong (perhaps too strong) emphasis on knowledge modelling via schemas/ontologies
      • Designed for the Web: resolvable URIs, Web APIs
  • 8. RDF/LINKED DATA ESSENTIALS
      Integration as native citizen, strong emphasis on knowledge modelling, schemas, ontologies
  • 9. DATA MODEL: ONDEX IN RDF
  • 10. EXAMPLE QUERIES
      Count concepts (classes) in Trait Ontology:
        select (count(distinct ?c) as ?count) WHERE {
          ?c a odxcc:TO_TERM.
        }
      Parts of membrane (transitively):
        select distinct ?csup ?supName ?c ?name
        WHERE {
          ?csup odx:conceptName ?supName.
          FILTER ( ?supName = "cellular membrane" )
          ?c odxrt:part_of* ?csup.
          ?c odx:conceptName ?name.
        }
        LIMIT 1000
      Proteins related to pathways:
        select distinct ?prot ?pway {
          ?prot odxrt:pd_by|odxrt:cs_by ?react;
            a odxcc:Protein.
          ?react a odxcc:Reaction.
          ?react odxrt:part_of ?pway.
          ?pway a odxcc:Path.
        }
        LIMIT 1000
      Slide annotations: optimised order; ‘|’ for property paths
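The `odxrt:part_of*` path in the "Parts of membrane" query matches zero or more part_of hops, so the membrane concept itself is also returned. A minimal Python sketch of that semantics over an in-memory edge list follows; the concept names and edges are made up for illustration and are not taken from the test data.

```python
from collections import deque

# Hypothetical (child, parent) part_of edges; not real ONDEX data.
PART_OF = [
    ("lipid bilayer", "cellular membrane"),
    ("membrane raft", "lipid bilayer"),
    ("nucleus", "cell"),
]


def parts_of(target, edges):
    """Return every node that reaches `target` via zero or more part_of hops."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    seen = {target}  # zero hops: the target matches itself, as with part_of*
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(parts_of("cellular membrane", PART_OF)))
```

A triple store evaluates the same closure inside the engine; the sketch only shows which nodes qualify.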
  • 11. EXAMPLE QUERIES
      Proteins related to pathways:
        prefix odx: <http://ondex.sourceforge.net/ondex-core#>
        prefix odxcc: <http://www.ondex.org/ex/conceptClass/>
        prefix odxc: <http://www.ondex.org/ex/concept/>
        prefix odxrt: <http://www.ondex.org/ex/relationType/>
        prefix odxr: <http://www.ondex.org/ex/relation/>
        prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        select distinct ?prot ?pway
        where {
          { # Branch 1
            ?prot odxrt:pd_by|odxrt:cs_by ?react.
            ?prot a odxcc:Protein.
            ?react a odxcc:Reaction.
            ?react odxrt:part_of ?pway.
            ?pway a odxcc:Path.
          }
          # to be continued…
          # part 2
          union {
            # Branch 2
            ?prot ^odxrt:ac_by|odxrt:is_a ?enz.
            ?prot a odxcc:Protein.
            ?enz a odxcc:Enzyme.
            { # Branch 2.1
              ?enz odxrt:ac_by|odxrt:in_by ?comp.
              ?comp a odxcc:Compound.
              ?comp odxrt:cs_by|odxrt:pd_by ?trns.
              ?trns a odxcc:Transport
            }
            union { # Branch 2.2
              ?enz ^odxrt:ca_by ?trns.
              ?trns a odxcc:Transport
            }
            ?trns odxrt:part_of ?pway.
            ?pway a odxcc:Path.
          }
        }
        LIMIT 1000
  • 12. RDF PERFORMANCE: Simple, common queries (Fuseki)
  • 13. RDF PERFORMANCE: Queries over ONDEX paths (Fuseki)
  • 14. RDF PERFORMANCE: Queries over ONDEX paths (Virtuoso)
  • 15. NEO4J
  • 16. NEO4J ESSENTIALS
      • Designed to back applications
      • Much less about standards or Web-based sharing
      • Very little to manage schemas (more later)
      • No native data format (except Cypher; support for GraphML, RDF)
      • Initially based on API only, now Cypher available
      • Compact, easy, no URIs (can be used as strings)
      • Very performant
      • Hasn’t much for clustering/federation, but Cypher can be used in TinkerPop
      • More commercial (not necessarily good)
      • Cool management interface
      • Probably easier to use for the average Java developer
      Image credits: https://goo.gl/YLhCXG
  • 17. NEO4J DATA MODEL
      Both nodes and relations can have attributes
      Nodes and relations have labels (i.e., string-based types)
      Cool management interface (a SPARQL version might be a student project)
  • 18. CYPHER QUERY/DM LANGUAGE
      Proteins->Reactions->Pathways:
        // chain of paths, node selection via property (exploits indices)
        MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] -> (pway:Path{ title: 'apoptosis' })
        // further conditions, but often not performant
        WHERE prot.name =~ '(?i)^DNA.+'
        // Usual projection and post-selection operators
        RETURN prot.name, pway
        // Relations can have properties
        ORDER BY csby.pvalue
        LIMIT 1000
      Single-path (or same-direction branching) easy to write:
        MATCH (prot:Protein) - [:pd_by|cs_by] -> (:Reaction) - [:part_of*1..3] -> (pway:Path)
        RETURN ID(prot), ID(pway) LIMIT 1000
        // Very compact forms available, depending on the data
        MATCH (prot:Protein) -- (pway:Path) RETURN pway
  • 19. CYPHER QUERY/DM LANGUAGE
      DML features:
        MATCH (prot:Protein{ name:'P53' }), (pway:Path{ title:'apoptosis'})
        CREATE (prot) - [:participates_in] -> (pway)
      DML features, embeddable in Java/Python/etc:
        UNWIND $rows AS row // $rows set by the invoker, programmatically
        MATCH (prot:Protein{ id: row.protId }), (pway:Path{ id:row.pathId })
        CREATE (prot) - [relation:participates_in] -> (pway)
        SET relation = row.relationAttributes
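The UNWIND pattern above expects the caller to supply `$rows` as a list of maps. A minimal Python sketch of how an invoker might build and batch that parameter is given here; the IDs, attribute values and batch size are hypothetical, and the driver call is only indicated in a comment because it needs a running server.

```python
# Cypher text as on the slide; $rows is bound by the caller.
LINK_QUERY = """
UNWIND $rows AS row
MATCH (prot:Protein { id: row.protId }), (pway:Path { id: row.pathId })
CREATE (prot) - [relation:participates_in] -> (pway)
SET relation = row.relationAttributes
"""


def build_rows(links):
    """Turn (protein id, pathway id, attributes) tuples into the $rows list."""
    return [
        {"protId": prot_id, "pathId": path_id, "relationAttributes": dict(attrs)}
        for prot_id, path_id, attrs in links
    ]


def batches(rows, size):
    """Split rows into chunks so that each UNWIND call stays a manageable size."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical IDs and attribute values, for illustration only.
links = [("P1", "W1", {"pvalue": 0.01}), ("P2", "W1", {"pvalue": 0.2}), ("P3", "W2", {})]
rows = build_rows(links)
for chunk in batches(rows, 2):
    # With the official Neo4j Python driver this would be roughly:
    #   session.run(LINK_QUERY, rows=chunk)
    print(len(chunk), chunk[0]["protId"])
```

Because the query text stays constant and only the parameter changes, the server can reuse the same plan for every batch.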
  • 20. CYPHER/NEO4J PERFORMANCE: Simple, common queries
  • 21. CYPHER/NEO4J PERFORMANCE: Path queries
  • 22. SOUNDS GOOD, BUT…
        select distinct ?prot ?pway
        where {
          { # Branch 1
            …
          }
          union {
            # Branch 2
            …
            { # Branch 2.1
            }
            union { # Branch 2.2
            }
            …
          }
        }
      • In Cypher?!
      • I couldn’t find a decent way, although it might be possible (https://goo.gl/Rpa9SM)
      • Partially possible in a straightforward way, but redundantly, e.g., Branch 2:
        MATCH (prot:Protein) <- [:ac_by] - (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
        RETURN prot, pway LIMIT 100
        UNION
        MATCH (prot:Protein) - [:is_a] -> (:Enzyme) <- [:ca_by] - (:Transport) - [:part_of] -> (pway:Path)
        RETURN prot, pway LIMIT 100
  • 23. ADDENDUM
        select distinct ?prot ?pway
        where {
          { # Branch 1
            …
          }
          union {
            # Branch 2
            …
            { # Branch 2.1
            }
            union { # Branch 2.2
            }
            …
          }
        }
      • In Cypher?! Unions+branches partially possible by means of paths in WHERE:
        // Branch 2
        MATCH (prot:Protein), (enz:Enzyme), (tns:Transport) - [:part_of] -> (path:Path)
        WHERE ( (enz) - [:ac_by|:in_by] -> (:Comp) - [:pd_by|:cs_by] -> (tns) // Branch 2.1
          OR (tns) - [:ca_by] -> (enz) ) // Branch 2.2 (pt1)
          AND ( (prot) - [:is_a] -> (enz) OR (prot) <- [:ac_by] - (enz) ) // Branch 2.2 (pt2)
        RETURN prot, path LIMIT 30
        UNION
        // Branch1
        MATCH (prot:Protein) - [:pd_by|:cs_by] -> (:Reaction) - [:part_of] -> (path:Path)
        RETURN prot, path LIMIT 30
      • However:
      • 41249 ms to execute against the wheat net
      • It generates Cartesian products and can easily explode
  • 24. SOUNDS GOOD, BUT…
      • What about schemas/metadata/ontologies?
      • Nodes and relations can only have multiple labels attached, which are just strings. Rich schema operations are not so easy:
      • Select any kind of protein, including enzymes, cytokines
      • Select any type of ‘interacts with’, including ‘catalysed by’, ‘consumed by’, ‘produced by’ (might require ‘inverse of’)
      • Basically, it has a relational-oriented view of schemas
  • 25. SOUNDS GOOD, BUT…
      • Basically, it’s relational-oriented about schemas
      • We might still be OK with metadata modelled as graphs, however:
      • MATCH (molecule:Molecule), (molType:Class)-[:is_a*]->(:Class{ name:'Protein' }) WHERE molType.label IN LABELS(molecule)
      • It’s expensive to compute (doesn’t exploit indexes)
      • MATCH (molecule:Molecule:$additionalLabel) CREATE …
      • Parameterising on labels is not possible
      • Requires a non-parametric Cypher string => UNWIND-based bulk loading impossible
      • => bad performance
      • A programmatic approach is possible, but there are a lot of problems with things like Lucene version mismatches (one reason being that ONDEX would require a review and a proper plug-in architecture)
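The label constraint can be illustrated with a short Python sketch: since a label cannot be passed as a parameter, it has to be spliced into the query text, so a loader ends up with one distinct query string per label instead of a single parametric statement. The helper names, the label whitelist pattern and the sample nodes are assumptions made for this illustration, not part of the converter code.

```python
import re
from collections import defaultdict

# Conservative pattern for labels that are safe to splice into query text (an assumption).
LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def query_for_label(label):
    """Build the CREATE statement for one label; the label is part of the text."""
    if not LABEL_RE.match(label):
        raise ValueError(f"unsafe label: {label!r}")
    return f"UNWIND $rows AS row CREATE (n:Molecule:{label}) SET n = row.props"


def group_by_label(nodes):
    """Group node property maps by their additional label."""
    groups = defaultdict(list)
    for label, props in nodes:
        groups[label].append({"props": props})
    return dict(groups)


# Hypothetical nodes, for illustration only.
nodes = [("Protein", {"id": "1"}), ("Enzyme", {"id": "2"}), ("Protein", {"id": "3"})]
for label, rows in group_by_label(nodes).items():
    print(query_for_label(label), len(rows))
```

With many distinct labels or label combinations the number of query strings grows accordingly, which is the cost the slide points at.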
  • 26. FLAT, RDF-LIKE MODEL
      Code for both converters: github:/marco-brandizi/odx_neo4j_converter_test
  • 27. FLAT MODEL IMPACT ON CYPHER
      Structured model:
        MATCH (prot:Protein{ id: '250169' }) - [:cs_by] -> (react:Reaction) - [:part_of] -> (pway:Path)
        RETURN * LIMIT 100
      Flat model:
        MATCH (prot:Concept {id: '250169', ccName: 'Protein'})
          <- [:from] - (csby:Relation {name: 'cs_by' }) - [:to] -> (react:Concept { ccName: 'Reaction'})
          <- [:from] - (partof:Relation {name:'part_of'}) - [:to] -> (pway:Concept {ccName:'Path'})
        RETURN * LIMIT 100
      Rich schema-based queries:
        MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
          (cc) <- [:specializationOf*] - (:ConceptClass{name:'Protein'})
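To make the difference between the two models concrete, here is a minimal Python sketch that reifies one structured edge into flat-model records: two Concept nodes, one Relation node, and the `from`/`to` links between them. The record layout, the relation ID scheme and the reaction ID `R1` are assumptions for illustration; the actual converters may represent this differently.

```python
def to_flat(src, rel_name, dst):
    """Reify one structured edge into flat-model nodes and from/to links.

    src and dst are (id, concept class name) pairs.
    """
    src_id, src_cc = src
    dst_id, dst_cc = dst
    rel_id = f"{rel_name}:{src_id}->{dst_id}"  # hypothetical ID scheme
    nodes = [
        {"label": "Concept", "id": src_id, "ccName": src_cc},
        {"label": "Concept", "id": dst_id, "ccName": dst_cc},
        {"label": "Relation", "id": rel_id, "name": rel_name},
    ]
    # Each link is (relation id, role, concept id), mirroring [:from] and [:to].
    edges = [(rel_id, "from", src_id), (rel_id, "to", dst_id)]
    return nodes, edges


# '250169' is the protein ID used on the slide; 'R1' is a made-up reaction ID.
nodes, edges = to_flat(("250169", "Protein"), "cs_by", ("R1", "Reaction"))
print(len(nodes), "nodes,", len(edges), "edges")
```

Every structured edge becomes one extra node and two links, which is why the flat Cypher pattern is roughly twice as long as the structured one.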
  • 28. FLAT MODEL PERFORMANCE: Simple, common queries
  • 29. FLAT MODEL PERFORMANCE: Typical ONDEX graph queries
  • 30. IMPACT ON CYPHER
      Rich schema-based queries
      From:
        MATCH (molecule:Molecule), (molType:Class)-[:is_a*]->(:Class{ name:'Protein' })
        WHERE molType.label IN LABELS(molecule)
      To:
        MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
          (cc) <- [:specializationOf*] - (:ConceptClass{name:'Protein'})
      Now it’s efficient enough (especially with length restrictions)
      However…
  • 31. IMPACT ON CYPHER
      Rich schema-based queries
        MATCH (mol:Concept) <- [:conceptClass] - (cc:ConceptClass),
          (cc) <- [:specializationOf*] - (:ConceptClass{name:'Protein'})
      Now it’s efficient enough (especially with length restrictions)
      However…
      From:
        MATCH (react:Reaction) - [:part_of] -> (pway:Path)
      To:
        MATCH (react:Concept {ccName: 'Reaction'}) <- [:from] - (partof:Relation {name:'part_of'}) - [:to] -> (pway:Concept {ccName:'Path'})
      What if we want variable-length part_of? Not currently possible in Cypher (nor in SPARQL), maybe in future (https://github.com/neo4j/neo4j/issues/88)
      => Having both models, redundantly, would probably be worth it
      => That makes it not so different from RDF
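Keeping both models redundantly means deriving direct edges from the reified Relation nodes, so that variable-length patterns such as `[:part_of*]` can run over the direct edges. A minimal Python sketch of that derivation is shown here; the input layout (a relation-name map plus `from`/`to` link triples) and the sample IDs are assumptions for illustration only.

```python
def shortcut_edges(relations, links):
    """Derive direct (source, name, target) edges from reified Relation nodes.

    relations: {relation id: relation name}
    links: (relation id, 'from' | 'to', concept id) triples
    """
    ends = {}
    for rel_id, role, concept_id in links:
        ends.setdefault(rel_id, {})[role] = concept_id
    # Only relations with both endpoints can be collapsed into a direct edge.
    return sorted(
        (e["from"], relations[rel_id], e["to"])
        for rel_id, e in ends.items()
        if "from" in e and "to" in e
    )


# Hypothetical flat-model records: A part_of B, B part_of C.
relations = {"r1": "part_of", "r2": "part_of"}
links = [("r1", "from", "A"), ("r1", "to", "B"), ("r2", "from", "B"), ("r2", "to", "C")]
print(shortcut_edges(relations, links))
```

The derived edges duplicate information already held in the flat model, so they would need to be regenerated whenever the Relation nodes change.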
  • 32. OTHER ISSUES
      • Data exchange format?
      • None, except Cypher
      • DML not so performant
      • In particular, no standard data exchange format
      • Could be combined with RDF
      • Is Neo4j Open Source?
      • Produced by a company, only the Community Edition is OSS
      • OpenCypher is available
      • Cypher backed by Gremlin/TinkerPop
      • Apache project, more reliable OSS-wide
      • Performance comparable with Neo4j (https://goo.gl/NK1tn2)
      • More choice of implementations
      • Alternative QL, but more complicated IMHO (Cypher supported)
      Image credits: https://goo.gl/ysBFF2
  • 33. CONCLUSIONS
      Data exchange format
        Neo4J/GraphDBs: -
        Virtuoso/Triple Stores: +
      Data model
        Neo4J/GraphDBs: + Relations with properties; - Metadata management
        Virtuoso/Triple Stores: - Relations cannot have properties (req. reification); + Metadata as first citizen
      Performance
        Neo4J/GraphDBs: +
        Virtuoso/Triple Stores: - (comparable)
      QL
        Neo4J/GraphDBs: + Easier (e.g., compact, omissions)?; - Expressivity for some patterns (unions, DML)
        Virtuoso/Triple Stores: - Harder? (URIs, namespaces, verbosity); + More expressive
      Standardisation, openness
        Neo4J/GraphDBs: -
        Virtuoso/Triple Stores: +
      Scalability, big data
        Neo4J/GraphDBs: - TinkerPop probably better
        Virtuoso/Triple Stores: LB/Cluster solutions; over TinkerPop (via SAIL implementation)
  • 34. CONCLUSIONS
  • 35. CONCLUSIONS
  • 36. CONCLUSIONS
  • 37. WHY?
      • Graph + APIs
      • Clearer architecture, open to more applications, not only kNetMiner
      • QL makes it easier to develop further components/analyses/applications
      • Standard data model and format
      • Don’t reinvent the wheel
      • Data sharing
      • Data and app integration
  • 38. CONCLUSIONS