SlideShare a Scribd company logo
1 of 29
Download to read offline
Anything-to-Graph
Knowledge Graph Conference
May 2021
Joshua Shinavier, PhD ( : joshsh)
Data @
Overview
○ Building graphs
○ Models
○ Mappings
○ Use cases
○ TinkerPop 4
Building graphs
There are graphs, and graphs
○ Domain-specific graphs are simplest
○ ETL a few data sources into a graph
○ Mappings can be written and maintained by hand
○ Off-the-shelf tools work well
○ Difficulty increases with
○ Complexity of source schemas
○ Diversity of ownership, quality, and governance in data sources
○ Lack of standardization on languages and vocabulary
○ Some challenges are organizational, others technical
○ Organizational: see Lessons From Reality (KGC 2019)
○ Technical: let’s take a closer look
The challenges of heterogeneity
○ Diverse data sources → need supporting metadata
○ E.g. see Uber’s Databook
○ Diverse data governance → need better data isolation
○ Typically, RDF triple stores do better than PG databases here
○ Diverse domain data models → need standardized schemas
○ E.g. see Uber’s Data Standardization
○ Diverse schema and data languages → need well-defined mappings
○ Enter the Dragon
Prerequisites
○ Data must conform to a schema
○ It is OK to have a mix of schemas, and schema languages
○ Unique identifiers must be clearly distinguished, and typed
○ What is this UUID field? Does it identify a User, a Document, etc.?
○ Some degree of standardization is needed
○ Are identifiers consistent across data sources?
○ Can this timestamp value be compared with that timestamp value? Etc.
○ Need well-defined mappings
○ From each data language into a graph format
○ From each schema language into a graph schema language
○ ...without losing too much information
○ ...and while maintaining consistency between schema and data
Models
Is there a “universal data model” for graphs?
○ Desirable characteristics
○ Centrality: ease of alignment with other graph and non-graph models
○ Flexibility: captures a wide variety of graph structures
○ Formality: serves as a tool for reasoning about data models
○ Practicality: serves as a basis for inference, query optimization, etc.
○ Intuitiveness: a good graph schema captures our mental model
○ Where can the right abstractions be found?
○ Logics? Set theory? Algebras? Category theory?
○ Does the data model already exist?
○ What about RDF-based languages (RDFS, OWL, SHACL, ShEx, etc.)?
○ Is that what GQL is going to be?
Graph features are very diverse
Spreadsheet by Gábor Szárnyas
Complexity is not the answer
○ No one language will ever satisfy all graph use cases
○ Implementing the model must be simple, or developers won’t
○ Who remembers CODASYL? Why aren’t we using it?
○ Go for minimalism + extensibility
○ Dimensions being explored for GQL
○ Mathematical foundations
○ Nominal vs. structural typing
○ Inheritance vs. composition
○ Graph element types vs. property types
○ Descriptive vs. prescriptive schemas
A little algebra goes a long way
○ Algebraic Property Graphs1
○ Pragmatic and opinionated graph data model
○ Schemas are vertex/edge/etc. labels bound to algebraic data types
○ Formally defined using category theory
○ Effectively a subset of SHACL
○ Ideally, GQL will also be a superset
[1] https://arxiv.org/abs/1909.04881
Mappings
Desirable properties
○ Composability
○ If we have f : A → B and g : B → C, we should also have f;g : A → C
○ Bidirectionality
○ If we have f : A → B, we should also have f-1
: B → A, with f;f-1
≅ f-1
;f ≅ id
○ Data : schema consistency
○ Mappings should be defined for data and schemas languages in parallel
○ If data d conforms to schema s, then we need f(d) to conform to f(s)
Topology of mappings
○ Arbitrary / ad-hoc topology
○ Pros: seems easiest; just use the pairwise mappings you have
○ Cons: more indirection, and mappings may not compose well
○ Star topology
○ Pros: mappings compose well, and all paths are short
○ Cons: need to define/identify a central data model (the “dragon”)
Transforming schemas and data
○ Schema-level mappings
○ Transformations on data types
○ Data-level mappings
○ Transformations on instances of data types
○ Schema and data-level mappings must be consistent
○ a :: A ⇒ F(a) :: F(A)
○ Use common intermediate representations
Dragon
○ Framework for data and schema migration
○ Developed for Uber’s Data Standardization
○ Dragon data model
○ Extension of Algebraic Property Graphs
○ Covers the majority of schemas at Uber
○ Sources and targets
○ RPC / interface languages: Protocol Buffers, Apache Thrift, Apache Avro
○ RDF-based languages: SHACL, OWL
○ “Schemaless” formats: YAML, JSON
○ Programming languages: Haskell, Java, Scala
○ Implementations in Haskell and Java
○ Dragon generates over half of its own source code (from YAML)
$ dragon transform -i Protobuf -o SHACL $SCHEMAS -d maps /tmp/shacl
______. ______/ .______
Dragon 0.3.4 ----___ O .. O ___----
---------------------vvv----vvvv----vvv---------------------
Loading configuration from dragon.yaml
Reading from Protobuf starting at maps in base directory /Users/joshsh/projects/uber/example/idl
Found 3 *.proto sources in /Users/joshsh/projects/uber/example/idl/maps
Visiting 3 files (total of 0 so far):
/Users/joshsh/projects/uber/example/idl/maps/coordinates.proto
/Users/joshsh/projects/uber/example/idl/maps/geometry.proto
/Users/joshsh/projects/uber/example/idl/maps/position.proto
Visiting 3 files (total of 3 so far):
/Users/joshsh/projects/uber/example/idl/physics/units.proto
/Users/joshsh/projects/uber/example/idl/time/duration.proto
/Users/joshsh/projects/uber/example/idl/time/epoch.proto
Instantiated a graph of 12 schemas
Note 1 alert:
info: unsigned integers not supported for RDF targets. Using signed integers
Writing 12 SHACL artifacts to /tmp/shacl
Use cases
JSON-to-Graph
○ Graph data formats are used nowhere in the enterprise
○ RDF serialization formats, GraphML, etc. are a hard sell
○ JSON and YAML are used everywhere
○ Give the JSON or YAML a schema, then treat it like a serialized graph
○ E.g. Databook1
snapshots to RDF
[1] https://eng.uber.com/metadata-insights-databook
gRPC-to-Graph
○ Don’t let developers write individual edges / triples to the graph
○ Schema constraints are harder to enforce, errors are harder to understand
○ Transform graph schemas into developer-facing schemas
○ Use familiar schema languages and protocols like Protobuf + gRPC
○ Transform payload messages into graph data
○ Schema and data transformations guarantee constraint satisfaction
Enriching domain schemas
○ Need a little more than “just Protobuf” / “just Thrift” to build a big graph
○ E.g. entity / identifier types need to be widely re-used
○ Start with a graph schema (e.g. in YAML)
○ Propagate logical data types into domain schema languages
○ At Uber: Protobuf, Thrift, Avro
○ Re-use the logical types in many domain schemas
○ Incentivize re-use, and use schema transformations for metrics
○ Now, graph-friendly types are everywhere
Toward TinkerPop 4
○ Alibaba Graph Database
○ Amazon Neptune
○ ArangoDB
○ Bitsy
○ Blazegraph
○ CosmosDB
○ ChronoGraph
○ DSEGraph
○ GRAKN.AI
○ Hadoop (Spark)
○ HGraphDB
○ Huawei Graph Engine Service
○ IBM Graph
○ JanusGraph
○ Neo4j
○ neo4j-gremlin-bolt
○ OrientDB
○ Apache S2Graph
○ Sqlg
○ Stardog
○ TinkerGraph
○ Titan
○ Titan + Tupl
○ Unipop
Dozens of graph systems
○ Clojure: ogre
○ Cypher: cypher-for-gremlin
○ Elixir: gremlex
○ Go: grammes, gremgo
○ Haskell: greskell, gremlin-haskell
○ Java: Ferma, gremlin-objects,
Peapod, spring-data-gremlin,
gremlin-driver
○ JavaScript: gremlin-javascript,
gremlin-orm,
gremlin-template-string
○ Kotlin: kotlin-gremlin-ogm
○ .NET: Gremlin.Net, Gremlinq
○ PHP: gremlin-php
○ Python: Goblin, gremlin-python,
gremlin-py, ipython-gremlin,
gremlinclient, gremlin-python, JUGRI,
gremlinrestclient, python-gremlin-rest
○ Ruby: gremlin_client
○ Rust: gremlin-rs
○ Scala: gremlin-scala,
reactive-gremlin,
scalajs-gremlin-client
○ SPARQL: sparql-gremlin
○ SQL: sql-gremlin
○ Typescript: ts-tinkerpop
Dozens of programming languages
Interoperability via mappings
○ We need parity across languages and systems
○ Make it easier to create new Gremlin language variants in a consistent way
○ “Escape from the JVM”
○ Code generation can help
○ Generate graph APIs consistently into many programming languages
○ Generate grammars for parsing/validation
A unifying schema language
○ A few vendor-specific schema languages exist
○ Weak and disconnected from each other
○ Also note TinkerPop’s Graph.Features
○ Need a real, vendor-neutral schema language
○ Facilitate composability of data and queries
○ Enable inference for type safety and optimizations
○ Propagate logical schemas into multiple graph back-ends
○ Build vendor-agnostic tools for data migration
Serialization formats for graphs
○ Currently serialization options are limited
○ GraphML (XML), GraphSON (JSON), GraphBinary, Gryo (Kryo)
○ Generic JSON and YAML are straightforward
○ What about common RPC formats?
○ Protobuf, Thrift, Avro, etc.
○ What about RDF serialization formats?
○ N-Triples, Turtle, JSON-LD, etc.
○ Others?
○ Parquet? CBOR? FlatBuffers? MessagePack? Etc.
Stay tuned
○ “How to Build a Dragon”
○ https://www.meetup.com/Category-Theory
○ Release Dragon!
○ http://bit.ly/release_dragon
○ Gremlin-users
○ https://groups.google.com/g/gremlin-users
@KGConference
linkedin.com/company/the-knowldge-graph-conference/
youtube.com/playlist?list=PLAiy7NYe9U2Gjg-600CTV1HGypiF95d_D
@joshsh
Proprietary © 2021 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any
form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval
systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to
whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under
applicable law. All recipients of this document are notified that the information contained herein includes proprietary and
confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any
of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with
authorized personnel of Uber.

More Related Content

What's hot

PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slidesDat Tran
 
Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mark Kromer
 
RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsJean-Paul Calbimonte
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityJoshua Shinavier
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframeJaemun Jung
 
SHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudSHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudRichard Cyganiak
 
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew RayData Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew RayDatabricks
 
Introduction to GraphQL: Mobile Week SF
Introduction to GraphQL: Mobile Week SFIntroduction to GraphQL: Mobile Week SF
Introduction to GraphQL: Mobile Week SFAmazon Web Services
 
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Seunghyun Lee
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performanceDataWorks Summit
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Flink Forward
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...Spark Summit
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsFlink Forward
 
Test Data Management 101—Featuring a Tour of CA Test Data Manager (Formerly G...
Test Data Management 101—Featuring a Tour of CA Test Data Manager (Formerly G...Test Data Management 101—Featuring a Tour of CA Test Data Manager (Formerly G...
Test Data Management 101—Featuring a Tour of CA Test Data Manager (Formerly G...CA Technologies
 

What's hot (20)

PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slides
 
Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22
 
RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementations
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
 
SHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data MudSHACL: Shaping the Big Ball of Data Mud
SHACL: Shaping the Big Ball of Data Mud
 
SPARQL Cheat Sheet
SPARQL Cheat SheetSPARQL Cheat Sheet
SPARQL Cheat Sheet
 
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew RayData Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
Data Wrangling with PySpark for Data Scientists Who Know Pandas with Andrew Ray
 
Introduction to GraphQL: Mobile Week SF
Introduction to GraphQL: Mobile Week SFIntroduction to GraphQL: Mobile Week SF
Introduction to GraphQL: Mobile Week SF
 
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
Pinot: Realtime OLAP for 530 Million Users - Sigmod 2018
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Catalyst optimizer
Catalyst optimizerCatalyst optimizer
Catalyst optimizer
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
From DataFrames to Tungsten: A Peek into Spark's Future-(Reynold Xin, Databri...
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 
Data partitioning
Data partitioningData partitioning
Data partitioning
 
Test Data Management 101—Featuring a Tour of CA Test Data Manager (Formerly G...
Test Data Management 101—Featuring a Tour of CA Test Data Manager (Formerly G...Test Data Management 101—Featuring a Tour of CA Test Data Manager (Formerly G...
Test Data Management 101—Featuring a Tour of CA Test Data Manager (Formerly G...
 

Similar to Build Graphs from Diverse Data Sources

HPEC 2021 sparse binary format
HPEC 2021 sparse binary formatHPEC 2021 sparse binary format
HPEC 2021 sparse binary formatErikWelch2
 
Find your way in Graph labyrinths
Find your way in Graph labyrinthsFind your way in Graph labyrinths
Find your way in Graph labyrinthsDaniel Camarda
 
Substrait Overview.pdf
Substrait Overview.pdfSubstrait Overview.pdf
Substrait Overview.pdfRinat Abdullin
 
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Aaron Saray
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RGraphRM
 
Ballerina Tutorial @ SummerSOC 2019
Ballerina Tutorial @ SummerSOC 2019Ballerina Tutorial @ SummerSOC 2019
Ballerina Tutorial @ SummerSOC 2019kshanth2101
 
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...Jean Ihm
 
Comparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and CypherComparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and CypheropenCypher
 
Overview of no sql
Overview of no sqlOverview of no sql
Overview of no sqlSean Murphy
 
Graph analysis over relational database
Graph analysis over relational databaseGraph analysis over relational database
Graph analysis over relational databaseGraphRM
 
Gremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise GraphGremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise GraphStephen Mallette
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Databricks
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Ontotext
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetupJoshua Bae
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2Dimitris Kontokostas
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriDemi Ben-Ari
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022ArangoDB Database
 

Similar to Build Graphs from Diverse Data Sources (20)

HPEC 2021 sparse binary format
HPEC 2021 sparse binary formatHPEC 2021 sparse binary format
HPEC 2021 sparse binary format
 
Data quality in Real Estate
Data quality in Real EstateData quality in Real Estate
Data quality in Real Estate
 
Find your way in Graph labyrinths
Find your way in Graph labyrinthsFind your way in Graph labyrinths
Find your way in Graph labyrinths
 
Substrait Overview.pdf
Substrait Overview.pdfSubstrait Overview.pdf
Substrait Overview.pdf
 
Brett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4jBrett Ragozzine - Graph Databases and Neo4j
Brett Ragozzine - Graph Databases and Neo4j
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con R
 
Ballerina Tutorial @ SummerSOC 2019
Ballerina Tutorial @ SummerSOC 2019Ballerina Tutorial @ SummerSOC 2019
Ballerina Tutorial @ SummerSOC 2019
 
Neo4j graph database
Neo4j graph databaseNeo4j graph database
Neo4j graph database
 
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p...
 
Comparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and CypherComparing PGQL, G-Core and Cypher
Comparing PGQL, G-Core and Cypher
 
Overview of no sql
Overview of no sqlOverview of no sql
Overview of no sql
 
Graph analysis over relational database
Graph analysis over relational databaseGraph analysis over relational database
Graph analysis over relational database
 
Gremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise GraphGremlin Queries with DataStax Enterprise Graph
Gremlin Queries with DataStax Enterprise Graph
 
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetup
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
 

More from Joshua Shinavier

In Search of the Universal Data Model (ISWC 2019 Minute Madness)
In Search of the Universal Data Model (ISWC 2019 Minute Madness)In Search of the Universal Data Model (ISWC 2019 Minute Madness)
In Search of the Universal Data Model (ISWC 2019 Minute Madness)Joshua Shinavier
 
In Search of the Universal Data Model (Connected Data London 2019)
In Search of the Universal Data Model (Connected Data London 2019)In Search of the Universal Data Model (Connected Data London 2019)
In Search of the Universal Data Model (Connected Data London 2019)Joshua Shinavier
 
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)Joshua Shinavier
 
TinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsTinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsJoshua Shinavier
 
semantic markup using schema.org
semantic markup using schema.orgsemantic markup using schema.org
semantic markup using schema.orgJoshua Shinavier
 
The Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsThe Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsJoshua Shinavier
 
Real-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsReal-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsJoshua Shinavier
 
Real-time #SemanticWeb in 140 chars
Real-time #SemanticWeb in 140 charsReal-time #SemanticWeb in 140 chars
Real-time #SemanticWeb in 140 charsJoshua Shinavier
 
The state of the art in Linked Data
The state of the art in Linked DataThe state of the art in Linked Data
The state of the art in Linked DataJoshua Shinavier
 

More from Joshua Shinavier (11)

In Search of the Universal Data Model (ISWC 2019 Minute Madness)
In Search of the Universal Data Model (ISWC 2019 Minute Madness)In Search of the Universal Data Model (ISWC 2019 Minute Madness)
In Search of the Universal Data Model (ISWC 2019 Minute Madness)
 
In Search of the Universal Data Model (Connected Data London 2019)
In Search of the Universal Data Model (Connected Data London 2019)In Search of the Universal Data Model (Connected Data London 2019)
In Search of the Universal Data Model (Connected Data London 2019)
 
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
Algebraic Property Graphs (GQL Community Update, oct. 9, 2019)
 
TinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsTinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBs
 
Semantics and Sensors
Semantics and SensorsSemantics and Sensors
Semantics and Sensors
 
semantic markup using schema.org
semantic markup using schema.orgsemantic markup using schema.org
semantic markup using schema.org
 
The Real-time Web in the Age of Agents
The Real-time Web in the Age of AgentsThe Real-time Web in the Age of Agents
The Real-time Web in the Age of Agents
 
Linked Process
Linked ProcessLinked Process
Linked Process
 
Real-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter AnnotationsReal-time Semantic Web with Twitter Annotations
Real-time Semantic Web with Twitter Annotations
 
Real-time #SemanticWeb in 140 chars
Real-time #SemanticWeb in 140 charsReal-time #SemanticWeb in 140 chars
Real-time #SemanticWeb in 140 chars
 
The state of the art in Linked Data
The state of the art in Linked DataThe state of the art in Linked Data
The state of the art in Linked Data
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Build Graphs from Diverse Data Sources

  • 1. Anything-to-Graph Knowledge Graph Conference May 2021 Joshua Shinavier, PhD ( : joshsh) Data @
  • 2. Overview ○ Building graphs ○ Models ○ Mappings ○ Use cases ○ TinkerPop 4
  • 4. There are graphs, and graphs ○ Domain-specific graphs are simplest ○ ETL a few data sources into a graph ○ Mappings can be written and maintained by hand ○ Off-the-shelf tools work well ○ Difficulty increases with ○ Complexity of source schemas ○ Diversity of ownership, quality, and governance in data sources ○ Lack of standardization on languages and vocabulary ○ Some challenges are organizational, others technical ○ Organizational: see Lessons From Reality (KGC 2019) ○ Technical: let’s take a closer look
  • 5. The challenges of heterogeneity ○ Diverse data sources → need supporting metadata ○ E.g. see Uber’s Databook ○ Diverse data governance → need better data isolation ○ Typically, RDF triple stores do better than PG databases here ○ Diverse domain data models → need standardized schemas ○ E.g. see Uber’s Data Standardization ○ Diverse schema and data languages → need well-defined mappings ○ Enter the Dragon
  • 6. Prerequisites ○ Data must conform to a schema ○ It is OK to have a mix of schemas, and schema languages ○ Unique identifiers must be clearly distinguished, and typed ○ What is this UUID field? Does it identify a User, a Document, etc.? ○ Some degree of standardization is needed ○ Are identifiers consistent across data sources? ○ Can this timestamp value be compared with that timestamp value? Etc. ○ Need well-defined mappings ○ From each data language into a graph format ○ From each schema language into a graph schema language ○ ...without losing too much information ○ ...and while maintaining consistency between schema and data
  • 8. Is there a “universal data model” for graphs? ○ Desirable characteristics ○ Centrality: ease of alignment with other graph and non-graph models ○ Flexibility: captures a wide variety of graph structures ○ Formality: serves as a tool for reasoning about data models ○ Practicality: serves as a basis for inference, query optimization, etc. ○ Intuitiveness: a good graph schema captures our mental model ○ Where can the right abstractions be found? ○ Logics? Set theory? Algebras? Category theory? ○ Does the data model already exist? ○ What about RDF-based languages (RDFS, OWL, SHACL, ShEx, etc.)? ○ Is that what GQL is going to be?
  • 9. Graph features are very diverse Spreadsheet by Gábor Szárnyas
  • 10. Complexity is not the answer ○ No one language will ever satisfy all graph use cases ○ Implementing the model must be simple, or developers won’t ○ Who remembers CODASYL? Why aren’t we using it? ○ Go for minimalism + extensibility ○ Dimensions being explored for GQL ○ Mathematical foundations ○ Nominal vs. structural typing ○ Inheritance vs. composition ○ Graph element types vs. property types ○ Descriptive vs. prescriptive schemas
  • 11. A little algebra goes a long way ○ Algebraic Property Graphs1 ○ Pragmatic and opinionated graph data model ○ Schemas are vertex/edge/etc. labels bound to algebraic data types ○ Formally defined using category theory ○ Effectively a subset of SHACL ○ Ideally, GQL will also be a superset [1] https://arxiv.org/abs/1909.04881
  • 13. Desirable properties ○ Composability ○ If we have f : A → B and g : B → C, we should also have f;g : A → C ○ Bidirectionality ○ If we have f : A → B, we should also have f-1 : B → A, with f;f-1 ≅ f-1 ;f ≅ id ○ Data : schema consistency ○ Mappings should be defined for data and schemas languages in parallel ○ If data d conforms to schema s, then we need f(d) to conform to f(s)
  • 14. Topology of mappings ○ Arbitrary / ad-hoc topology ○ Pros: seems easiest; just use the pairwise mappings you have ○ Cons: more indirection, and mappings may not compose well ○ Star topology ○ Pros: mappings compose well, and all paths are short ○ Cons: need to define/identify a central data model (the “dragon”)
  • 15. Transforming schemas and data ○ Schema-level mappings ○ Transformations on data types ○ Data-level mappings ○ Transformations on instances of data types ○ Schema and data-level mappings must be consistent ○ a :: A ⇒ F(a) :: F(A) ○ Use common intermediate representations
  • 16. Dragon ○ Framework for data and schema migration ○ Developed for Uber’s Data Standardization ○ Dragon data model ○ Extension of Algebraic Property Graphs ○ Covers the majority of schemas at Uber ○ Sources and targets ○ RPC / interface languages: Protocol Buffers, Apache Thrift, Apache Avro ○ RDF-based languages: SHACL, OWL ○ “Schemaless” formats: YAML, JSON ○ Programming languages: Haskell, Java, Scala ○ Implementations in Haskell and Java ○ Dragon generates over half of its own source code (from YAML)
  • 17. $ dragon transform -i Protobuf -o SHACL $SCHEMAS -d maps /tmp/shacl ______. ______/ .______ Dragon 0.3.4 ----___ O .. O ___---- ---------------------vvv----vvvv----vvv--------------------- Loading configuration from dragon.yaml Reading from Protobuf starting at maps in base directory /Users/joshsh/projects/uber/example/idl Found 3 *.proto sources in /Users/joshsh/projects/uber/example/idl/maps Visiting 3 files (total of 0 so far): /Users/joshsh/projects/uber/example/idl/maps/coordinates.proto /Users/joshsh/projects/uber/example/idl/maps/geometry.proto /Users/joshsh/projects/uber/example/idl/maps/position.proto Visiting 3 files (total of 3 so far): /Users/joshsh/projects/uber/example/idl/physics/units.proto /Users/joshsh/projects/uber/example/idl/time/duration.proto /Users/joshsh/projects/uber/example/idl/time/epoch.proto Instantiated a graph of 12 schemas Note 1 alert: info: unsigned integers not supported for RDF targets. Using signed integers Writing 12 SHACL artifacts to /tmp/shacl
  • 19. JSON-to-Graph ○ Graph data formats are used nowhere in the enterprise ○ RDF serialization formats, GraphML, etc. are a hard sell ○ JSON and YAML are used everywhere ○ Give the JSON or YAML a schema, then treat it like a serialized graph ○ E.g. Databook1 snapshots to RDF [1] https://eng.uber.com/metadata-insights-databook
  • 20. gRPC-to-Graph ○ Don’t let developers write individual edges / triples to the graph ○ Schema constraints are harder to enforce, errors are harder to understand ○ Transform graph schemas into developer-facing schemas ○ Use familiar schema languages and protocols like Protobuf + gRPC ○ Transform payload messages into graph data ○ Schema and data transformations guarantee constraint satisfaction
  • 21. Enriching domain schemas ○ Need a little more than “just Protobuf” / “just Thrift” to build a big graph ○ E.g. entity / identifier types need to be widely re-used ○ Start with a graph schema (e.g. in YAML) ○ Propagate logical data types into domain schema languages ○ At Uber: Protobuf, Thrift, Avro ○ Re-use the logical types in many domain schemas ○ Incentivize re-use, and use schema transformations for metrics ○ Now, graph-friendly types are everywhere
  • 23. ○ Alibaba Graph Database ○ Amazon Neptune ○ ArangoDB ○ Bitsy ○ Blazegraph ○ CosmosDB ○ ChronoGraph ○ DSEGraph ○ GRAKN.AI ○ Hadoop (Spark) ○ HGraphDB ○ Huawei Graph Engine Service ○ IBM Graph ○ JanusGraph ○ Neo4j ○ neo4j-gremlin-bolt ○ OrientDB ○ Apache S2Graph ○ Sqlg ○ Stardog ○ TinkerGraph ○ Titan ○ Titan + Tupl ○ Unipop Dozens of graph systems
  • 24. ○ Clojure: ogre ○ Cypher: cypher-for-gremlin ○ Elixir: gremlex ○ Go: grammes, gremgo ○ Haskell: greskell, gremlin-haskell ○ Java: Ferma, gremlin-objects, Peapod, spring-data-gremlin, gremlin-driver ○ JavaScript: gremlin-javascript, gremlin-orm, gremlin-template-string ○ Kotlin: kotlin-gremlin-ogm ○ .NET: Gremlin.Net, Gremlinq ○ PHP: gremlin-php ○ Python: Goblin, gremlin-python, gremlin-py, ipython-gremlin, gremlinclient, gremlin-python, JUGRI, gremlinrestclient, python-gremlin-rest ○ Ruby: gremlin_client ○ Rust: gremlin-rs ○ Scala: gremlin-scala, reactive-gremlin, scalajs-gremlin-client ○ SPARQL: sparql-gremlin ○ SQL: sql-gremlin ○ Typescript: ts-tinkerpop Dozens of programming languages
  • 25. Interoperability via mappings ○ We need parity across languages and systems ○ Make it easier to create new Gremlin language variants in a consistent way ○ “Escape from the JVM” ○ Code generation can help ○ Generate graph APIs consistently into many programming languages ○ Generate grammars for parsing/validation
  • 26. A unifying schema language ○ A few vendor-specific schema languages exist ○ Weak and disconnected from each other ○ Also note TinkerPop’s Graph.Features ○ Need a real, vendor-neutral schema language ○ Facilitate composability of data and queries ○ Enable inference for type safety and optimizations ○ Propagate logical schemas into multiple graph back-ends ○ Build vendor-agnostic tools for data migration
  • 27. Serialization formats for graphs ○ Currently serialization options are limited ○ GraphML (XML), GraphSON (JSON), GraphBinary, Gryo (Kryo) ○ Generic JSON and YAML are straightforward ○ What about common RPC formats? ○ Protobuf, Thrift, Avro, etc. ○ What about RDF serialization formats? ○ N-Triples, Turtle, JSON-LD, etc. ○ Others? ○ Parquet? CBOR? FlatBuffers? MessagePack? Etc.
  • 28. Stay tuned ○ “How to Build a Dragon” ○ https://www.meetup.com/Category-Theory ○ Release Dragon! ○ http://bit.ly/release_dragon ○ Gremlin-users ○ https://groups.google.com/g/gremlin-users @KGConference linkedin.com/company/the-knowldge-graph-conference/ youtube.com/playlist?list=PLAiy7NYe9U2Gjg-600CTV1HGypiF95d_D @joshsh
  • 29. Proprietary © 2021 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber.