Graph databases address one of the great macroscopic business trends of today: leveraging complex and dynamic relationships in highly connected data to generate insight and competitive advantage. Whether we want to understand relationships between customers, elements in a telephone or data center network, entertainment producers and consumers, or genes and proteins, the ability to understand and analyze vast graphs of highly connected data will be key in determining which companies outperform their competitors over the coming decade. In this session, I will cover the following graph database concepts, mainly with respect to Neo4j:
High level view of Graph Space
Power of Graph Databases
Data Modeling with Graphs
Cypher : Graph Query language
Building a Graph Database Application
Graphs in Real World / Common Use cases
Predictive Analysis with Graph Theory
3. Agenda
High level view of Graph Space
Comparison with RDBMS and other NoSQL stores
Data Modeling
Cypher : Graph Query Language
Graph Database Internals
Graphs In Real World
Xebia India
6. What is a Graph?
A collection of vertices and edges.
A set of nodes and the relationships that connect them.
A graph represents
Entities as NODES
The ways those entities relate to the world as RELATIONSHIPS
Allows us to model all kinds of scenarios
Road systems
Medical history
Supply chain management
Data Center
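As a minimal sketch of the nodes-and-relationships idea, the road-system scenario could be modeled in plain Python as below. The class names, the ROAD_TO relationship type, and the junction names are all hypothetical and chosen only for illustration; this is not how Neo4j stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, with a label and free-form properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes."""
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)


def describe(rel):
    """Render a relationship in an arrow notation for readability."""
    return f"({rel.start.properties['name']})-[:{rel.type}]->({rel.end.properties['name']})"


# Hypothetical road system: two junctions joined by one road.
junction_a = Node("Junction", {"name": "A"})
junction_b = Node("Junction", {"name": "B"})
road = Relationship(junction_a, junction_b, "ROAD_TO", {"lanes": 2})

print(describe(road))
```

Both nodes and relationships carry properties here, which is what lets the same structure describe a road system, a supply chain, or a data center.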
9. High Level view of Graph Space
Graph Databases - Technologies used primarily for transactional online graph persistence (OLTP).
Graph Compute Engines - Technologies used primarily for offline graph analytics (OLAP).
10. Graph Databases
Online database management system with Create, Read, Update, Delete
methods that expose a graph data model.
Built for use with transactional (OLTP) systems.
Used for richly connected data.
Querying is performed through traversals.
Can perform millions of traversal steps per second.
A traversal step resembles a join in an RDBMS.
11. Graph Database Properties
The Underlying Storage : Native / Non-Native
The Processing Engine : Native / Non-Native
12. Graph DB – The Underlying Storage
Native Graph Storage – Optimized and designed for storing and managing graphs.
Non-Native Graph Storage – Serializes the graph data into a relational database, an object-oriented database, or some other general-purpose data store.
14. Graph DB – The Processing Engine
Index-free adjacency – Connected nodes physically point to each other in the database.
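The idea of index-free adjacency can be sketched in Python by contrasting nodes that hold direct references to their neighbours with a separate edge table that has to be searched. This is only an in-memory analogy under assumed names (AdjacentNode, edge_table, and both helper functions are hypothetical), not Neo4j's actual storage layout.

```python
class AdjacentNode:
    """A node that keeps direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []

    def connect(self, other):
        # Store the reference on both sides so either end can be followed.
        self.neighbours.append(other)
        other.neighbours.append(self)


def neighbours_via_pointers(node):
    """Follow the references held by the node itself; no global lookup."""
    return sorted(n.name for n in node.neighbours)


def neighbours_via_lookup(edge_table, name):
    """Non-native style: search a global edge table for matching rows."""
    found = []
    for left, right in edge_table:
        if left == name:
            found.append(right)
        elif right == name:
            found.append(left)
    return sorted(found)


alice, bob, carol = AdjacentNode("alice"), AdjacentNode("bob"), AdjacentNode("carol")
alice.connect(bob)
bob.connect(carol)

edge_table = [("alice", "bob"), ("bob", "carol")]

print(neighbours_via_pointers(bob))
print(neighbours_via_lookup(edge_table, "bob"))
```

Both functions return the same neighbours. The difference is the work done: the first touches only the node's own references, while the second walks the whole table (or needs an index over it).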
18. Power of Graph Databases
Performance
Flexibility
Agility
20. Relational Databases Lack Relationships
Initially designed to codify paper forms and tabular structures.
Deal poorly with relationships.
The rise in connectedness translates into
increased joins.
Lower performance.
Difficult to cater for changing business needs.
23. NoSQL Databases Also Lack Relationships
NoSQL databases (e.g., key-value, document, or column-oriented) store sets of disconnected values/documents/columns.
This makes it difficult to use them for connected data and graphs.
One solution is to embed an aggregate's identifier inside a field belonging to another aggregate.
This effectively introduces foreign keys.
It requires joining aggregates at the application level.
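A minimal sketch of such an application-level join, assuming two hypothetical aggregate collections (users and orders) held as Python dictionaries in place of a real document store:

```python
# Hypothetical aggregates, keyed by identifier. Each user embeds the
# identifiers of its orders, which act like foreign keys.
users = {
    "u1": {"name": "Alice", "order_ids": ["o1", "o2"]},
    "u2": {"name": "Bob", "order_ids": ["o3"]},
}
orders = {
    "o1": {"product": "keyboard"},
    "o2": {"product": "monitor"},
    "o3": {"product": "mouse"},
}


def products_for_user(user_id):
    """Join user -> orders in application code, one lookup per embedded id."""
    user = users[user_id]
    return [orders[order_id]["product"] for order_id in user["order_ids"]]


alice_products = products_for_user("u1")
print(alice_products)
```

The store itself knows nothing about the link between a user and an order. The application has to fetch the user, read the embedded identifiers, and issue one more lookup per identifier.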
24. NoSQL DB
Relationships between aggregates aren't first-class citizens in the data model.
Foreign aggregate "links" are not reflexive.
Following a link in the reverse direction needs some external compute infrastructure, e.g., Hadoop.
Do not maintain consistency of connected data.
Do not support index-free adjacency.
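To illustrate why non-reflexive links hurt, here is a small sketch under assumed data (the users collection and its order_ids field are hypothetical). A user points at its orders, but nothing points back, so the reverse question forces a scan of every user.

```python
# Hypothetical aggregates: links run from user to order only.
users = {
    "u1": {"name": "Alice", "order_ids": ["o1", "o2"]},
    "u2": {"name": "Bob", "order_ids": ["o3"]},
    "u3": {"name": "Carol", "order_ids": []},
}


def users_who_placed(order_id):
    """Reverse lookup by brute force: inspect every user aggregate."""
    scanned = 0
    matches = []
    for user_id, user in users.items():
        scanned += 1
        if order_id in user["order_ids"]:
            matches.append(user_id)
    return matches, scanned


matches, scanned = users_who_placed("o3")
print(matches, scanned)
```

The number of aggregates scanned grows with the size of the collection, not with the size of the answer, which is why large stores push this kind of question to a batch system.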
27. Graph DB
Find friends-of-friends in a social network, to a
maximum depth of 5.
Total records : 1,000,000
Each with approximately 50 friends
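The friends-of-friends query can be sketched as a breadth-first traversal with a depth limit. This is an illustrative in-memory version on a tiny hypothetical network, not the benchmark setup from the slide; the function name and the sample data are assumptions.

```python
from collections import deque


def friends_within(friends, start, max_depth):
    """Return {person: depth} for everyone reachable within max_depth hops."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if depth[person] == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(person, ()):
            if friend not in depth:
                depth[friend] = depth[person] + 1
                queue.append(friend)
    del depth[start]  # the starting person is not their own friend
    return depth


# Hypothetical chain of friendships: a - b - c - d - e - f - g
friends = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e", "g"],
    "g": ["f"],
}

reach = friends_within(friends, "a", 5)
print(reach)
```

With a maximum depth of 5, person g (six hops away) is left out of the result.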
30. Data Modeling
“Whiteboard” friendly
The typical whiteboard view of a problem is a GRAPH.
What we sketch in our creative and analytical modes maps closely to the data model inside the database.
32. Cypher : Graph Query Language
Pattern-Matching Query Language
Humane language
Expressive
Declarative : Say what you want, not how
Borrows from well-known query languages
Aggregation, Ordering, Limit
Update the Graph
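As a rough illustration of pattern matching combined with aggregation, ordering and limit, the snippet below holds one Cypher query as a Python string. The Person label, the FRIEND relationship type, and the name parameter are assumptions for the example, and the query is written in the syntax of later Neo4j releases (node labels, $ parameters), which may differ from the release this deck was written for. Nothing here connects to a database.

```python
# Cypher text only; the data model (Person, FRIEND) is hypothetical.
FRIENDS_OF_FRIENDS = """
MATCH (me:Person {name: $name})-[:FRIEND]->(friend)-[:FRIEND]->(fof)
WHERE fof <> me
RETURN fof.name AS name, count(friend) AS mutual_friends
ORDER BY mutual_friends DESC
LIMIT 5
"""

# Parameters would be passed alongside the query by whichever driver is used.
params = {"name": "Alice"}

print(FRIENDS_OF_FRIENDS.strip())
```

The MATCH line draws the shape being searched for, and the remaining clauses state what to return, without describing how the traversal is carried out.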
35. Other Cypher Clauses
WHERE : Provides criteria for filtering pattern matching results.
CREATE and CREATE UNIQUE : Create nodes and relationships.
DELETE : Removes nodes, relationships and properties.
SET : Sets property values.
36. Other Cypher Clauses
FOREACH : Performs an updating action for each element in a list.
UNION : Merges results from two or more queries.
WITH : Chains subsequent query parts and forwards results from one to the next. Similar to piping commands in UNIX.
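The clauses listed on these slides could look roughly like the following one-line examples, kept here as Python strings. The Person and Company labels, the FRIEND relationship type, and the property names are hypothetical, and the syntax follows later Neo4j releases rather than any specific version named in this deck. Nothing is sent to a database.

```python
# One illustrative Cypher statement per clause; all names are hypothetical.
CLAUSE_EXAMPLES = {
    "CREATE": "CREATE (a:Person {name: 'Alice'})-[:FRIEND]->(b:Person {name: 'Bob'})",
    "WHERE": "MATCH (p:Person) WHERE p.name = 'Alice' RETURN p",
    "SET": "MATCH (p:Person {name: 'Alice'}) SET p.city = 'Pune' RETURN p",
    "DELETE": (
        "MATCH (:Person {name: 'Alice'})-[r:FRIEND]->(:Person {name: 'Bob'}) "
        "DELETE r"
    ),
    "FOREACH": (
        "MATCH path = (a:Person {name: 'Alice'})-[:FRIEND*1..2]->(b) "
        "FOREACH (n IN nodes(path) | SET n.seen = true)"
    ),
    "WITH": (
        "MATCH (p:Person)-[:FRIEND]->(f) "
        "WITH p, count(f) AS friend_count "
        "WHERE friend_count > 1 "
        "RETURN p.name AS name, friend_count"
    ),
    "UNION": (
        "MATCH (p:Person) RETURN p.name AS name "
        "UNION "
        "MATCH (c:Company) RETURN c.name AS name"
    ),
}

for clause, query in CLAUSE_EXAMPLES.items():
    print(f"{clause}: {query}")
```

In the WITH example, the count computed in the first part is handed to the second part, where it can be filtered like any other value. That hand-off is the piping behaviour described above.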
46. Capacity
The 1.9 release of Neo4j can support single graphs with tens of billions of nodes, relationships and properties.
The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph.
47. Latency
RDBMS – more data in tables/indexes results in longer join operations.
Graph DB doesn't suffer the same latency problem.
An index is used to find the starting node.
Traversal uses a combination of pointer chasing and pattern matching to search the data.
Performance does not depend on the total size of the dataset.
It depends only on the data being queried.
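The claim that traversal cost depends on the data being queried can be illustrated with a small Python experiment. The same four-node neighbourhood is embedded in a small graph and in a much larger one, and the traversal counts how many nodes it touches. The graph layout and function names are assumptions; the dictionary stands in for both the starting-node index and the pointers between nodes.

```python
def build_graph(extra_nodes):
    """A fixed neighbourhood around 'start' plus unrelated filler nodes."""
    graph = {"start": ["a", "b"], "a": ["c"], "b": [], "c": []}
    for i in range(extra_nodes):
        # Filler nodes form their own ring and never link to 'start'.
        graph[f"other{i}"] = [f"other{(i + 1) % extra_nodes}"]
    return graph


def nodes_touched(graph, start):
    """Traverse everything reachable from start and count visited nodes."""
    seen = {start}
    stack = [start]
    touched = 0
    while stack:
        node = stack.pop()
        touched += 1
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return touched


small = nodes_touched(build_graph(10), "start")
large = nodes_touched(build_graph(100_000), "start")
print(small, large)
```

The traversal visits the same number of nodes in both graphs, because the filler nodes are never reachable from the starting node.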
50. Common Use Cases
Social
Recommendations
Geo
Logistics Networks : for package routing, finding shortest path
Financial Transaction Graphs : for fraud detection
Master Data Management
Bioinformatics : Era7, to relate a complex web of information that includes genes, proteins and enzymes
Authorization and Access Control : Adobe Creative Cloud, Telenor
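For the logistics use case, shortest-path routing can be sketched with Dijkstra's algorithm over a weighted graph. The depot, hub, and customer names and the edge costs below are invented for illustration, and the function is a plain Python sketch rather than any graph database's built-in path finder.

```python
import heapq


def shortest_path(routes, source, target):
    """Dijkstra's algorithm: return (total_cost, path) or None if unreachable."""
    queue = [(0, source, [source])]
    settled = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue  # already reached this node at least as cheaply
        settled[node] = cost
        for neighbour, weight in routes.get(node, {}).items():
            heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical delivery network with per-leg costs.
routes = {
    "depot": {"hub1": 4, "hub2": 1},
    "hub2": {"hub1": 2, "customer": 7},
    "hub1": {"customer": 1},
}

result = shortest_path(routes, "depot", "customer")
print(result)
```

The cheapest route here goes through both hubs at a total cost of 4, beating the direct depot to hub1 leg, which would total 5.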