Key-value stores have been widely adopted as a way to store metadata, but also as a low-level construct on top of which more advanced storage solutions, from file systems to object storage APIs and more, can be built. Unfortunately, most key-value store constructs suffer the same limitations when it comes to scalability, performance, and resilience. Infinit's key-value store takes a different approach, relying on a decentralized architecture rather than a manager/worker model while offering strong consistency.
3. Key-value stores have grown increasingly popular as a fundamental layer for storing and sharing content in a distributed system.
Such a construct can be (and has been extensively) used to manage:
• Metadata
• Logs
• etc.
The most well-known key-value stores today are etcd, ZooKeeper and
Consul.
Introduction
4. Why the hell yet another key-value store?
1. Analysis
5. Depending on the use case, the requirements on the underlying key-value store vary on several levels:
• Scalability
• Resilience
• Performance
• Security
• Consistency
We believe that the community needs a key-value store for all the other applications.
Problem
6. The main problem comes from the distribution mechanism, which is based on a manager/worker model.
Model
[Diagram: two manager nodes coordinating three worker nodes]
7. This model, even though well-suited to many use cases, suffers from its design on many levels:
• Scalability: limited by the scalability of the manager nodes
• Resilience: an overflow could lead to cascading effects
• Performance: limited capacity to handle workers’ and clients’ requests
• Security: managers are ideal targets
• Consistency: inability to handle many parallel update requests
Limitations
8. So what makes it different (apart from not having a name, yet!)?
2. Introducing Infinit’s key-value store
9. What makes it different from etcd, Consul, ZooKeeper and other key-value stores is the use of a decentralized model (i.e. peer-to-peer) where every node is equipotent.
Presentation
[Diagram: equipotent nodes connected peer-to-peer]
10. Such a decentralized architecture is naturally suited to scaling, since nodes can join and leave without the need to keep track of them through a central directory.
Instead, the directory is collectively managed by the cluster through algorithms known as an overlay network (routing requests to the right nodes) and a distributed hash table, or DHT (redundancy, self-healing, etc.).
BONUS: Infinit’s key-value store can be deployed over a single-node cluster, something that is not possible with manager/worker-based systems.
Scalability
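To make the directory-free routing above concrete, here is a toy consistent-hashing ring in Python. It is only a sketch of the general DHT idea — the node names, hash function and ring structure are illustrative assumptions, not Infinit’s actual overlay network.

```python
# Toy consistent-hashing ring: each node owns a range of the hash space,
# and a block address is routed to the first node at or after its hash.
# Nodes join without registering with any central directory.
# Illustrative sketch only -- not Infinit's actual overlay algorithm.
import hashlib
from bisect import bisect_left

def h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class Ring:
    def __init__(self):
        self.points = []  # sorted (hash, node) pairs

    def join(self, node: str):
        # A node joins by simply taking its place on the ring
        self.points.append((h(node), node))
        self.points.sort()

    def locate(self, address: str) -> str:
        # Route to the first node clockwise from the address's hash
        i = bisect_left(self.points, (h(address),)) % len(self.points)
        return self.points[i][1]

ring = Ring()
for n in ("node-a", "node-b", "node-c"):
    ring.join(n)
owner = ring.locate("block-42")
# A new node joining only remaps the keys falling in its new range
ring.join("node-d")
```

Because routing is a pure function of the sorted ring, every node can answer “who owns this address?” locally, which is what removes the central directory.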
11. Systems based on the manager/worker model must be dimensioned so that the managers can support the worker nodes and handle clients’ requests.
On the contrary, a decentralized architecture does away with bottlenecks, single points of failure and the associated slow performance, since requests are not concentrated on a small subset of critical manager nodes.
Better still, the more nodes in the system, the faster requests are processed, because the load is naturally distributed across all the nodes.
Resilience & Performance
12. Unlike in manager/worker-based systems, a decentralized architecture has no authoritative or otherwise privileged nodes.
As such, an attacker has no choice but to either take control of a large portion of the nodes composing the cluster or find a breach in the network protocols in order to attack the system.
Security
13. Consistency is all about reaching consensus within the set of servers that host the replicas of a piece of data.
Distributed systems based on a manager/worker model rely on the managers to maintain consistency whenever an update is requested. Due to the concentration of such requests on the managers, the number of parallel requests that can be processed is limited.
Consistency 1/4
[Diagram: DISTRIBUTED (manager/worker) — a manager/leader node coordinating workers]
14. Infinit’s key-value store instead relies on block-based quorums, meaning there are as many quorums, hence potential parallel consensus runs, as there are blocks (a.k.a. values) in the system.
This approach means that parallel requests are handled by disjoint quorums, leading to better performance, security and fault tolerance.
Consistency 2/4
[Diagram: DECENTRALIZED (peer-to-peer) — a per-block quorum of nodes]
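The per-block quorum idea can be sketched by deriving each block’s replica set directly from its address. The snippet below uses rendezvous hashing as an illustrative stand-in for Infinit’s actual placement scheme; node names and the replication factor are assumptions.

```python
# Sketch: derive a per-block quorum from the block's address.
# Each block's replicas (here 3 of 10 nodes) are chosen by hashing the
# (address, node) pair, so updates to different blocks usually land on
# disjoint quorums whose consensus rounds can proceed in parallel.
# Illustrative only -- not Infinit's actual placement algorithm.
import hashlib

NODES = [f"node-{i}" for i in range(10)]
REPLICATION_FACTOR = 3

def quorum(address: bytes):
    # Rendezvous hashing: rank every node by a hash of (address, node)
    # and keep the top REPLICATION_FACTOR -- deterministic per address.
    ranked = sorted(NODES,
                    key=lambda n: hashlib.sha256(address + n.encode()).digest())
    return ranked[:REPLICATION_FACTOR]

q1 = quorum(b"block-1")
q2 = quorum(b"block-2")
```

Any node can recompute a block’s quorum locally, so no coordinator is needed to know who must agree on an update.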
15. Consistency 3/4
Block-based quorums also mean that the complexity of the consensus algorithm is a function of the redundancy factor, not of the number of nodes in the cluster (unlike in manager/worker systems).
In other words, in a manager/worker model, a cluster of 1 million worker nodes would require, say, 100 managers. Given the quadratic complexity of consensus algorithms, a consensus among that many nodes would take several seconds to be reached.
In a decentralized architecture, the complexity remains the same no matter the number of nodes in the network.
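The quadratic-cost argument above can be put in rough numbers. Assuming, for illustration, that a broadcast-style consensus round among n participants costs on the order of n × (n − 1) messages:

```python
# Back-of-envelope illustration of the quadratic-cost argument.
# The n * (n - 1) figure is an assumed all-to-all broadcast cost,
# used only to compare orders of magnitude.
def messages_per_round(n: int) -> int:
    return n * (n - 1)

managers = messages_per_round(100)  # manager/worker: 100 managers agree
per_block = messages_per_round(5)   # decentralized: a 5-node block quorum
ratio = managers // per_block       # ~495x fewer messages per round
```

Under these illustrative figures, a 5-node per-block quorum exchanges roughly 500 times fewer messages per consensus round than a 100-manager ensemble — and the quorum cost stays constant as the cluster grows.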
16. Most distributed systems nowadays rely on Raft for consensus. However, because Raft generates a lot of noise and because it is impractical in systems that can have millions of quorums, we have decided to use Paxos.
Also, because the key-value store is a fundamental layer, Infinit’s is strongly
consistent to allow for more demanding applications.
NOTE: the consensus algorithm can be swapped for another one with different consistency guarantees.
Consistency 4/4
17. In summary, Infinit key-value store’s decentralized architecture brings a number of advantages over manager/worker-based distributed systems. This model offers better performance, security and resilience by removing the critical manager nodes.
Also, coupled with block-based quorums, such a model allows for extremely scalable applications.
Conclusion
19. Infinit’s key-value store differs from other key-value stores in two major ways:
• Key: one cannot choose the key associated with the values one stores; Infinit’s key-value store generates an address so as to optimize data placement for load balancing, fault tolerance and more.
• Value: in Infinit’s, there are different types of values (known as blocks), each with its own tradeoffs.
Overview
20. In order to properly use Infinit’s key-value store, one needs to understand the various block types well. In their purest form, there are two types of blocks, on top of which many others can be created:
• Mutable Blocks: costly, subject to conflicts, need consensus when updated, require cache invalidation to refresh the value, etc.
• Immutable Blocks (content hashing): cannot conflict, can be cached forever, can be fetched from any source (integrity is easy to validate), etc.
Blocks
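The immutable-block properties listed above all follow from content hashing: the address is a digest of the payload, so any reader can check integrity locally, whatever the source. A minimal sketch — the hash choice and address format here are assumptions, not Infinit’s actual scheme:

```python
# Sketch of why immutable blocks can be cached forever and fetched from
# any source: the address *is* a hash of the content, so a reader can
# validate integrity locally. Illustrative address format only.
import hashlib

def make_immutable_block(data: bytes):
    address = hashlib.sha256(data).hexdigest()
    return address, data

def verify(address: str, data: bytes) -> bool:
    # Re-hash the payload and compare it to the address
    return hashlib.sha256(data).hexdigest() == address

addr, block = make_immutable_block(b"hello")
ok = verify(addr, block)            # genuine payload passes
tampered = verify(addr, b"hell0")   # altered payload fails
```

A mutable block cannot offer this property: since its content changes under the same address, readers must instead rely on consensus and cache invalidation.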
21. Infinit key-value store’s API is composed of two types of calls:
• Block Generation:
• MakeImmutableBlock() -> (Address, Block)
• MakeMutableBlock() -> (Address, Block)
• Key-Value Store Manipulation:
• Insert(Block) -> Boolean
• Update(Block) -> Boolean
• Remove(Address) -> Boolean
• Fetch(Address) -> Block
API
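To make the call semantics concrete, here is a minimal in-memory mock of the six calls above. The names and behavior are hypothetical, and the mock ignores replication, consensus and encryption entirely.

```python
# Hypothetical in-memory mock of the API surface: two generation calls
# and four manipulation calls. Not Infinit's implementation -- just a
# sketch of the contract each call is expected to honor.
import hashlib
import os

class MockKV:
    def __init__(self):
        self.store = {}
        self.mutable = set()

    # --- Block generation ---
    def make_immutable_block(self, data: bytes):
        # Address derived from content, mirroring MakeImmutableBlock()
        return hashlib.sha256(data).digest(), data

    def make_mutable_block(self):
        # Store-chosen address, mirroring MakeMutableBlock()
        address = os.urandom(32)
        self.mutable.add(address)
        return address, b""

    # --- Key-value store manipulation ---
    def insert(self, address, block) -> bool:
        if address in self.store:
            return False
        self.store[address] = block
        return True

    def update(self, address, block) -> bool:
        # Only mutable blocks may be rewritten in place
        if address not in self.store or address not in self.mutable:
            return False
        self.store[address] = block
        return True

    def remove(self, address) -> bool:
        return self.store.pop(address, None) is not None

    def fetch(self, address):
        return self.store.get(address)
```

Note how the mock encodes the key design choice: Update is rejected for immutable blocks, since their address is bound to their content.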
23. import pickle
    from binascii import unhexlify
    # Request messages (MakeMutableBlockRequest, InsertRequest, FetchRequest)
    # come from the generated protobuf module

    def connect(endpoint):
        import grpc
        import doughnut_pb2_grpc
        channel = grpc.insecure_channel(endpoint)
        return doughnut_pb2_grpc.DoughnutStub(channel)

    def init(kv):
        # Create a mutable block representing the index
        index = kv.MakeMutableBlock(MakeMutableBlockRequest())
        # Set its payload to an empty list
        index.data_plain = pickle.dumps([])
        # Insert the block
        kv.Insert(InsertRequest(block = index))
        # Return its address
        return index.address.hex()

    def index(kv, addr):
        return kv.Fetch(FetchRequest(address = unhexlify(addr),
                                     decrypt_data = True)).block
Example
[Diagram: a mutable “index” block referencing an immutable “image” block]
24. def add(kv, addr, content):
        idx = index(kv, addr)
        # Create the content block (immutable, content-addressed)
        content_block = kv.MakeImmutableBlock(
            MakeImmutableBlockRequest(data = pickle.dumps(content)))
        # Append its address to the index
        l = pickle.loads(idx.data_plain)
        l.append(content_block.address)
        idx.data_plain = pickle.dumps(l)
        # Update the index
        update = kv.Update(UpdateRequest(block = idx))
        # Push the content block
        kv.Insert(InsertRequest(block = content_block))
Example