Key-value stores have been widely adopted as a way to store metadata, but also as a low-level construct on top of which more advanced storage solutions, from file systems to object storage APIs and more, can be built. Unfortunately, most key-value store constructs suffer the same limitations when it comes to scalability, performance, and resilience. Infinit's key-value store takes a different approach, relying on a decentralized architecture rather than a manager/worker model while offering strong consistency.
3. Key-value stores have grown increasingly popular as a fundamental layer for storing and sharing content in a distributed system.
Such a construct can be (and has been extensively) used to manage:
• Metadata
• Logs
• etc.
The most well-known key-value stores today are etcd, ZooKeeper and
Consul.
Introduction
4. Why the hell yet another key-value store?
1. Analysis
5. Depending on the use case, the requirements on the underlying key-value store vary on several levels:
• Scalability
• Resilience
• Performance
• Security
• Consistency
We believe that the community needs a key-value store for all the other applications.
Problem
6. The main problem comes from the distribution mechanism, which is based on a manager/worker model.
Model
[Diagram: two manager nodes coordinating three worker nodes]
7. This model, even though well-suited to many use cases, suffers from its design on many levels:
• Scalability: limited by the scalability of the manager nodes
• Resilience: an overflow could lead to cascading effects
• Performance: limited capacity to handle workers’ and clients’ requests
• Security: managers are ideal targets
• Consistency: inability to handle many parallel update requests
Limitations
8. So what makes it different (apart from not having a name, yet!)?
2. Introducing Infinit’s key-value store
9. What makes it different from etcd, Consul, ZooKeeper and other key-value stores is the use of a decentralized model (i.e. peer-to-peer) where every node is equipotent.
Presentation
[Diagram: equipotent nodes connected peer-to-peer]
10. Such a decentralized architecture is naturally suited to scaling, since nodes can join and leave without the need to keep track of them through a central directory.
Instead, the directory is collectively managed by the cluster through algorithms known as an overlay network (routing requests to the right nodes) and a distributed hash table, or DHT (redundancy, self-healing, etc.).
BONUS: Infinit’s key-value store can be deployed over a single-node cluster, something that is not possible with manager/worker-based systems.
Scalability
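To make the directory-free routing above concrete, here is a toy consistent-hashing ring in Python. It is only a sketch of the general DHT idea — the node names, hash function and ring structure are illustrative assumptions, not Infinit’s actual overlay network.

```python
# Toy consistent-hashing ring: each node owns a range of the hash space,
# and a block address is routed to the first node at or after its hash.
# Nodes join without registering with any central directory.
# Illustrative sketch only -- not Infinit's actual overlay algorithm.
import hashlib
from bisect import bisect_left

def h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class Ring:
    def __init__(self):
        self.points = []  # sorted (hash, node) pairs

    def join(self, node: str):
        # A node joins by simply taking its place on the ring
        self.points.append((h(node), node))
        self.points.sort()

    def locate(self, address: str) -> str:
        # Route to the first node clockwise from the address's hash
        i = bisect_left(self.points, (h(address),)) % len(self.points)
        return self.points[i][1]

ring = Ring()
for n in ("node-a", "node-b", "node-c"):
    ring.join(n)
owner = ring.locate("block-42")
# A new node joining only remaps the keys falling in its new range
ring.join("node-d")
```

Because routing is a pure function of the sorted ring, every node can answer “who owns this address?” locally, which is what removes the central directory.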
11. Systems based on the manager/worker model must be dimensioned so that the managers can support the worker nodes and handle clients’ requests.
On the contrary, a decentralized architecture does away with bottlenecks, single points of failure and the associated slow performance, since requests are not concentrated on a small subset of critical manager nodes.
Better still, the more nodes in the system, the faster requests are processed, because the load is naturally distributed across all the nodes.
Resilience & Performance
12. Unlike in manager/worker-based systems, a decentralized architecture has no authoritative or otherwise privileged nodes.
As such, an attacker has no choice but to either take control of a large portion of the nodes composing the cluster or find a breach in the network protocols in order to attack the system.
Security
13. Consistency is all about reaching consensus within the set of servers that host the replicas of a piece of data.
Distributed systems based on a manager/worker model rely on the managers to maintain consistency whenever an update is requested. Due to the concentration of such requests on the managers, the number of parallel requests that can be processed is limited.
Consistency 1/4
[Diagram: DISTRIBUTED (manager/worker) — a manager/leader node coordinating workers]
14. Infinit’s key-value store instead relies on block-based quorums, meaning there are as many quorums, hence potential parallel consensus runs, as there are blocks (a.k.a. values) in the system.
This approach means that parallel requests are handled by disjoint quorums, leading to better performance, security and fault tolerance.
Consistency 2/4
[Diagram: DECENTRALIZED (peer-to-peer) — a per-block quorum of nodes]
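The per-block quorum idea can be sketched by deriving each block’s replica set directly from its address. The snippet below uses rendezvous hashing as an illustrative stand-in for Infinit’s actual placement scheme; node names and the replication factor are assumptions.

```python
# Sketch: derive a per-block quorum from the block's address.
# Each block's replicas (here 3 of 10 nodes) are chosen by hashing the
# (address, node) pair, so updates to different blocks usually land on
# disjoint quorums whose consensus rounds can proceed in parallel.
# Illustrative only -- not Infinit's actual placement algorithm.
import hashlib

NODES = [f"node-{i}" for i in range(10)]
REPLICATION_FACTOR = 3

def quorum(address: bytes):
    # Rendezvous hashing: rank every node by a hash of (address, node)
    # and keep the top REPLICATION_FACTOR -- deterministic per address.
    ranked = sorted(NODES,
                    key=lambda n: hashlib.sha256(address + n.encode()).digest())
    return ranked[:REPLICATION_FACTOR]

q1 = quorum(b"block-1")
q2 = quorum(b"block-2")
```

Any node can recompute a block’s quorum locally, so no coordinator is needed to know who must agree on an update.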
15. Consistency 3/4
Block-based quorums also mean that the complexity of the consensus algorithm is a function of the redundancy factor, not of the number of nodes in the cluster (unlike in manager/worker systems).
In other words, in a manager/worker model, a cluster of 1 million worker nodes would require, say, 100 managers. Given the quadratic complexity of consensus algorithms, a consensus among that many nodes would take several seconds to be reached.
In a decentralized architecture, the complexity remains the same no matter the number of nodes in the network.
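The quadratic-cost argument above can be put in rough numbers. Assuming, for illustration, that a broadcast-style consensus round among n participants costs on the order of n × (n − 1) messages:

```python
# Back-of-envelope illustration of the quadratic-cost argument.
# The n * (n - 1) figure is an assumed all-to-all broadcast cost,
# used only to compare orders of magnitude.
def messages_per_round(n: int) -> int:
    return n * (n - 1)

managers = messages_per_round(100)  # manager/worker: 100 managers agree
per_block = messages_per_round(5)   # decentralized: a 5-node block quorum
ratio = managers // per_block       # ~495x fewer messages per round
```

Under these illustrative figures, a 5-node per-block quorum exchanges roughly 500 times fewer messages per consensus round than a 100-manager ensemble — and the quorum cost stays constant as the cluster grows.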
16. Most distributed systems nowadays rely on Raft for consensus. However, because Raft generates a lot of noise and because it is impractical in systems that can have millions of quorums, we have decided to use Paxos.
Also, because the key-value store is a fundamental layer, Infinit’s is strongly
consistent to allow for more demanding applications.
NOTE: the consensus algorithm can be swapped for another one with different consistency guarantees.
Consistency 4/4
17. In summary, Infinit key-value store’s decentralized architecture brings a number of advantages over manager/worker-based distributed systems. This model offers better performance, security and resilience by removing the critical manager nodes.
Also, coupled with block-based quorums, such a model allows for extremely scalable applications.
Conclusion
19. Infinit’s key-value store differs from other key-value stores in two major ways:
• Key: one cannot choose the key associated with the values one stores; Infinit’s key-value store generates an address so as to optimize data placement for load balancing, fault tolerance and more.
• Value: in Infinit’s, there are different types of values (known as blocks), each with its own tradeoffs.
Overview
20. In order to properly use Infinit’s key-value store, one needs to understand the various block types well. In their purest form, there are two types of blocks, on top of which many others can be created:
• Mutable Blocks: costly, subject to conflicts, need consensus when updated, require cache invalidation to refresh the value, etc.
• Immutable Blocks (content hashing): cannot conflict, can be cached forever, can be fetched from any source (integrity is easy to validate), etc.
Blocks
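The immutable-block properties listed above all follow from content hashing: the address is a digest of the payload, so any reader can check integrity locally, whatever the source. A minimal sketch — the hash choice and address format here are assumptions, not Infinit’s actual scheme:

```python
# Sketch of why immutable blocks can be cached forever and fetched from
# any source: the address *is* a hash of the content, so a reader can
# validate integrity locally. Illustrative address format only.
import hashlib

def make_immutable_block(data: bytes):
    address = hashlib.sha256(data).hexdigest()
    return address, data

def verify(address: str, data: bytes) -> bool:
    # Re-hash the payload and compare it to the address
    return hashlib.sha256(data).hexdigest() == address

addr, block = make_immutable_block(b"hello")
ok = verify(addr, block)            # genuine payload passes
tampered = verify(addr, b"hell0")   # altered payload fails
```

A mutable block cannot offer this property: since its content changes under the same address, readers must instead rely on consensus and cache invalidation.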
21. Infinit key-value store’s API is composed of two types of calls:
• Block Generation:
• MakeImmutableBlock() -> (Address, Block)
• MakeMutableBlock() -> (Address, Block)
• Key-Value Store Manipulation:
• Insert(Block) -> Boolean
• Update(Block) -> Boolean
• Remove(Address) -> Boolean
• Fetch(Address) -> Block
API
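To make the call semantics concrete, here is a minimal in-memory mock of the six calls above. The names and behavior are hypothetical, and the mock ignores replication, consensus and encryption entirely.

```python
# Hypothetical in-memory mock of the API surface: two generation calls
# and four manipulation calls. Not Infinit's implementation -- just a
# sketch of the contract each call is expected to honor.
import hashlib
import os

class MockKV:
    def __init__(self):
        self.store = {}
        self.mutable = set()

    # --- Block generation ---
    def make_immutable_block(self, data: bytes):
        # Address derived from content, mirroring MakeImmutableBlock()
        return hashlib.sha256(data).digest(), data

    def make_mutable_block(self):
        # Store-chosen address, mirroring MakeMutableBlock()
        address = os.urandom(32)
        self.mutable.add(address)
        return address, b""

    # --- Key-value store manipulation ---
    def insert(self, address, block) -> bool:
        if address in self.store:
            return False
        self.store[address] = block
        return True

    def update(self, address, block) -> bool:
        # Only mutable blocks may be rewritten in place
        if address not in self.store or address not in self.mutable:
            return False
        self.store[address] = block
        return True

    def remove(self, address) -> bool:
        return self.store.pop(address, None) is not None

    def fetch(self, address):
        return self.store.get(address)
```

Note how the mock encodes the key design choice: Update is rejected for immutable blocks, since their address is bound to their content.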
23. import pickle
    from binascii import unhexlify
    # Request messages (MakeMutableBlockRequest, InsertRequest, FetchRequest)
    # come from the generated protobuf module

    def connect(endpoint):
        import grpc
        import doughnut_pb2_grpc
        channel = grpc.insecure_channel(endpoint)
        return doughnut_pb2_grpc.DoughnutStub(channel)

    def init(kv):
        # Create a mutable block representing the index
        index = kv.MakeMutableBlock(MakeMutableBlockRequest())
        # Set its payload to an empty list
        index.data_plain = pickle.dumps([])
        # Insert the block
        kv.Insert(InsertRequest(block = index))
        # Return its address
        return index.address.hex()

    def index(kv, addr):
        return kv.Fetch(FetchRequest(address = unhexlify(addr),
                                     decrypt_data = True)).block
Example
[Diagram: a mutable “index” block referencing an immutable “image” block]
24. def add(kv, addr, content):
        idx = index(kv, addr)
        # Create the content block (immutable, content-addressed)
        content_block = kv.MakeImmutableBlock(
            MakeImmutableBlockRequest(data = pickle.dumps(content)))
        # Append its address to the index
        l = pickle.loads(idx.data_plain)
        l.append(content_block.address)
        idx.data_plain = pickle.dumps(l)
        # Update the index
        update = kv.Update(UpdateRequest(block = idx))
        # Push the content block
        kv.Insert(InsertRequest(block = content_block))
Example