The document discusses scalability in distributed data systems and mechanisms to address it. It introduces concepts such as participant discovery, endpoint discovery, and acknacks that can impact scalability as systems grow. To reduce these effects, it recommends separating components in time and space through controlled startup, delayed acknacks, system partitioning, and a hub-and-spoke architecture with data forwarding instead of a flat network. Experiments show that the hub-and-spoke approach completes faster and with a lower processing load than a flat network as the number of components increases.
Scalability techniques for large DDS systems
1. Scalability
Dealing with large systems
Lex Heerink, PhD
software architect
Research & Development | ZettaScale Technology
2. Goal
What
• Give insight in some mechanisms that impact scalability
• Provide some options to deal with scalability
How
• Brief intro into CycloneDDS and scalability
• A nose dive into discovery aspects
• Things you could do to address scalability
• Example: hub-and-spoke architecture
From the specs
The need to scale to hundreds or
thousands of publishers and
subscribers in a robust manner is
also an important requirement.
[OMG DDS spec 1.4]
“Another important requirement
is the need to scale to hundreds
or thousands of subscribers in a
robust fault-tolerant manner”.
[OMG RTPS spec 2.2]
3. About CycloneDDS
DDS is a standards-based technology for ubiquitous, interoperable,
platform independent and real-time data sharing across network
connected devices
Characteristics: publish/subscribe technology, data centric, fault
tolerant, no single point of failure, reliable
Key concepts: participants, topics, endpoints, partitions
Applied in systems with above average availability and reliability
demands. Mandated in aerospace and defense.
CycloneDDS is an open source and freely available DDS
implementation: https://cyclonedds.io/
(Diagram: DDS concepts — a shared data space containing topicA and topicB, each with its own QoS, accessed by writers (W) and readers (R).)
4. Structure of the data (“topics”) can be modelled in IDL, and
properties can be assigned to data that specify how the data space
should treat the data
Data spaces can be partitioned in independent data planes.
Each plane can get hold of the data for different purposes, and data
can be created/modified/updated/disposed in each of these planes
independently of the other planes.
(Diagram: data space partitioning — the same topics accessed by writers (W) and readers (R) in a black partition and a red partition, independently of each other.)
@appendable
struct car {
  @key string license_plate;  /* key of the topic */
  brandtype brand;            /* non-key field */
  int32 color;
};
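As a minimal sketch of these concepts in the CycloneDDS C API, assuming the car type above has been compiled with idlc into a (hypothetical) car.h containing the car type and its car_desc topic descriptor; the brandtype field is left at its default because its definition is not shown:

#include "dds/dds.h"
#include "car.h"   /* hypothetical: generated by 'idlc car.idl' */

int main (void)
{
  /* A participant joins the data space of the default domain */
  dds_entity_t dp = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);

  /* The topic ties the IDL type to a name in the data space */
  dds_entity_t tp = dds_create_topic (dp, &car_desc, "car", NULL, NULL);

  /* QoS properties tell the data space how to treat the data */
  dds_qos_t *qos = dds_create_qos ();
  dds_qset_reliability (qos, DDS_RELIABILITY_RELIABLE, DDS_MSECS (100));
  dds_qset_durability (qos, DDS_DURABILITY_TRANSIENT_LOCAL);

  /* Endpoints: a writer (W) and a reader (R) on the same topic */
  dds_entity_t wr = dds_create_writer (dp, tp, qos, NULL);
  dds_entity_t rd = dds_create_reader (dp, tp, qos, NULL);
  (void) rd;

  car sample = { .license_plate = "AB-123-C", .color = 0 };
  dds_write (wr, &sample);

  dds_delete_qos (qos);
  dds_delete (dp);   /* deleting the participant also deletes its children */
  return 0;
}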
5. Scalability in the context of DDS is about the behaviour of the system
when increasing the number of participants, the number of topics,
the number of readers/writers, etc.
Obviously, the way you model your data, the rate at which you publish it, and
the distribution of the publishers and subscribers may impact scalability.
We’ll assume that you made smart choices in data modelling, and
primarily focus on the discovery aspects related to publishers and
subscribers.
Dealing with large systems
About scalability
6. Discovery in DDS
DDSI-RTPS is the wire protocol used by DDS. It is
designed to run over multicast and best-effort
connectionless transports such as UDP/IP.
Used for discovery of remote participants, readers
and writers, so that data can be delivered
• SPDP: participant discovery, periodic, best-effort
• SEDP: endpoint discovery, transient-local, reliable
(Diagram: two participants, each with reader and writer endpoints; participant discovery takes place between the participants, endpoint discovery between their endpoints.)
7. Participants periodically announce their presence (best-
effort, multicast (default)).
If a participant discovers a new participant, then it responds
by sending a unicast message back. A participant
that discovers N new participants therefore receives and
processes N replies.
The messages carry locator information of the builtin
readers/writers for a participant. These are needed to
kickstart endpoint discovery.
→ The number of participant discovery responses scales quadratically
with the number of participants
Participant discovery
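As a rough, illustrative calculation using the numbers from the experiment later in this deck: a system with 1 base station and 150 satellite stations has 151 participants, so a cold start in which everybody discovers everybody triggers on the order of 151 × 150 ≈ 22,650 unicast discovery responses.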
8. Endpoint discovery
(Diagram: a data space with topicA and topicB shared by participants A–E, each participant owning several readers (R) and writers (W).)
Builtin endpoints exchange relevant info about
topics, qos, readers and writers
- published as reliable, transient-local data
Writers can request matching readers to send an
acknack back to the writer.
- sent as directed UDP messages
- a single writer with many readers leads to a large fan-
in of acknacks
- a large fan-in may lead to a high processing load on
the writer
Endpoint discovery scales quadratically with the number
of matching endpoints
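To make the fan-in concrete with the experiment numbers used later in this deck: a transient-local writer on the base station matches a reader on each of the 150 satellite stations, so a single heartbeat from that writer can trigger up to 150 acknacks that the writer then has to process.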
9. Acknacks
An acknack is sent by a reader to a matching writer, so that
the writer can determine if the reader has received all data
A writer can request an acknack from a reader. Reasons for
a writer to require an acknack are:
- resource management
- to fulfill the durability property
- flow control
CycloneDDS uses smart policies to decide when to request
an acknack (e.g., an adaptive policy that requests more often
when the writer history cache reaches a threshold).
(Diagram: acknack example — a writer sends samples 1, 2 and 3 to a reader; the reader's acknack tells the writer which samples it has received and which are still missing.)
10. Dealing with scaling
Scalability is affected by characteristics of the application and by
characteristics induced by DDS. We focus on things that you can do
to reduce the scalability effects of DDS, in particular the
quadratic scaling in discovery.
Solution ingredients: separation in time and/or space
• Controlled start up of participants and endpoints
• Delayed acknacks
• System partitioning
Separation in time and space
11. Controlled startup
Instead of starting all participants and endpoints at the same time,
start smaller batches at different times. This spreads the
discovery data out over time.
Controlled startup can be part of the startup procedure of a system
... but it does not help with disconnects/reconnects.
When a reconnect occurs, discovery takes place immediately,
which means you get the quadratic participant discovery and
endpoint discovery problem back.
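A minimal sketch of the idea in the CycloneDDS C API. The batch size and interval are illustrative assumptions, and in a real system the batching would typically be done by the startup orchestration across processes or nodes rather than inside one process:

#include <unistd.h>
#include "dds/dds.h"

#define NUM_PARTICIPANTS   150
#define BATCH_SIZE          10
#define BATCH_INTERVAL_SEC   2   /* illustrative pacing between batches */

int main (void)
{
  dds_entity_t dp[NUM_PARTICIPANTS];
  for (int i = 0; i < NUM_PARTICIPANTS; i++)
  {
    dp[i] = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);
    /* ... create this participant's readers and writers here ... */

    /* Pause after each batch so discovery traffic is spread out over time */
    if ((i + 1) % BATCH_SIZE == 0)
      sleep (BATCH_INTERVAL_SEC);
  }
  /* ... run the application; dds_delete each participant on shutdown ... */
  return 0;
}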
(Diagram: immediate startup — all participants join the data space at the same time.)
12. Controlled startup (continued)
(Diagram: controlled startup — batches of participants join the data space at different times, spreading the discovery data out over time.)
13. Delayed acknacks
Delaying the sending of acknacks spreads discovery traffic out
over time.
(Charts: CPU usage with the default ack delay setting (send whenever) vs. an ack delay of 7 ms.)
Experiment setup
• 1 base station, 150 satellite stations
• Base station has 100 writers and 100 readers
• Satellite stations have 100 writers and 200 readers
• Each satellite station receives transient local data
published by the base station
• Each satellite station publishes transient local data
using 100 writers (total: 20000 instances).
• Each satellite station receives data published by the base
station and data from the other satellite stations
Experiment
(Diagram: 1 base station and 150 satellites; base station to satellites: 100 topics, transient local data; inter-satellite communication: 100 topics, transient local data, 20000 instances.)
14. Delayed acknacks (continued)
The pictures show measurements from the same experiment, where the
delay before sending acknacks is 7 ms (randomized and bounded).
Time (x-axis, sec) vs. CPU usage (y-axis, %) of CycloneDDS per thread
(Charts: per-thread CPU usage on the base station with the default ack delay setting (sent whenever) and with an ack delay of 7 ms; annotations mark threads that are less busy sending acknacks and less busy processing discovery data.)
Delaying acks and spreading them out over time
reduces load and decreases the experiment duration
(from 200 s to 180 s).
Threads
main – main thread of CycloneDDS, creates/deletes entities and
waits and checks
dq.builtins – CycloneDDS thread to process discovery data;
does the matching of readers and writers
tev – timed event thread that handles asynchronous events such
as retransmitting acknacks, sending heartbeats, and sending
discovery messages
recv – receives data and hands it off to, e.g., dq.builtins
15. System partitioning
Reduce the number of matching endpoints by preventing matching
altogether.
Mechanisms to partition
• domainId – isolates domains, including all their participants and
endpoints (see the sketch below)
• Partitions – isolate readers and writers within the same domain
by creating separate shared data spaces
• IgnoredPartitions – prevents sending data that matches the
IgnoredPartition expression to remote participants
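A minimal sketch of the domainId mechanism from the list above; the two subsystem names are made up for illustration:

#include "dds/dds.h"

int main (void)
{
  /* Participants in different domains never discover each other, so their
     endpoints never match and they generate no mutual discovery traffic. */
  dds_entity_t control_dp = dds_create_participant (0, NULL, NULL);  /* domain 0 */
  dds_entity_t logging_dp = dds_create_participant (1, NULL, NULL);  /* domain 1 */

  /* ... topics, readers and writers created under control_dp are invisible
     to anything created under logging_dp, and vice versa ... */

  dds_delete (control_dp);
  dds_delete (logging_dp);
  return 0;
}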
16. Scaling: from flat to hub-and-spoke
Use the experimental setup discussed before, and compare a flat network with a
hub-and-spoke architecture.
Experiment setup
• 1 base station, 150 satellite stations
• Base station has 100 writers and 100 readers
• Satellite stations have 100 writers and 200 readers
• Each satellite station receives transient local data
published by the base station
• Each satellite station publishes transient local data
using 100 writers to each of the other satellite stations (total:
20000 instances).
• Each satellite station receives data published by the base
station and data from the other satellite stations
Apply architectural changes and partitioning techniques to prevent
quadratic scalability problems
Recipe to scale
1. Prevent inter-satellite participant discovery
2. Apply partitioning techniques to prevent inter-satellite
communication
3. Use a forwarder to realize satellite-to-satellite
communication
(Diagram: flat architecture vs. hub-and-spoke.)
17. Step 1: Prevent participant discovery
Prevent inter-satellite discovery
Client config
<CycloneDDS>
  <Domain>
    <General>
      <Interfaces>
        <NetworkInterfaceAddress>127.0.0.1</NetworkInterfaceAddress>
      </Interfaces>
      <AllowMulticast>asm</AllowMulticast>
    </General>
    <Discovery>
      <DefaultMulticastAddress>239.255.0.1</DefaultMulticastAddress>
    </Discovery>
  </Domain>
</CycloneDDS>
Unicast SPDP
Configure satellite stations to always send directed SPDP messages
instead of multicast SPDP messages. This prevents satellite
stations from learning about each other and, consequently, also prevents
endpoints on satellite nodes from learning about each other.
Make sure that satellite stations NEVER multicast SPDP
messages.
(Diagram: satellite stations use unicast SPDP towards the base station instead of multicast SPDP.)
18. Step 1: Prevent participant discovery
Prevent inter-satellite discovery
Configure satellite nodes to always send directed SPDP messages
instead of multicast SPDP messages. This prevents satellite
nodes from learning about each other and, consequently, also prevents
endpoints on satellite nodes from learning about each other.
A writer on a satellite node now matches only 1 reader on the
base station instead of 150 readers on all satellite stations.
Satellite stations NEVER multicast SPDP messages.
(Diagram: writers and readers on satellite stations now only match endpoints on the base station; there is no communication between satellite stations.)
19. Step 2: Use partitions
Use satellite-specific partitions
Use partitions to prevent inter-satellite communication. A satellite
writer publishes data on a satellite-specific partition, and the base
station publishes on a global partition. Satellites subscribe to the base
station.
This limits endpoint discovery.
The base station is now required to forward data published by a satellite
station to the other satellites.
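A minimal sketch of this step in the CycloneDDS C API. The partition names, the satellite id, and the car topic (with its hypothetical idlc-generated car.h) are assumptions for illustration; the diagrams show the same idea with "yellow", "brown" and "red" partitions:

#include <stdio.h>
#include "dds/dds.h"
#include "car.h"   /* hypothetical: generated by idlc from the car IDL */

int main (void)
{
  dds_entity_t dp = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);
  dds_entity_t tp = dds_create_topic (dp, &car_desc, "car", NULL, NULL);

  /* This satellite publishes only on its own partition ... */
  char mine[32];
  snprintf (mine, sizeof (mine), "satellite_%d", 42);   /* illustrative satellite id */
  const char *wparts[] = { mine };
  dds_qos_t *pqos = dds_create_qos ();
  dds_qset_partition (pqos, 1, wparts);
  dds_entity_t pub = dds_create_publisher (dp, pqos, NULL);
  dds_entity_t wr  = dds_create_writer (pub, tp, NULL, NULL);

  /* ... and subscribes only to the global partition used by the base station */
  const char *rparts[] = { "global" };
  dds_qos_t *sqos = dds_create_qos ();
  dds_qset_partition (sqos, 1, rparts);
  dds_entity_t sub = dds_create_subscriber (dp, sqos, NULL);
  dds_entity_t rd  = dds_create_reader (sub, tp, NULL, NULL);

  /* The writer now only matches readers that also use the "satellite_42"
     partition (the base station / forwarder), not readers on the other
     149 satellites. */
  (void) wr;
  (void) rd;
  dds_delete_qos (pqos);
  dds_delete_qos (sqos);
  dds_delete (dp);
  return 0;
}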
(Diagram: participant-specific "yellow" and "brown" partitions alongside the global "red" partition.)
20. Step 3: Forwarding data
Use a forwarder that subscribes to satellite-specific data and
republishes the data on the global partition.
Data published by satellite stations is now forwarded to the other
satellites. Because the data is republished on the global partition
and there are multiple interested recipients for the data, this data
(going from the base station to the satellite stations) will be
multicast.
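A minimal sketch of an application-level forwarder for one topic, against the CycloneDDS C API. Everything here that is not in the slides is an assumption for illustration: the car topic with its idlc-generated car.h, the partition names, and the "satellite_*" wildcard (DDS partition matching accepts wildcard expressions on one side). A production forwarder would use waitsets instead of polling and would cover all forwarded topics:

#include "dds/dds.h"
#include "car.h"   /* hypothetical: generated by idlc from the car IDL */

#define MAX_SAMPLES 16

int main (void)
{
  dds_entity_t dp = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);
  dds_entity_t tp = dds_create_topic (dp, &car_desc, "car", NULL, NULL);

  /* Reader side: subscribe to the satellite-specific partitions (wildcard) */
  const char *sat_parts[] = { "satellite_*" };
  dds_qos_t *sqos = dds_create_qos ();
  dds_qset_partition (sqos, 1, sat_parts);
  dds_entity_t sub = dds_create_subscriber (dp, sqos, NULL);
  dds_entity_t rd  = dds_create_reader (sub, tp, NULL, NULL);
  dds_delete_qos (sqos);

  /* Writer side: republish everything on the global partition */
  const char *glob_parts[] = { "global" };
  dds_qos_t *pqos = dds_create_qos ();
  dds_qset_partition (pqos, 1, glob_parts);
  dds_entity_t pub = dds_create_publisher (dp, pqos, NULL);
  dds_entity_t wr  = dds_create_writer (pub, tp, NULL, NULL);
  dds_delete_qos (pqos);

  void *samples[MAX_SAMPLES] = { NULL };
  dds_sample_info_t infos[MAX_SAMPLES];
  for (;;)
  {
    /* Take whatever arrived on the satellite partitions ... */
    dds_return_t n = dds_take (rd, samples, infos, MAX_SAMPLES, MAX_SAMPLES);
    for (dds_return_t i = 0; i < n; i++)
      if (infos[i].valid_data)
        dds_write (wr, samples[i]);   /* ... and republish it on "global" */
    if (n > 0)
      dds_return_loan (rd, samples, n);
    dds_sleepfor (DDS_MSECS (10));    /* simple polling; a waitset would avoid this */
  }
  /* not reached: dds_delete (dp) would clean up all entities */
}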
(Diagram: a forwarder subscribes to the participant-specific "yellow" and "brown" partitions and republishes the data on the global "red" partition.)
21. Step 3: Forwarding data (continued)
The forwarder is a component that receives data on one partition,
and republishes the data in another partition.
The forwarder can be built as an application component, or by using Zenoh
routers. Zenoh is a scalable technology specialized in bringing
data efficiently to the right location at the right time. Zenoh
integrates well with CycloneDDS.
(Diagram: Zenoh routing — a forwarder with a reader and a writer, fanning data out to multiple readers.)
More info on Zenoh: see https://zenoh.io/
23. Summary of the results
The following conclusions can be drawn from the experiments
1. The hub-and-spoke architecture takes less time to complete
2. Threads related to sending and processing of discovery data are
less busy in a hub-and-spoke architecture.
Experiment environment
Experiments were conducted on a 40-core Intel® Xeon
CPU E5-2690 v2 @ 3.00 GHz, 2 threads per core.
Traces of overload situations have been seen for
large numbers of satellite stations in flat network
architectures. Overload situations may reduce the
rate at which threads can actually make progress
handling data, because data has to be
retransmitted more often.
24. Thank you for your attention
Background material
CycloneDDS: https://cyclonedds.io/
Cyclone configuration guide:
https://cyclonedds.io/docs/cyclonedds/latest/config/index.html
Zenoh: https://zenoh.io/
DDS: https://www.omg.org/spec/DDS/1.4/PDF
DDSI-RTPS: https://www.omg.org/spec/DDSI-RTPS/2.2/PDF/
Concluding remarks
Hub-and-spoke can be used to reduce scalability problems in situations where
endpoint discovery becomes a limiting factor.
A single hub is also a single point of failure. To keep fault tolerance
levels up, you may need redundant hubs.