This event took place on 27 October 2021.
In this Tech 2 Tech session, we considered questions such as:
- Which types of applications need low latency, and what are their specific requirements for both latency and jitter?
- What levels of latency might you expect across Janet?
- What can you do to optimise latency for your networked applications?
- How can we measure latency and jitter?
3. Overview
Today’s session
•Network performance is typically focused on achieving good throughput for large scale data transfers
•See our May T2T event - https://www.jisc.ac.uk/events/tech-2-tech-network-performance-24-may-2021
•We’ve noticed we’re getting an increasing number of queries about low latency networking
•In this session we aim to identify use cases that matter to our members, how these might be delivered, and what measurement tools we’d like to have available
4. Use cases for low latency networking
Application areas?
•Distributed performing arts
•Haptic / remote control applications via IP
•Distributed storage / databases
•Conferencing tools, voice over IP (VoIP)
•Virtual reality (VR) headsets
•Transnational education (TNE)
•Gaming
•Q: Do we know specific latency requirements?
5. Distributed performing arts
Multi-site performances over the Internet
•Orchestras
•Musicians at different locations
•Remote conductor
•Theatre, plays
•Actors in multiple locations
•Example application - LoLa
•See https://lola.conts.it/ - a GARR project
•Example https://www.youtube.com/watch?v=LK2WNyfLGlc
•One-way delay (OWD) needs to be below the “threshold of perception for temporal segregation” – 30ms
6. Haptic / remote control applications
Using the network for remote control
•Controlling devices from afar
•Might be gloves locally, robot arm remotely
•Haptics implies (force) feedback
•Various application areas, including
•Medical
•Teaching / learning
•Joystick control of remote device
•Shaving?
•EE TV ad: https://www.youtube.com/watch?v=gWiV3DF5JkU
7. Distributed storage / databases
Latency requirements may be strict
•Use case might be a resilient database configuration or some
form of distributed file system or cluster
•May become an issue when multi-site
•e.g., local campus and remote data centre
•Seeing more questions from members in this area
•A recent example:
•Dell EMC VxRail 7.0 vSAN stretched cluster
•Requires RTT between sites hosting VM objects < 5ms
•Is that achievable reliably between X and Y?
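The word "reliably" matters here: an average RTT below the limit can hide occasional spikes above it. The following is a minimal sketch of one way to test a set of RTT samples against a budget using a high percentile rather than the mean. The function names, the choice of the 99th percentile, and the sample values are all illustrative assumptions, not part of the vendor requirement.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def meets_rtt_budget(rtt_ms, budget_ms=5.0, pct=99.0):
    """True if the pct-th percentile RTT is strictly below budget_ms."""
    return percentile_nearest_rank(rtt_ms, pct) < budget_ms

# Hypothetical RTT samples in milliseconds (not real Janet measurements).
quiet = [1.8, 1.9, 2.1, 1.8, 2.0, 1.9, 2.2, 1.8]
spiky = [1.8, 1.9, 2.1, 1.8, 7.4, 1.9, 2.2, 1.8]

print("quiet path meets 5 ms budget:", meets_rtt_budget(quiet))
print("spiky path meets 5 ms budget:", meets_rtt_budget(spiky))
print("spiky path mean RTT (ms):", sum(spiky) / len(spiky))
```

In the second sample set the mean stays under 5 ms even though one sample exceeds it, which is why a percentile (or maximum) over a long measurement period is a more useful check than an average.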
8. Other use cases
Include…
•Conference tools – Zoom, Teams, … 3rd party servers
•VoIP – Probably widely deployed on campuses by now
•TNE – improving the experience for remote learners
•Virtual reality (VR) – between device and compute
•Gaming – campuses have students!
•These have a wide range of requirements
9. Any use cases we missed of interest to you?
Or any questions?
10. Latency expectations
What can you expect?
•Latency largely determined by distance, and speed of light in fibre
•But the fibre path won’t be as the crow flies
•Latency will be the result of the sum of all elements on the path,
including end systems and devices, access network, network elements,
and the distance involved
•Ballpark one-way delay (OWD) between site border routers?
•Between nearby sites on Janet: ~1ms
•Between distant sites on Janet: 6-8 ms
•Between Janet and the US east coast: ~35ms
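As a rough illustration of the distance component, the sketch below estimates propagation delay over fibre. It assumes a refractive index of about 1.468 (a typical figure for single-mode fibre, giving roughly 5 microseconds per km) and uses hypothetical path lengths. It covers propagation only: equipment, end systems and indirect fibre routes all add to the result.

```python
# Rough propagation-delay estimate for a fibre path (illustrative sketch only).
C_VACUUM_KM_PER_S = 299_792.458  # speed of light in vacuum
REFRACTIVE_INDEX = 1.468         # assumed typical value for single-mode fibre

def fibre_owd_ms(path_km, refractive_index=REFRACTIVE_INDEX):
    """One-way propagation delay in milliseconds over path_km of fibre."""
    speed_km_per_s = C_VACUUM_KM_PER_S / refractive_index
    return path_km / speed_km_per_s * 1000.0

def fibre_rtt_ms(path_km, refractive_index=REFRACTIVE_INDEX):
    """Round-trip propagation delay, assuming the same path in both directions."""
    return 2.0 * fibre_owd_ms(path_km, refractive_index)

# Hypothetical fibre path lengths in km (actual routes are longer than
# the straight-line distance between sites).
for km in (100, 600, 1000):
    print(f"{km:>5} km: OWD ~{fibre_owd_ms(km):.2f} ms, RTT ~{fibre_rtt_ms(km):.2f} ms")
```

Comparing such an estimate with a measured value gives a feel for how much of the latency is down to distance and how much comes from everything else on the path.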
11. Latency on / across Janet
What should I expect?
•The Janet network is being refreshed
•Backbone network with core PoPs and IX presence
•Much of the focus is on capacity, use of 400G, n x 100G
•Regional networks are being updated through the access programme – join one
of our T2T access programme update sessions to learn more
•There is no latency SLA on Janet
•Though there is also no throughput SLA – but we give advice and guidance
•Janet Netpath+ circuits provisioned directly on the transmission layer should
have fixed latency
12. Access network technology
How does this affect latency?
•Janet member sites are generally connected to their access
router via local Ethernet networks
•Minimal latency
•Other access network technologies will have higher latency
•Residential broadband
•4G/5G mobile networks
•Satellite / LEO (e.g., Starlink)
•Users used to typical home network latency can be pleasantly
surprised by what is possible across Janet
14. Minimising network latency
Approaches
•Optimising equipment, end to end
•e.g. dedicated LoLa hardware – PC, camera, codec, displays
•Using Science DMZ principles
•The friction-free networking principle
•Ensuring optimal routing
•Not all paths are optimal for latency
15. Example: LoLa
Every millisecond counts
•Good discussion in the LoLa 2.0 manual - see https://lola.conts.it/
•Hardware
•Very specific requirements on the PC hardware
•Especially video input/output, audio input/output, capture & display
•Network
•1Gbps+ Ethernet (compression saves bandwidth, adds latency)
•Switch hardware; must handle 1K packets at high pps rate
•Avoid using campus firewall, avoid NAT
16. Using Science DMZ principles
General principles
•Treat science/research and business traffic differently
•But here it’s latency sensitive applications that need to be treated differently
•Elements:
•Friction-free network path
•Optimise your local network architecture
•Efficient application of security policy (avoid main campus firewall)
•But instead of well-tuned data transfer nodes (DTNs), low latency applications need optimised hardware, as per the LoLa example
•Persistent performance monitoring is still important, e.g., perfSONAR
•With strong user engagement – know who your low latency users are
17. Example classic Science DMZ architecture
[Diagram. Labelled elements: WAN; Border Router; Science DMZ Switch/Router; Enterprise Border Router/Firewall; Site / Campus LAN; High performance Data Transfer Node with high-speed storage; perfSONAR nodes; 10G / 10GE links. Annotations: per-service security policy control points; clean, high-bandwidth WAN path; site / campus access to Science DMZ resources.]
Source: https://fasterdata.es.net
18. Optimising routing
Taking the fastest path, not necessarily the fattest
•Routing metrics may tend to favour higher capacity paths
•Latency depends on the path between endpoints, and thus between the
networks that serve them
•Interconnects likely to be at major IXs
•R&E networks have their own interconnects, e.g., for us via GÉANT
•Many large content / cloud providers have their own global networks
•CDNs may provide a ‘nearer’ instance of a service
•Services may be pushed to the edge – a feature of 5G
•This aims to minimise latency from source to compute
20. Measuring latency (and jitter)
A wide range of options
•Jisc tools available to members
•Netsight3
•User tools, for example:
•Command line tools
•RIPE Atlas – community measurements
•Looking glasses – views to you from remote networks
•perfSONAR – persistent measurements over time
•In-application tools
•LoLa has a standalone test tool
•Some applications using RTP may report via RTCP (see RFC 6843)
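For reference, the interarrival jitter that RTP receivers report in RTCP is defined in RFC 3550 as a running estimate: for each packet, the change in transit time relative to the previous packet is smoothed with a gain of 1/16. The sketch below follows that formula. The function name and the use of milliseconds are assumptions for illustration (RTP itself works in timestamp units), and the timestamps are made up.

```python
def rfc3550_jitter(packets):
    """Running interarrival jitter estimate in the style of RFC 3550.

    packets: iterable of (send_time, recv_time) pairs in the same unit.
    Only differences in transit time are used, so a constant offset
    between the sender and receiver clocks cancels out.
    """
    jitter = 0.0
    prev_transit = None
    for send_time, recv_time in packets:
        transit = recv_time - send_time
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            # J(i) = J(i-1) + (|D(i-1, i)| - J(i-1)) / 16
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter

# Hypothetical (send, receive) timestamps in ms, packets sent every 20 ms.
steady = [(0, 10), (20, 30), (40, 50), (60, 70)]
variable = [(0, 10), (20, 34), (40, 49), (60, 75)]

print("steady stream jitter (ms):", rfc3550_jitter(steady))
print("variable stream jitter (ms):", rfc3550_jitter(variable))
```

Because only transit-time differences are used, this estimate does not need synchronised clocks, unlike an absolute OWD measurement.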
22. Command line tools
Simpler tools
•ping
•traceroute
•mtr
•…
•Quick way to get a feel, but typically limited as only a small
snapshot, using protocols that might be treated differently by
the network to your application traffic
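A quick way to get more than a glance from ping is to summarise the per-packet times. The sketch below parses ping-style output and reports minimum, average and maximum RTT, plus a simple jitter figure (mean absolute difference between consecutive samples). The sample text imitates the common Linux ping line format and uses a documentation address; the format varies between platforms, so the regular expression is an assumption that may need adjusting.

```python
import re
from statistics import mean

# Made-up sample in the style of Linux ping output (192.0.2.1 is a
# documentation address, not a real Janet host).
SAMPLE_PING_OUTPUT = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=57 time=1.20 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=57 time=1.50 ms
64 bytes from 192.0.2.1: icmp_seq=3 ttl=57 time=1.10 ms
64 bytes from 192.0.2.1: icmp_seq=4 ttl=57 time=1.40 ms
"""

TIME_RE = re.compile(r"time=(\d+(?:\.\d+)?) ms")

def parse_rtts_ms(ping_output):
    """Extract the per-packet RTT values (ms) from ping-style text."""
    return [float(m.group(1)) for m in TIME_RE.finditer(ping_output)]

def summarise(rtts):
    """Return min/avg/max RTT and mean absolute successive difference."""
    if len(rtts) < 2:
        raise ValueError("need at least two samples")
    diffs = [abs(b - a) for a, b in zip(rtts, rtts[1:])]
    return {
        "min": min(rtts),
        "avg": mean(rtts),
        "max": max(rtts),
        "jitter": mean(diffs),
    }

stats = summarise(parse_rtts_ms(SAMPLE_PING_OUTPUT))
for name, value in stats.items():
    print(f"{name}: {value:.2f} ms")
```

The same caveat applies as for ping itself: ICMP may be treated differently from application traffic, so these figures are indicative only.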
23. RIPE Atlas anchor
Worldwide network measurement system
•See https://atlas.ripe.net/
•Supports measurements from RIPE Atlas nodes
•Hardware (available from RIPE) or software probes
•The RIPE Atlas ecosystem is mature
•Over 11,000 probes around the world
•Jisc has an anchor node deployed at Slough
•See https://atlas.ripe.net/probes/6695/
•Useful for loss and latency, but can also do more bespoke tests
24. RIPE Atlas latency world map
A recently published tool using RIPE Atlas data
•Shows minimum latency seen into
a given Autonomous System
Number (network) for a given day
•Janet is ASN786
•Useful for expectations
•Note it shows RTT values
26. Janet looking glass
Provides views to your site
•Under redevelopment, but accessible
•See https://alice.ja.net/
•Various functions provided:
•ping (RTT)
•traceroute
•BGP route/community/path
•Can be run from a range of Janet devices
•Feedback welcomed
27. Persistent measurement over time: perfSONAR
• Free, open source - https://www.perfsonar.net
• Easy to download and install on CentOS7 (and Debian)
• Very useful to have persistent testing: collect history of network
characteristics – throughput, loss, latency, path
• Test against our perfSONAR node in the Jisc Slough data centre
• Throughput (up to 10G) - use ps-slough-10g.ja.net
• Latency – use ps-slough-1g.ja.net
• We are testing 1Gbps small nodes (including RPi) and Docker versions
• Happy to work with sites to test these
28. perfSONAR example – UK GridPP mesh
https://psmad.opensciencegrid.org/maddash-webui/index.cgi?dashboard=UK%20Mesh%20Config
Durham – Oxford, last 12 months
29. TimeMap
Per-segment latency and jitter measurements
•Developed in the GÉANT GN4-3 project
•Uses TWAMP / RPM measurements
•Running on GÉANT backbone (Juniper)
•Moving towards production
•https://timemap.geant.org/
•Segment by segment
•Not an end to end view
30. Improved OWD measurement accuracy?
Achieving more accurate OWD measurements
•When running OWD measurements accurate time can be important
•Is NTP enough?
•Typically see 1-2ms variance – see the perfSONAR example
•May be partly time synchronisation, partly measurement handling
•Is there interest in a more accurate time service?
•There is the Precision Time Protocol (PTP) – IEEE 1588
•One advantage is that PTP is hardware-based
•See the perfSONAR team’s statement – cost is an issue
•Might be something to discuss with NPL
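To see why clock accuracy matters for OWD but not for RTT, consider a simple model in which the receiver's clock runs ahead of the sender's by a fixed offset. The sketch below is illustrative only: the function name and the values are assumptions, chosen so that the offset is similar in size to the 1-2ms variance mentioned above and the true OWD is similar to that between nearby sites.

```python
def measured_owds_ms(true_fwd_ms, true_rev_ms, receiver_clock_offset_ms):
    """Return the (forward, reverse) OWDs as they would be measured.

    receiver_clock_offset_ms > 0 means the receiver's clock is ahead of
    the sender's. The forward measurement is inflated by the offset and
    the reverse measurement is reduced by the same amount.
    """
    measured_fwd = true_fwd_ms + receiver_clock_offset_ms
    measured_rev = true_rev_ms - receiver_clock_offset_ms
    return measured_fwd, measured_rev

# Hypothetical values: 1 ms true OWD each way, 1.5 ms clock offset.
fwd, rev = measured_owds_ms(1.0, 1.0, 1.5)
print(f"measured forward OWD: {fwd:.1f} ms")
print(f"measured reverse OWD: {rev:.1f} ms")
print(f"sum (equals the true RTT): {fwd + rev:.1f} ms")
```

With these values the error is larger than the delay being measured, and the reverse direction even appears negative, while the sum of the two directions still gives the correct RTT. That is the case for better time synchronisation when per-direction OWD on short paths is what you need to know.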
32. Open questions / discussion
Some closing questions…
•Have we covered low latency networking use cases of interest?
•What would you like from Jisc to help you with these?
•Do you have the information needed and capability to optimize
latency within your site where needed?
•Do you have the tools to measure latency and jitter?
•Anything else we missed?