1. FFMUC: Half a year with WireGuard
VXLAN + B.A.T.M.A.N. and some Python included
FFWCW 2021
2. awlnx
● Annika Wickert
● Senior Network Engineer / OpenSource since 2010
● Twitter @awlnx / GitHub @awlx
Who am I?
3. FFMUC?
• Freie Netze München e.V. since 2014
• Community Freifunk München since 2004
• Wifi
• #FFMEET
• DoH/DoT/DNSCrypt/DNS
• Streaming
4. FFMUC ran on fastd
• FFMUC was built with fastd and B.A.T.M.A.N.
• We got bigger compute nodes and bigger uplinks - we wanted to leverage the
resources
• We didn’t want to change too much at once => not too much risk
• So why not change _only_ the transport network and keep B.A.T.M.A.N.?
5. WireGuard vs. fastd
• fastd is a single-threaded userspace process
• WireGuard runs in kernel space and is multi-threaded
• WireGuard cannot transport Layer 2 protocols - B.A.T.M.A.N. is one ...
• We need another encapsulation which solves this problem => VXLAN
[Diagram: WireGuard carries VXLAN, VXLAN carries B.A.T.M.A.N.]
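Stacking two encapsulations costs MTU. The sketch below estimates what is left for the VXLAN device, using the well-known per-packet overheads of WireGuard (60 bytes over IPv4, 80 over IPv6) and VXLAN (50 bytes over IPv4, 70 over IPv6). The function name and the default address families are assumptions for illustration, not FFMUC's actual settings, and the B.A.T.M.A.N. header on top is not included.

```python
# Per-packet overhead in bytes, keyed by the IP version of the outer header.
WIREGUARD_OVERHEAD = {4: 60, 6: 80}  # outer IP + UDP + WireGuard header and auth tag
VXLAN_OVERHEAD = {4: 50, 6: 70}      # outer IP + UDP + VXLAN header + inner Ethernet


def vxlan_mtu(uplink_mtu: int, wg_outer_ip: int = 4, vxlan_outer_ip: int = 6) -> int:
    """Return the MTU left for a VXLAN device that runs inside a WireGuard tunnel.

    wg_outer_ip is the IP version WireGuard uses on the uplink,
    vxlan_outer_ip the IP version VXLAN uses inside the tunnel.
    """
    wg_mtu = uplink_mtu - WIREGUARD_OVERHEAD[wg_outer_ip]
    return wg_mtu - VXLAN_OVERHEAD[vxlan_outer_ip]


if __name__ == "__main__":
    print(vxlan_mtu(1500))
```

On uplinks with a smaller MTU (for example PPPoE) the remaining budget shrinks accordingly, so the value is best computed from the smallest uplink you expect.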
7. Challenges we already knew
• No systemd-networkd support for B.A.T.M.A.N.
• We are an open network - we don’t want node owners to sign up
• WireGuard only accepts peers whose public keys are already configured
=> we need a daemon which handles incoming keys and programs them
to the gateways
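A daemon that accepts keys from anyone should at least check that a submitted value looks like a WireGuard public key before programming it. WireGuard public keys are 32 bytes, base64-encoded. The helper below is a minimal sketch with a hypothetical name; it is not taken from wgkex.

```python
import base64
import binascii


def is_valid_wg_public_key(key: str) -> bool:
    """Return True if key is strict base64 that decodes to exactly 32 bytes."""
    try:
        raw = base64.b64decode(key, validate=True)
    except (binascii.Error, ValueError):
        # Not valid base64 (wrong alphabet, bad padding, non-ASCII input).
        return False
    return len(raw) == 32


if __name__ == "__main__":
    sample = base64.b64encode(bytes(32)).decode("ascii")
    print(is_valid_wg_public_key(sample))
    print(is_valid_wg_public_key("definitely not a key"))
```

This only checks the format. It cannot tell whether the key belongs to a real node, which is fine for an open network that does not want sign-ups.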
9. How does it work?
• WireGuard peers on the gateways are created by wgkex
• Allowed IP is derived from the public key of the node
• VXLAN forwarding database entries are created by wgkex
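The three steps can be sketched as follows. The derivation scheme (hash the key, build a locally administered MAC, expand it to an EUI-64 link-local address) is an assumption for illustration and not necessarily what wgkex does; the interface names wg0 and vx0 are hypothetical. The commands are only built as argument lists, not executed.

```python
import hashlib
import ipaddress


def link_local_from_key(public_key: str) -> ipaddress.IPv6Address:
    """Derive a stable link-local IPv6 address from a public key (illustrative scheme)."""
    digest = hashlib.sha256(public_key.encode("ascii")).digest()
    # Locally administered unicast MAC: first octet 0x02, then five hash bytes.
    mac = bytes([0x02]) + digest[:5]
    # Modified EUI-64: flip the universal/local bit and insert ff:fe in the middle.
    iid = bytes([mac[0] ^ 0x02]) + mac[1:3] + b"\xff\xfe" + mac[3:6]
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + iid)


def gateway_commands(public_key: str, wg_if: str = "wg0", vx_if: str = "vx0") -> list:
    """Build the commands a gateway would run to admit one node."""
    addr = link_local_from_key(public_key)
    return [
        # Add the node as a WireGuard peer, allowing only its derived address.
        ["wg", "set", wg_if, "peer", public_key, "allowed-ips", f"{addr}/128"],
        # Add an all-zero-MAC FDB entry so VXLAN floods unknown traffic to the node.
        ["bridge", "fdb", "append", "00:00:00:00:00:00",
         "dev", vx_if, "dst", str(addr), "via", wg_if],
    ]


if __name__ == "__main__":
    for cmd in gateway_commands("A" * 43 + "="):
        print(" ".join(cmd))
```

Because the address is a pure function of the key, node and gateway can compute it independently and no address assignment has to be negotiated.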
10. Get in touch with maintainers
• To get the validation data for wgkex etc. right
• We contacted WireGuard maintainers early in the process
• Asked questions about known scaling issues
• Opened PRs early as drafts to see if there is a chance of merging
• systemd-networkd https://github.com/systemd/systemd/pull/17252
• gluon-community-packages
https://github.com/freifunk-gluon/community-packages/pull/6
11. Solve problems upstream!
• We invested a lot of time in systemd-networkd
• We wanted to get our stuff merged upstream
• No custom solutions for our setup, only upstream-compatible ones, which solves many
resource problems in the future
12. Gateways
• Everything is automated with Saltstack
• systemd-networkd takes care of all interfaces
• 800 to 1000 nodes per gateway are easy
• We are able to run the whole of FFMUC on just two gateways
13. Debugging … Flamegraphs and Bugs
• WireGuard performs well but we have too much load on our gateways. Why?
15. Keep your NTP in sync!
• Sync NTP before you try to connect to WireGuard
• If you don’t do that, many funky things happen
• OpenWrt defaults its clock to the build date of the firmware, so it works for the first few
days after release … because it’s good enough
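One reason the clock matters: a WireGuard handshake initiation carries a timestamp, and the responder ignores any initiation whose timestamp is not newer than the last one it accepted from that peer. A node that reboots with its clock reset to the firmware build date can therefore look like a replay. The class below is a simplified model of that rule with hypothetical names, not WireGuard code, and the slides do not say this was the only effect seen.

```python
class HandshakeReplayGuard:
    """Simplified model: accept a handshake only if its timestamp is strictly newer."""

    def __init__(self) -> None:
        # Newest accepted timestamp per peer.
        self._latest = {}

    def accept(self, peer: str, timestamp: int) -> bool:
        last = self._latest.get(peer)
        if last is not None and timestamp <= last:
            # Older or equal timestamp: treated as a replayed handshake.
            return False
        self._latest[peer] = timestamp
        return True


if __name__ == "__main__":
    guard = HandshakeReplayGuard()
    print(guard.accept("node-a", 1_700_000_000))  # clock is synced
    print(guard.accept("node-a", 1_600_000_000))  # rebooted, clock fell back to build date
    print(guard.accept("node-a", 1_700_000_500))  # NTP synced again
```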
16. Not enough randomness during boot
• ERX didn’t have a good enough random seed …
• After flashing, it’s unreachable for … hours … days … maybe weeks?
=> fixed
https://github.com/oszilloskop/UBNT_ERX_Gluon_Factory-Image/issues/3
18. Lessons learned
• Commit as much stuff as possible upstream
• Work closely with upstream
• Get lots of feedback from all the communities/other people
• Involve as many people as you can
• Start your project anyway ;)
19. What’s next?
• We want to get rid of B.A.T.M.A.N. for gateway uplinks (make broadcast
domains small)
■ Should boost performance by 5x to 7x depending on CPU
■ Maybe VXLAN first, then a fully routed approach
■ https://github.com/freifunkMUC/site-ffm/issues/87
20. Community
• Freifunk Darmstadt and Freifunk Regensburg helped a lot during development
of wgkex!
• B.A.T.M.A.N. developers helped a lot with debugging the performance issue
and contributed many bugfixes
• Everything is open source and available on GitHub
https://github.com/freifunkMUC
• More background and all fixes:
https://ffmuc.net/freifunkmuc/2020/12/03/wireguard-firmware/
21. Thanks to everyone involved
• Freifunk Darmstadt @hexa
• Freifunk Regensburg @MoepMan
• Freifunk Hannover @aiyion, @Codefetch
• systemd Yu Watanabe, Lennart Poettering
• WireGuard Jason A. Donenfeld
• B.A.T.M.A.N. @ecsv @T_X
• All the folks of FFMUC for testing
• Everyone else who was involved in any way and whom we forgot
=> Community rocks! #Together #OpenSource