2. Starting
❖ About Me : Engineering @ Paytm. Have been working on this problem for 2 months
❖ Problem : Identifying an entry solution for 80K TPS and 20K active transacting connections, while keeping added latency < 2 ms
❖ Outline : Evaluate and perf-test the main kinds of LBs and routers, and classify them
❖ Not Covering : After every solution, a list of the things that are not covered
3. Evaluation criteria
▸High Availability ( HA ) : Unaffected service during any
predefined number of simultaneous failures
▸Balancing strategies : Round robin, least connection, weighted
▸Health Checks
▸Extensibility : C/Lua Lib support
▸Monitoring and Manageability
▸Perf
4. Categories of LB
❖ DNS Based
❖ Software & Hardware Based
❖ Layer 3/4 Proxying
❖ Layer 7 Proxying
❖ Routing at L4
5. DNS Based
❖ Multiple IPs : Round Robin
❖ No built-in concept of HA, monitoring or health checks
❖ Health checks and routing policies are available only via custom solutions
6. Layer 3/4 Load Balancing
❖ Mostly hardware-based LBs.
❖ No well-known program that runs in kernel space.
❖ Examples of software-based, user-space proxy LBs are Haproxy and Nginx
9. Issues with Haproxy L4
❖ Scale Constraint
❖ CPU is the only bottleneck : all cores at 100%, with a 1-min load average of 64
❖ Benchmark
❖ 20K TPS, keep-alive off and 100 ms backend latency.
10. Layer 7 load balancing
❖ Hardware-based LBs : F5, Fortinet.
❖ Protocol rigidness
❖ No well-known program that runs in kernel space.
❖ Software Based : Nginx and HaProxy are popular ones.
❖ Benchmarking Issues with Nginx as L7
❖ Even more CPU-constrained than L4 : 18-20K TPS in the same environment
11. Not covering these for Haproxy
❖ Security Aspects : IPTables, WAF, Selinux
❖ Bare Metal Machines Detailed Specs and Part Numbers
❖ Decision on choice of Machine.
❖ Networking Details
❖ NIC Bonding Specs
❖ Benchmark Tools Detailing : GOR Detailing
12. Routing L3/4
❖ What is routing
❖ Routing scales : it needs less than half the resources that proxying does.
13. Types of routing
❖ Natting : Works like proxy
❖ Direct Route : Rewrite the destination MAC address to the real server's; the reply goes straight back to the client.
❖ IP Tunneling : Most scalable, works over an IPIP tunnel (across different DCs)
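A minimal sketch of how the three routing types map onto ipvsadm, assuming LVS/IPVS is the router (as the later slides choose). The helper name fwd_flag and the addresses (documentation ranges) are hypothetical; the flags -m, -g and -i are ipvsadm's packet-forwarding options. The commands are only printed, since running them needs root and the IPVS kernel module.

```shell
# Map an LVS forwarding method to its ipvsadm packet-forwarding flag.
fwd_flag() {
    case "$1" in
        nat)    echo "-m" ;;  # masquerading (NAT)
        dr)     echo "-g" ;;  # gatewaying (direct routing), ipvsadm's default
        tunnel) echo "-i" ;;  # IPIP encapsulation (tunneling)
        *)      echo "unknown method: $1" >&2; return 1 ;;
    esac
}

# Hypothetical addresses, not from the talk.
VIP="192.0.2.10:80"
REAL="198.51.100.21:80"

# Dry run: print the add-real-server command for each method.
for method in nat dr tunnel; do
    echo "ipvsadm -a -t $VIP -r $REAL $(fwd_flag "$method")"
done
```

Only the flag changes between methods; the network setup on the real servers (VIP on a loopback or tunnel device) differs per method and is not shown here.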
14. Routers
❖ Hardware routers : Not designed to be horizontally
scalable
❖ No Well-Known Horizontally scalable Hw Routers.
❖ We needed a Software Router : LVS/IPVS
15. Software Router : LVS
❖ LVS : Linux Virtual Server, 20 years old, covers both Layer 4 and Layer 7
❖ IPVS : IP Virtual Server, merged in Kernel 2.4
❖ KTCPVS : App LB, in dev for the last 8 years.
❖ Runs in Kernel Space
❖ Supports different distribution methods : RR,
Least connection, Weighted LC
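The distribution method is chosen per virtual service with ipvsadm's -s option (rr, lc, wlc and others). A hedged sketch follows: the function lvs_service_cmds, the addresses and the weights are hypothetical, tunneling (-i) is assumed as the forwarding method, and the commands are printed rather than executed.

```shell
# Print (not run) the ipvsadm commands for one virtual service.
# Usage: lvs_service_cmds VIP:PORT SCHEDULER REAL:PORT=WEIGHT...
lvs_service_cmds() {
    vip="$1"; sched="$2"; shift 2
    echo "ipvsadm -A -t $vip -s $sched"
    for entry in "$@"; do
        rs="${entry%=*}"      # part before '=' : real server address
        weight="${entry#*=}"  # part after '='  : weight
        echo "ipvsadm -a -t $vip -r $rs -i -w $weight"
    done
}

# Weighted least connection, with one server taking roughly 3x the share.
lvs_service_cmds 192.0.2.10:80 wlc 198.51.100.21:80=3 198.51.100.22:80=1
```

Weights only matter for the weighted schedulers; with rr or lc they are stored but ignored.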
16. LVS Issues
❖ CPU Affinity of Interrupts
❖ RP Filter Bypass
❖ Manageability and Monitoring
❖ HA
❖ IP Tunnel Extensibility
17. LVS : CPU Affinity
❖ CPU Affinity of Interrupts
❖ The kernel tries to balance IRQs ( Interrupt Request Lines ) across cores.
❖ The irqbalance service is responsible for this.
❖ cat /proc/interrupts shows which core will max out.
❖ Balance (1) : echo fff > /sys/class/net/eth0/queues/rx-0/rps_cpus
❖ Balance (2) : echo 'fff' > /proc/irq/14/smp_affinity
❖ Balance (3) : echo '0-3' > /proc/irq/28/smp_affinity_list
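The values written above are hex CPU bitmasks (bit N set means core N is allowed), so fff covers cores 0-11. Note that rps_cpus steers packet processing in software (RPS), while smp_affinity pins the hardware interrupt itself. A small sketch for building such a mask; the helper name cpu_mask is hypothetical, and it assumes fewer than 32 cores, since larger masks are written as comma-separated 32-bit groups.

```shell
# Build a hex CPU bitmask from a list of core numbers.
cpu_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( $mask | (1 << $core) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0 1 2 3                      # prints f
cpu_mask 0 1 2 3 4 5 6 7 8 9 10 11    # prints fff
```

The result can then be written to the files shown above, for example echo "$(cpu_mask 0 1 2 3)" > /proc/irq/14/smp_affinity (as root).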
18. LVS : RP Filter
❖ RP Filter ( reverse path filter ) : Guards against IP spoofing and DDoS
❖ The kernel checks whether the source of a received packet is reachable through the interface it came in on.
❖ To disable : set net.ipv4.conf.tun.rp_filter = 0 in /etc/sysctl.conf ( and run sysctl -p )
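One thing worth checking after the change: per the kernel's ip-sysctl documentation, the value actually applied on an interface is the maximum of conf/all/rp_filter and conf/<interface>/rp_filter, so a non-zero "all" setting still wins. A sketch of that check; the device name tunl0 is an assumption (the slide only says "tun") and effective_rp_filter is a hypothetical helper.

```shell
# The kernel uses max(conf/all, conf/<iface>) for source validation.
effective_rp_filter() {
    all="$1"; iface="$2"
    if [ "$all" -gt "$iface" ]; then echo "$all"; else echo "$iface"; fi
}

IFACE="${IFACE:-tunl0}"   # assumed tunnel device name
base=/proc/sys/net/ipv4/conf

if [ -r "$base/all/rp_filter" ] && [ -r "$base/$IFACE/rp_filter" ]; then
    all_val=$(cat "$base/all/rp_filter")
    if_val=$(cat "$base/$IFACE/rp_filter")
    echo "$IFACE: effective rp_filter = $(effective_rp_filter "$all_val" "$if_val")"
else
    echo "$IFACE: rp_filter settings not readable on this host"
fi
```

If the effective value is still 1, net.ipv4.conf.all.rp_filter has to be lowered as well.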
19. LVS : Monitoring & management
❖ Managed by System Calls, No config ( use Consul Template )
❖ Logging : No Logs in user Space, Kernel messages for Errors
❖ Monitoring : Telegraf plugin available ( internals : ipvsadm --list --numeric / --connection / --stats / --rate )
20. LVS : HA
❖ KeepAlive(d) + VIP
❖ Connection Sync Service
❖ ipvsadm --start-daemon/--stop-daemon master|backup --mcast-interface=<> --syncid <>
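A sketch of the connection sync commands spelled out for an active/standby pair. The interface name eth0 and sync ID 1 are placeholders (the slide leaves both blank), sync_daemon_cmd is a hypothetical helper, and the commands are printed rather than run because they need root.

```shell
# Print the ipvsadm command that starts or stops the connection sync daemon.
# Usage: sync_daemon_cmd start|stop master|backup [IFACE] [SYNCID]
sync_daemon_cmd() {
    action="$1"; role="$2"; iface="$3"; syncid="$4"
    case "$action" in
        start) echo "ipvsadm --start-daemon $role --mcast-interface $iface --syncid $syncid" ;;
        stop)  echo "ipvsadm --stop-daemon $role" ;;
        *)     echo "unknown action: $action" >&2; return 1 ;;
    esac
}

sync_daemon_cmd start master eth0 1   # on the node currently holding the VIP
sync_daemon_cmd start backup eth0 1   # on the standby node
sync_daemon_cmd stop backup           # when taking the standby out
```

Both nodes must use the same sync ID, so that the backup only accepts connection state from its own master.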
21. LVS : HealthCheck
❖ KeepAlive(d) for own Health Check
❖ Consul Template for Real Server Health Check
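To illustrate what a real-server health check boils down to, here is a bare TCP probe. This is a swapped-in technique, not the Consul Template setup named above: tcp_check is a hypothetical helper built on bash's /dev/tcp and the coreutils timeout command, and the addresses are placeholders. The removal command is printed, not executed.

```shell
# Succeeds only if a TCP connection to host:port opens within 1 second.
tcp_check() {
    timeout 1 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

VIP="192.0.2.10:80"            # hypothetical virtual service
REAL_SERVERS="127.0.0.1:80"    # placeholder list, space separated

for rs in $REAL_SERVERS; do
    if tcp_check "${rs%:*}" "${rs#*:}"; then
        echo "$rs healthy"
    else
        echo "$rs unhealthy, would run: ipvsadm -d -t $VIP -r $rs"
    fi
done
```

A production check would usually require several consecutive failures before removing a server, to avoid flapping.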
22. LVS IPIP Debugging
❖ IPIP Tunnel and VIP extension to multiple machines : Painful
❖ IPIP Tunnel Issues and recovery across DC
❖ Setup Probes and Packet Capture
25. Willy Tarreau : Haproxy
❖ Creator of Haproxy
❖ wtarreau.blogspot.com/2006/11/making-applications-scalable-with-load.html
❖ The structure of this PPT is based on that article.