Scalability and Load Balancing, Lesson 2.2

How load balancers work and which algorithm to use

round robin, least connections, IP hash, layer 4 vs layer 7 load balancing, health checks, sticky sessions

What a Load Balancer Does

A load balancer sits between clients and your servers, distributing incoming requests to prevent any single server from becoming a bottleneck. It also detects unhealthy servers and stops routing traffic to them.

Load Balancing Algorithms

Round Robin: requests go to servers in rotation (1→2→3→1→2→3). Simple, works when servers have equal capacity.
Least Connections: next request goes to the server with fewest active connections. Better when requests have variable processing time.
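IP Hash: a hash of the client IP deterministically maps each client to a server. Useful for sticky sessions, but brittle when the pool changes: if a server goes down, its clients are remapped and lose any session state held there, and with simple modulo hashing most other clients are remapped too.
Weighted Round Robin: servers with higher weights (more capacity) get proportionally more traffic. Use when server capacities differ.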
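
To make the four algorithms above concrete, here is a minimal Python sketch of each selection rule. The class names, the pick/release methods, and the server labels are assumptions made for illustration, not the API of any real load balancer. The weighted variant simply repeats each server in the rotation according to its weight; production balancers typically interleave the picks more smoothly.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation: 1 -> 2 -> 3 -> 1 ..."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self, client_ip=None):
        return next(self._cycle)


class WeightedRoundRobin:
    """Naive weighting: a server with weight w appears w times per rotation."""

    def __init__(self, weights):
        # weights: dict mapping server -> positive integer weight (hypothetical input shape)
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Send the next request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self, client_ip=None):
        # Ties go to the server listed first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count reflects live connections.
        self.active[server] -= 1


class IPHash:
    """Map a client IP to a server with a stable hash modulo the pool size."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]
```

Note that IPHash returns the same server for the same IP only while the server list is unchanged, which is the brittleness described above.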

Layer 4 vs Layer 7

L4 (TCP/UDP): routes based on IP address and port. Fast, because it never inspects the request content.
L7 (HTTP): routes based on URL, headers, or cookies. Enables path-based routing (e.g., /api → service A, /static → CDN).
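
The path-based routing that L7 enables can be sketched as a prefix lookup on the request path. This is a minimal illustration only: the route table, the backend labels ("service-a", "cdn", "web"), and the route function are hypothetical names, and a real L7 balancer would also be able to match on headers or cookies.

```python
# Hypothetical route table: path prefix -> backend label.
ROUTES = [
    ("/api", "service-a"),
    ("/static", "cdn"),
]
DEFAULT_BACKEND = "web"  # assumed fallback for paths that match no prefix


def route(path):
    """Return the backend for a request path, matching whole path segments."""
    for prefix, backend in ROUTES:
        # Match "/api" and "/api/...", but not "/apiary".
        if path == prefix or path.startswith(prefix + "/"):
            return backend
    return DEFAULT_BACKEND
```

An L4 balancer could not make this decision, because the path is part of the HTTP request that it never reads.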

Health Checks

Load balancers probe each server at a set interval. A server that fails N consecutive checks is removed from rotation. Always point health checks at a path that actually exercises your app, not just a TCP connection check, which only proves that the port is open.
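
The "N consecutive failures" rule can be sketched as a small counter per server. The HealthTracker class, the probe function, and the default threshold of 3 are assumptions made for illustration. Restoring a server after a single successful check is also an assumption; many setups require several consecutive successes before a server rejoins the pool.

```python
import http.client
import urllib.request


def probe(url, timeout=2.0):
    """HTTP health check: healthy only if the app answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or an HTTP error status all count as a failure.
        return False


class HealthTracker:
    """Track consecutive failed checks and expose the servers still in rotation."""

    def __init__(self, servers, fail_threshold=3):
        self.fail_threshold = fail_threshold
        self.failures = {s: 0 for s in servers}

    def record(self, server, ok):
        # A success resets the streak (assumed policy); a failure extends it.
        if ok:
            self.failures[server] = 0
        else:
            self.failures[server] += 1

    def healthy(self):
        return [s for s, n in self.failures.items() if n < self.fail_threshold]
```

In use, a scheduler would call record(server, probe(url)) for each server at every interval, and the balancer would pick only from healthy().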