What is a Load Balancer?
A load balancer sits in front of a group of servers and routes incoming client requests across them. The goal is simple: no single server bears too much traffic, so the system stays fast and available.
Without a load balancer, scaling means telling every client to hit a different server — which is impractical. With one, you expose a single IP/domain and let the load balancer decide who handles each request.
Client A ─┐
Client B ─┼──► [ Load Balancer ] ─┬─► Server 1
Client C ─┤                       ├─► Server 2
Client D ─┘                       └─► Server 3
The load balancer tracks which servers are healthy and which are overloaded, then makes smart routing decisions on every request.
L4 vs L7 Load Balancing
Load balancers operate at different layers of the OSI model, and the difference matters a lot in practice. An L4 (transport-layer) load balancer routes on IP address and port alone, without inspecting the payload, which makes it very fast. An L7 (application-layer) load balancer reads the HTTP request itself — headers, URL, cookies — so it can make content-aware routing decisions, at the cost of extra processing per request.
Routing Algorithms
This is where it gets interesting. The algorithm decides which server handles each request.
Round Robin
The simplest algorithm. Distribute requests sequentially across all servers.
Requests: 1, 2, 3, 4, 5, 6
Servers: A, B, C
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A ← cycles back
Request 5 → Server B
Request 6 → Server C
Problem: Assumes all requests are equal weight. A heavy database query and a lightweight health-check ping both count as "one request."
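The cycle above is a one-liner in Python — a minimal sketch, with placeholder server names:

```python
from itertools import cycle

servers = ["A", "B", "C"]
rotation = cycle(servers)  # after C, cycles back to A

# Six requests land on A, B, C, A, B, C in order
assignments = [next(rotation) for _ in range(6)]
print(assignments)  # ['A', 'B', 'C', 'A', 'B', 'C']
```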
Weighted Round Robin
Assign different weights based on server capacity. A server with weight 3 gets 3x the traffic of one with weight 1.
Server A: weight 3 → handles 60% of traffic
Server B: weight 1 → handles 20% of traffic
Server C: weight 1 → handles 20% of traffic
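The simplest way to sketch weighted round robin is to repeat each server in the rotation according to its weight (weights here are the ones from the example above):

```python
# Expand each server by its weight, then round-robin over the expanded list.
weights = {"A": 3, "B": 1, "C": 1}
expanded = [name for name, w in weights.items() for _ in range(w)]
# expanded == ["A", "A", "A", "B", "C"]

# Over 10 requests: A gets 6 (60%), B and C get 2 each (20%)
assignments = [expanded[i % len(expanded)] for i in range(10)]
print(assignments.count("A"), assignments.count("B"), assignments.count("C"))  # 6 2 2
```

Note that this naive expansion sends bursts (A, A, A, then B, then C); production implementations such as Nginx use a smoothed variant that interleaves the heavy server's turns.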
Least Connections
Route each new request to the server with the fewest active connections. Better than round robin for workloads where request duration varies wildly.
Server A: 10 active connections
Server B: 3 active connections ← next request goes here
Server C: 7 active connections
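Least connections is just a `min` over the connection counts — a sketch using the numbers from the example:

```python
# Pick the server with the fewest active connections.
active = {"A": 10, "B": 3, "C": 7}

target = min(active, key=active.get)
print(target)  # B

active[target] += 1  # the new request now counts as an active connection
```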
IP Hash (Sticky Sessions)
Hash the client's IP address to always route them to the same server. This is how you implement sticky sessions without shared session storage.
hash(client_ip) % num_servers = server_index
192.168.1.1 → hash → 2 → Server C (always)
10.0.0.5 → hash → 0 → Server A (always)
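A sketch of the hash-and-modulo mapping in Python. One assumption worth flagging: Python's built-in `hash()` is salted per process, so a stable digest (md5 here) is used instead to keep the mapping consistent across restarts.

```python
import hashlib

def pick_server(client_ip: str, num_servers: int) -> int:
    # Stable hash of the client IP, reduced modulo the server count
    digest = hashlib.md5(client_ip.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_servers

# The same client IP always maps to the same server index
assert pick_server("192.168.1.1", 3) == pick_server("192.168.1.1", 3)
```

The catch: if `num_servers` changes, the modulo remaps almost every client to a different server — which is exactly the problem consistent hashing solves.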
Consistent Hashing
The most sophisticated algorithm. Servers are placed on a virtual ring. Each request is hashed to a point on the ring, and routed to the nearest server clockwise. When a server is added or removed, only its neighbors are affected — not the whole cluster.
              Server A (0°)
                   │
Server C (270°) ───┼─── Request X hashes to 45°
                   │
              Server B (90°)  ← nearest clockwise, so X goes to Server B
Used by Cassandra, DynamoDB, and most CDN routing. Minimizes cache misses when the cluster scales.
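A minimal hash ring can be sketched with a sorted list and binary search (class and helper names are illustrative, not any particular library's API):

```python
import bisect
import hashlib

def h(key: str) -> int:
    # Stable 64-bit position on the ring
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers):
        # Place each server at a point on the ring, sorted by position
        self.ring = sorted((h(s), s) for s in servers)
        self.points = [p for p, _ in self.ring]

    def lookup(self, key: str) -> str:
        # First server at or clockwise-after the key's position;
        # wrap to the start of the ring if we run past the top
        i = bisect.bisect_left(self.points, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["A", "B", "C"])
server = ring.lookup("request-x")
# Removing a server only remaps the keys that hashed to it;
# every other key keeps its assignment.
```

Real implementations place many "virtual nodes" per server on the ring so the load spreads evenly; this sketch uses one point per server for clarity.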
Health Checks
A load balancer is only useful if it knows which servers are healthy. Health checks are how it finds out.
Passive Health Checks
Monitor real traffic. If a server returns 5xx errors or times out, mark it unhealthy and stop routing to it.
Active Health Checks
Proactively send pings/probes to each server at regular intervals, independent of real traffic.
# Example: active health checks using the third-party
# nginx_http_upstream_check_module (stock open-source Nginx only
# does passive checks; NGINX Plus has its own health_check directive)
upstream backend {
    server srv1.example.com;
    server srv2.example.com;

    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "HEAD /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}
Expose a /health endpoint on your servers. It should check downstream dependencies (DB connection, cache) and return 200 if healthy, 503 if not.
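The probe side of an active health check fits in a few lines of standard-library Python (the URL and timeout below are illustrative):

```python
import urllib.request

def is_healthy(base_url: str, timeout: float = 1.0) -> bool:
    # Probe the server's /health endpoint; any 2xx counts as alive.
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Timeouts, connection errors, and non-2xx responses
        # (urllib raises HTTPError for 4xx/5xx) all mark it unhealthy.
        return False
```

A real load balancer would run this on an interval per server and require a few consecutive failures (the `fall` threshold in the Nginx config above) before pulling a server out of rotation.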
Active-Active vs Active-Passive
This refers to the load balancer setup itself, not the servers behind it.
Active-Passive (Failover)
Two load balancers exist — one handles all traffic (active), the other sits idle (passive). If the active one fails, the passive takes over via a floating IP.
Traffic → [ LB Active ] → Servers
[ LB Passive ] (idle, monitoring active)
On failure:
Traffic → [ LB Passive ] → Servers ← takes over automatically
Active-Active
Both load balancers handle traffic simultaneously. DNS or an upstream router splits traffic between them. Higher throughput, no idle capacity wasted.
Traffic ─┬─► [ LB-1 ] ──► Servers
└─► [ LB-2 ] ──► Servers
Key Takeaways
- Load balancers sit in front of servers and distribute traffic to prevent any single server from becoming a bottleneck.
- L4 is faster but dumb (IP/port only). L7 is smarter but slower (reads HTTP headers, URLs).
- Round Robin is simple but ignores server load. Use Least Connections for variable workloads.
- Consistent Hashing minimizes disruption when servers join or leave the cluster — critical for caching.
- Always expose a /health endpoint so the load balancer can route away from unhealthy instances fast.
- Active-Active gives more throughput. Active-Passive gives simpler failover. Most production systems use Active-Active.