What is a Load Balancer?

A load balancer sits in front of a group of servers and routes incoming client requests across them. The goal is simple: no single server bears too much traffic, so the system stays fast and available.

Without a load balancer, scaling means telling every client to hit a different server — which is impractical. With one, you expose a single IP/domain and let the load balancer decide who handles each request.

Architecture
  Client A ─┐
  Client B ─┼──► [ Load Balancer ] ──► Server 1
  Client C ─┤                     ──► Server 2
  Client D ─┘                     ──► Server 3

The load balancer tracks which servers are healthy and which are overloaded, then makes smart routing decisions on every request.

L4 vs L7 Load Balancing

Load balancers operate at different layers of the OSI model, and the difference matters a lot in practice.

                   L4 (Transport)                L7 (Application)
  Works on         TCP/UDP packets               HTTP/HTTPS content
  Sees             IP + Port only                Headers, cookies, URL path
  Speed            Faster (less inspection)      Slower (more inspection)
  Routing logic    IP/Port based                 Path, header, host based
  SSL termination  No (passes through)           Yes
  Use case         High throughput, low latency  Microservices, API routing

Rule of thumb: Use L7 (application-layer) for most web applications — it gives you URL-based routing, sticky sessions, and SSL termination. Use L4 when you need raw throughput and latency is critical.
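
To make the L7 idea concrete, here is a minimal Python sketch of path-based routing: pick a backend pool by the longest matching URL prefix. The service names, ports, and paths are invented for illustration.

# Sketch: L7 path-prefix routing (hypothetical pools and paths)
ROUTES = {
    "/api/users":  ["users-1:8080", "users-2:8080"],
    "/api/orders": ["orders-1:8080"],
    "/":           ["web-1:8080", "web-2:8080"],   # default pool
}

def pick_pool(path: str) -> list[str]:
    """Longest matching prefix wins, so /api/users beats the / default."""
    best_prefix = max((p for p in ROUTES if path.startswith(p)), key=len)
    return ROUTES[best_prefix]

print(pick_pool("/api/users/42"))   # -> ['users-1:8080', 'users-2:8080']
print(pick_pool("/checkout"))       # -> ['web-1:8080', 'web-2:8080']

An L4 balancer cannot do this at all: it never sees the URL, only source/destination IP and port.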

Routing Algorithms

This is where it gets interesting. The algorithm decides which server handles each request.

Round Robin

The simplest algorithm. Distribute requests sequentially across all servers.

Requests: 1, 2, 3, 4, 5, 6
Servers:  A, B, C

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  ← cycles back
Request 5 → Server B
Request 6 → Server C
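
A minimal Python sketch of that rotation:

from itertools import cycle

servers = ["A", "B", "C"]
rotation = cycle(servers)          # repeats A, B, C, A, B, C, ...

for request_id in range(1, 7):
    print(f"Request {request_id} -> Server {next(rotation)}")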

Problem: It treats every request as equally expensive. A heavy database query and a lightweight health-check ping both count as "one request."

Weighted Round Robin

Assign different weights based on server capacity. A server with weight 3 gets 3x the traffic of one with weight 1.

Server A: weight 3  → handles 60% of traffic
Server B: weight 1  → handles 20% of traffic
Server C: weight 1  → handles 20% of traffic
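
A naive Python sketch: repeat each server in the rotation according to its weight. (Real balancers such as Nginx use a smoother interleaving so a high-weight server isn't hit in bursts.)

from itertools import cycle

weights = {"A": 3, "B": 1, "C": 1}
rotation = cycle([s for s, w in weights.items() for _ in range(w)])

for request_id in range(1, 11):
    print(f"Request {request_id} -> Server {next(rotation)}")
# Over each full cycle of 5 requests: A gets 3 (60%), B and C get 1 each (20%).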

Least Connections

Route each new request to the server with the fewest active connections. Better than round robin for workloads where request duration varies wildly.

Server A: 10 active connections
Server B: 3 active connections  ← next request goes here
Server C: 7 active connections
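
A sketch of the bookkeeping, assuming the balancer tracks open connections per server:

active = {"A": 10, "B": 3, "C": 7}   # open connections right now

def pick_server() -> str:
    return min(active, key=active.get)   # fewest active connections wins

chosen = pick_server()        # -> "B"
active[chosen] += 1           # increment on dispatch...
# ...and decrement when the connection closes:
# active[chosen] -= 1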

IP Hash (Sticky Sessions)

Hash the client's IP address to always route them to the same server. This is how you implement sticky sessions without shared session storage.

hash(client_ip) % num_servers = server_index

192.168.1.1  → hash → 2 → Server C  (always)
10.0.0.5     → hash → 0 → Server A  (always)
Problem with IP Hash: If a server goes down, all users hashed to it lose their session. Also fails behind NAT (many users share one IP). Prefer token-based sessions + distributed session store (Redis) instead.
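
A Python sketch of the formula above. It uses a stable hash (MD5) rather than Python's built-in hash(), which is randomized per process and would break stickiness across balancer restarts.

import hashlib

servers = ["A", "B", "C"]

def pick_server(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

print(pick_server("192.168.1.1"))   # same IP -> same server, every call
print(pick_server("10.0.0.5"))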

Consistent Hashing

The most sophisticated algorithm here. Servers are placed on a virtual ring. Each request is hashed to a point on the ring and routed to the first server encountered moving clockwise. When a server is added or removed, only the keys that server owned are remapped; every other key keeps its server, so the rest of the cluster is untouched.

Consistent Hash Ring
               Server A (0°)
                    │
       Server C ────┼──── Request X
       (270°)       │        │
                    │      hashes to 45°
               Server B (90°)   → goes to Server B

Used by Cassandra, DynamoDB, and most CDN routing. Minimizes cache misses when the cluster scales.
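
A toy Python ring using bisect. Real implementations (including the systems above) place many virtual nodes per server so load spreads evenly; this sketch does the same on a small scale.

import bisect
import hashlib

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server owns `vnodes` points on the ring for smoother balance.
        self._ring = sorted((_hash(f"{s}#{i}"), s)
                            for s in servers for i in range(vnodes))
        self._points = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping past the top.
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

before = HashRing(["A", "B", "C"])
after  = HashRing(["A", "B"])            # server C removed
moved = sum(before.lookup(f"req-{n}") != after.lookup(f"req-{n}")
            for n in range(1000))
print(f"{moved}/1000 keys moved")        # roughly a third move, not all 1000

With plain modulo hashing (hash % num_servers), dropping from 3 servers to 2 would remap about two-thirds of all keys; here only the removed server's share moves.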

Health Checks

A load balancer is only useful if it knows which servers are healthy. Health checks are how it finds out.

Passive Health Checks

Monitor real traffic. If a server returns 5xx errors or times out, mark it unhealthy and stop routing to it.
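
A sketch of the passive bookkeeping; the thresholds are arbitrary example values:

import time

FAIL_THRESHOLD = 3        # consecutive 5xx responses before eviction
COOLDOWN = 30             # seconds an evicted server sits out

failures = {"A": 0, "B": 0, "C": 0}
evicted_until = {}

def record_response(server: str, status: int) -> None:
    """Learn from real traffic: a 5xx counts as a failure, anything else resets."""
    if status >= 500:
        failures[server] += 1
        if failures[server] >= FAIL_THRESHOLD:
            evicted_until[server] = time.time() + COOLDOWN
            failures[server] = 0
    else:
        failures[server] = 0

def is_routable(server: str) -> bool:
    return time.time() >= evicted_until.get(server, 0.0)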

Active Health Checks

Proactively send pings/probes to each server at regular intervals, independent of real traffic.

# Example: active health check config. Note: these "check" directives come
# from the third-party nginx_upstream_check_module (bundled with Tengine);
# stock open-source Nginx does passive checks only, and the built-in
# health_check directive is an NGINX Plus feature.
upstream backend {
    server srv1.example.com;
    server srv2.example.com;

    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "HEAD /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}
Best practice: Always expose a dedicated /health endpoint on your servers. It should check downstream dependencies (DB connection, cache) and return 200 if healthy, 503 if not.
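
A minimal Python sketch of such an endpoint; db_ok and cache_ok are hypothetical stand-ins for your real dependency checks.

from http.server import BaseHTTPRequestHandler, HTTPServer

def db_ok() -> bool:       # stand-in: ping the real database here
    return True

def cache_ok() -> bool:    # stand-in: ping the real cache here
    return True

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
        elif db_ok() and cache_ok():
            self.send_response(200)    # healthy: keep me in rotation
        else:
            self.send_response(503)    # unhealthy: route traffic away
        self.end_headers()

    do_HEAD = do_GET    # the HEAD /health probe above hits the same logic

HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()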

Active-Active vs Active-Passive

This refers to the load balancer setup itself, not the servers behind it.

Active-Passive (Failover)

Two load balancers exist — one handles all traffic (active), the other sits idle (passive). If the active one fails, the passive takes over via a floating IP.

  Traffic → [ LB Active ]  → Servers
             [ LB Passive ] (idle, monitoring active)

  On failure:
  Traffic → [ LB Passive ] → Servers  ← takes over automatically
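
A sketch of the heartbeat loop the passive node runs. In production this is usually handled by VRRP (e.g. keepalived) rather than application code; the hostname and thresholds here are invented for illustration.

import socket
import time

def active_is_alive(host="lb-active.internal", port=80) -> bool:
    """Heartbeat: can we still open a TCP connection to the active LB?"""
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except OSError:
        return False

missed = 0
while missed < 3:                  # three missed beats -> declare it dead
    missed = 0 if active_is_alive() else missed + 1
    time.sleep(1)

print("claiming the floating IP and taking over traffic")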

Active-Active

Both load balancers handle traffic simultaneously. DNS or an upstream router splits traffic between them. Higher throughput, no idle capacity wasted.

  Traffic ─┬─► [ LB-1 ] ──► Servers
           └─► [ LB-2 ] ──► Servers

Real World Tools

  • Nginx: Open source, L7, extremely configurable. The standard choice for most teams.
  • HAProxy: High-performance L4/L7, preferred for TCP workloads and raw throughput.
  • AWS ALB: Managed L7 LB on AWS. Path-based routing, native integration with ECS/EKS.
  • AWS NLB: Managed L4. Ultra-low latency, static IP, good for non-HTTP protocols.
  • Cloudflare: Global anycast LB + DDoS protection + CDN. Sits at the DNS layer.
  • Envoy: L7 proxy / service mesh building block. Used by Istio and most cloud-native stacks.

Key Takeaways
  • Load balancers sit in front of servers and distribute traffic to prevent any single server from becoming a bottleneck.
  • L4 is faster but dumb (IP/port only). L7 is smarter but slower (reads HTTP headers, URLs).
  • Round Robin is simple but ignores server load. Use Least Connections for variable workloads.
  • Consistent Hashing minimizes disruption when servers join or leave the cluster — critical for caching.
  • Always expose a /health endpoint so the load balancer can route away from unhealthy instances fast.
  • Active-Active gives more throughput. Active-Passive gives simpler failover. Most production systems use Active-Active.