What is a Load Balancer?

A load balancer sits in front of a group of servers and routes incoming client requests across them. The goal is simple: no single server bears too much traffic, so the system stays fast and available.

Without a load balancer, scaling means telling every client to hit a different server — which is impractical. With one, you expose a single IP/domain and let the load balancer decide who handles each request.

Architecture
  Client A ─┐
  Client B ─┼──► [ Load Balancer ] ──► Server 1
  Client C ─┤                     ──► Server 2
  Client D ─┘                     ──► Server 3

The load balancer tracks which servers are healthy and which are overloaded, then makes smart routing decisions on every request.

L4 vs L7 Load Balancing

Load balancers operate at different layers of the OSI model, and the difference matters a lot in practice.

                   L4 (Transport)                L7 (Application)
  Works on         TCP/UDP packets               HTTP/HTTPS content
  Sees             IP + Port only                Headers, cookies, URL path
  Speed            Faster (less inspection)      Slower (more inspection)
  Routing logic    IP/Port based                 Path, header, host based
  SSL termination  No (passes through)           Yes
  Use case         High throughput, low latency  Microservices, API routing

Rule of thumb: Use L7 (application-layer) for most web applications — it gives you URL-based routing, sticky sessions, and SSL termination. Use L4 when you need raw throughput and latency is critical.
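
To make the L7 idea concrete, here is a minimal Python sketch of path-based routing: pick a backend pool by the longest matching URL prefix. The service names, ports, and paths are invented for illustration.

# Sketch: L7 path-prefix routing (hypothetical pools and paths)
ROUTES = {
    "/api/users":  ["users-1:8080", "users-2:8080"],
    "/api/orders": ["orders-1:8080"],
    "/":           ["web-1:8080", "web-2:8080"],   # default pool
}

def pick_pool(path: str) -> list[str]:
    """Longest matching prefix wins, so /api/users beats the / default."""
    best_prefix = max((p for p in ROUTES if path.startswith(p)), key=len)
    return ROUTES[best_prefix]

print(pick_pool("/api/users/42"))   # -> ['users-1:8080', 'users-2:8080']
print(pick_pool("/checkout"))       # -> ['web-1:8080', 'web-2:8080']

An L4 balancer cannot do this at all: it never sees the URL, only source/destination IP and port.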

Routing Algorithms

This is where it gets interesting. The algorithm decides which server handles each request.

Round Robin

The simplest algorithm. Distribute requests sequentially across all servers.

Requests: 1, 2, 3, 4, 5, 6
Servers:  A, B, C

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  ← cycles back
Request 5 → Server B
Request 6 → Server C
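
A minimal Python sketch of that rotation:

from itertools import cycle

servers = ["A", "B", "C"]
rotation = cycle(servers)          # repeats A, B, C, A, B, C, ...

for request_id in range(1, 7):
    print(f"Request {request_id} -> Server {next(rotation)}")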

Problem: It treats every request as equally expensive. A heavy database query and a lightweight health-check ping both count as "one request."

Weighted Round Robin

Assign different weights based on server capacity. A server with weight 3 gets 3x the traffic of one with weight 1.

Server A: weight 3  → handles 60% of traffic
Server B: weight 1  → handles 20% of traffic
Server C: weight 1  → handles 20% of traffic
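
A naive Python sketch: repeat each server in the rotation according to its weight. (Real balancers such as Nginx use a smoother interleaving so a high-weight server isn't hit in bursts.)

from itertools import cycle

weights = {"A": 3, "B": 1, "C": 1}
rotation = cycle([s for s, w in weights.items() for _ in range(w)])

for request_id in range(1, 11):
    print(f"Request {request_id} -> Server {next(rotation)}")
# Over each full cycle of 5 requests: A gets 3 (60%), B and C get 1 each (20%).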

Least Connections

Route each new request to the server with the fewest active connections. Better than round robin for workloads where request duration varies wildly.

Server A: 10 active connections
Server B: 3 active connections  ← next request goes here
Server C: 7 active connections
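
A sketch of the bookkeeping, assuming the balancer tracks open connections per server:

active = {"A": 10, "B": 3, "C": 7}   # open connections right now

def pick_server() -> str:
    return min(active, key=active.get)   # fewest active connections wins

chosen = pick_server()        # -> "B"
active[chosen] += 1           # increment on dispatch...
# ...and decrement when the connection closes:
# active[chosen] -= 1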

IP Hash (Sticky Sessions)

Hash the client's IP address to always route them to the same server. This is how you implement sticky sessions without shared session storage.

hash(client_ip) % num_servers = server_index

192.168.1.1  → hash → 2 → Server C  (always)
10.0.0.5     → hash → 0 → Server A  (always)
Problem with IP Hash: If a server goes down, all users hashed to it lose their session. Also fails behind NAT (many users share one IP). Prefer token-based sessions + distributed session store (Redis) instead.
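
A Python sketch of the formula above. It uses a stable hash (MD5) rather than Python's built-in hash(), which is randomized per process and would break stickiness across balancer restarts.

import hashlib

servers = ["A", "B", "C"]

def pick_server(client_ip: str) -> str:
    digest = hashlib.md5(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

print(pick_server("192.168.1.1"))   # same IP -> same server, every call
print(pick_server("10.0.0.5"))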

Consistent Hashing

The most sophisticated algorithm here. Servers are placed on a virtual ring. Each request is hashed to a point on the ring and routed to the first server encountered moving clockwise. When a server is added or removed, only the keys that server owned are remapped; every other key keeps its server, so the rest of the cluster is untouched.

Consistent Hash Ring
               Server A (0°)
                    │
       Server C ────┼──── Request X
       (270°)       │        │
                    │      hashes to 45°
               Server B (90°)   → goes to Server B

Used by Cassandra, DynamoDB, and most CDN routing. Minimizes cache misses when the cluster scales.
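
A toy Python ring using bisect. Real implementations (including the systems above) place many virtual nodes per server so load spreads evenly; this sketch does the same on a small scale.

import bisect
import hashlib

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server owns `vnodes` points on the ring for smoother balance.
        self._ring = sorted((_hash(f"{s}#{i}"), s)
                            for s in servers for i in range(vnodes))
        self._points = [h for h, _ in self._ring]

    def lookup(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping past the top.
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]

before = HashRing(["A", "B", "C"])
after  = HashRing(["A", "B"])            # server C removed
moved = sum(before.lookup(f"req-{n}") != after.lookup(f"req-{n}")
            for n in range(1000))
print(f"{moved}/1000 keys moved")        # roughly a third move, not all 1000

With plain modulo hashing (hash % num_servers), dropping from 3 servers to 2 would remap about two-thirds of all keys; here only the removed server's share moves.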

Health Checks

A load balancer is only useful if it knows which servers are healthy. Health checks are how it finds out.

Passive Health Checks

Monitor real traffic. If a server returns 5xx errors or times out, mark it unhealthy and stop routing to it.
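
A sketch of the passive bookkeeping; the thresholds are arbitrary example values:

import time

FAIL_THRESHOLD = 3        # consecutive 5xx responses before eviction
COOLDOWN = 30             # seconds an evicted server sits out

failures = {"A": 0, "B": 0, "C": 0}
evicted_until = {}

def record_response(server: str, status: int) -> None:
    """Learn from real traffic: a 5xx counts as a failure, anything else resets."""
    if status >= 500:
        failures[server] += 1
        if failures[server] >= FAIL_THRESHOLD:
            evicted_until[server] = time.time() + COOLDOWN
            failures[server] = 0
    else:
        failures[server] = 0

def is_routable(server: str) -> bool:
    return time.time() >= evicted_until.get(server, 0.0)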

Active Health Checks

Proactively send pings/probes to each server at regular intervals, independent of real traffic.

# Example: active health check config. Note: these "check" directives come
# from the third-party nginx_upstream_check_module (bundled with Tengine);
# stock open-source Nginx does passive checks only, and the built-in
# health_check directive is an NGINX Plus feature.
upstream backend {
    server srv1.example.com;
    server srv2.example.com;

    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "HEAD /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx;
}
Best practice: Always expose a dedicated /health endpoint on your servers. It should check downstream dependencies (DB connection, cache) and return 200 if healthy, 503 if not.
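
A minimal Python sketch of such an endpoint; db_ok and cache_ok are hypothetical stand-ins for your real dependency checks.

from http.server import BaseHTTPRequestHandler, HTTPServer

def db_ok() -> bool:       # stand-in: ping the real database here
    return True

def cache_ok() -> bool:    # stand-in: ping the real cache here
    return True

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
        elif db_ok() and cache_ok():
            self.send_response(200)    # healthy: keep me in rotation
        else:
            self.send_response(503)    # unhealthy: route traffic away
        self.end_headers()

    do_HEAD = do_GET    # the HEAD /health probe above hits the same logic

HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()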

Active-Active vs Active-Passive

This refers to the load balancer setup itself, not the servers behind it.

Active-Passive (Failover)

Two load balancers exist — one handles all traffic (active), the other sits idle (passive). If the active one fails, the passive takes over via a floating IP.

  Traffic → [ LB Active ]  → Servers
             [ LB Passive ] (idle, monitoring active)

  On failure:
  Traffic → [ LB Passive ] → Servers  ← takes over automatically
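
A sketch of the heartbeat loop the passive node runs. In production this is usually handled by VRRP (e.g. keepalived) rather than application code; the hostname and thresholds here are invented for illustration.

import socket
import time

def active_is_alive(host="lb-active.internal", port=80) -> bool:
    """Heartbeat: can we still open a TCP connection to the active LB?"""
    try:
        with socket.create_connection((host, port), timeout=1.0):
            return True
    except OSError:
        return False

missed = 0
while missed < 3:                  # three missed beats -> declare it dead
    missed = 0 if active_is_alive() else missed + 1
    time.sleep(1)

print("claiming the floating IP and taking over traffic")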

Active-Active

Both load balancers handle traffic simultaneously. DNS or an upstream router splits traffic between them. Higher throughput, no idle capacity wasted.

  Traffic ─┬─► [ LB-1 ] ──► Servers
           └─► [ LB-2 ] ──► Servers

Real World Tools

  • Nginx: Open source, L7, extremely configurable. The standard choice for most teams.
  • HAProxy: High-performance L4/L7, preferred for TCP workloads and raw throughput.
  • AWS ALB: Managed L7 LB on AWS. Path-based routing, native integration with ECS/EKS.
  • AWS NLB: Managed L4. Ultra-low latency, static IP, good for non-HTTP protocols.
  • Cloudflare: Global anycast LB + DDoS protection + CDN. Sits at the DNS layer.
  • Envoy: L7 proxy / service mesh building block. Used by Istio and most cloud-native stacks.

Key Takeaways
  • Load balancers sit in front of servers and distribute traffic to prevent any single server from becoming a bottleneck.
  • L4 is faster but dumb (IP/port only). L7 is smarter but slower (reads HTTP headers, URLs).
  • Round Robin is simple but ignores server load. Use Least Connections for variable workloads.
  • Consistent Hashing minimizes disruption when servers join or leave the cluster — critical for caching.
  • Always expose a /health endpoint so the load balancer can route away from unhealthy instances fast.
  • Active-Active gives more throughput. Active-Passive gives simpler failover. Most production systems use Active-Active.