Info & Theory
A load balancer spreads incoming requests across a pool of backend servers so none is overwhelmed while others idle. The policy decides which server gets each request.
The policies
- Round-robin — cycle through servers in order.
- Least-connections — send to the server with the fewest active requests.
- Weighted — give faster servers proportionally more traffic.
- Random — pick a server uniformly at random.
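As a rough sketch of how the four policies differ, each can be written as a function that returns the index of the chosen server. The function names and the `active` and `weights` lists below are illustrative assumptions, not the simulation's actual code.

```python
import itertools
import random

def make_round_robin(n):
    """Return a picker that cycles 0, 1, ..., n-1, 0, 1, ..."""
    cycle = itertools.cycle(range(n))
    return lambda: next(cycle)

def least_connections(active):
    """Pick the server with the fewest active requests (lowest index on ties)."""
    return min(range(len(active)), key=lambda i: active[i])

def weighted(weights, rng=random):
    """Pick a server with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def uniform_random(n, rng=random):
    """Pick any server with equal probability."""
    return rng.randrange(n)
```

Only least-connections looks at the current state of the servers; the other three decide without knowing how busy anyone is.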
Poisson arrivals
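Requests arrive at random: each gap between arrivals is drawn as −ln(U)/λ, where U is uniform on (0, 1) and λ is the mean arrival rate. Gaps drawn this way are exponentially distributed, which makes the arrivals a Poisson process, the standard model for independent request traffic with its natural clusters and lulls.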
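The gap formula can be turned into arrival timestamps in a few lines. This is a minimal sketch: `arrival_times` and its parameters are hypothetical names, with `rate` standing in for λ.

```python
import math
import random

def arrival_times(rate, count, rng=None):
    """Return `count` arrival timestamps for a Poisson process with the given rate."""
    rng = rng or random.Random()
    t = 0.0
    times = []
    for _ in range(count):
        u = 1.0 - rng.random()      # in (0, 1], so log(u) is always defined
        t += -math.log(u) / rate    # exponential gap with mean 1/rate
        times.append(t)
    return times
```

Over a long run the average gap should settle near 1/λ, even though individual gaps vary widely.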
Utilisation and latency
Utilisation ρ = λ/μ is the fraction of time a server is busy, where λ is its arrival rate and μ its service rate. Queueing theory gives a mean waiting time proportional to 1/(1 − ρ), so latency explodes as ρ → 1. Round-robin can still pile queues onto slow servers; least-connections adapts to keep queues even.
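To make the 1/(1 − ρ) growth concrete, the sketch below assumes the simplest textbook case, a single server with Poisson arrivals and exponentially distributed service times (M/M/1), where the mean time a request spends in the system is (1/μ)/(1 − ρ). The choice of model and the function name are assumptions for illustration.

```python
def mm1_time_in_system(arrival_rate, service_rate):
    """Mean time (queueing plus service) for an M/M/1 server, in the rates' time unit."""
    rho = arrival_rate / service_rate
    if rho >= 1:
        return float("inf")          # the queue never reaches a steady state
    return (1.0 / service_rate) / (1.0 - rho)

# Service rate fixed at 10 requests/s; only the arrival rate changes.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    mean_time = mm1_time_in_system(rho * 10, 10)
    print(f"rho = {rho:.2f}  mean time = {mean_time:.2f} s")
```

Under this model, going from 50% to 90% utilisation multiplies mean latency by five, and going from 90% to 99% multiplies it by ten again.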
Frequently asked questions
What does a load balancer do?
A load balancer sits in front of a pool of backend servers and decides which server should handle each incoming request, spreading traffic so no single server is overwhelmed while others sit idle.
How does round-robin balancing work?
Round-robin sends each successive request to the next server in a fixed cyclic order: server 1, server 2, server 3, then back to 1. It is simple and fair when servers are identical and requests cost roughly the same.
What is least-connections balancing?
Least-connections sends each new request to the server currently handling the fewest active requests. It adapts to uneven request durations and unequal server speeds far better than round-robin.
What is weighted load balancing?
Weighted balancing assigns each server a weight proportional to its capacity, so faster servers receive proportionally more requests. A server with weight 3 gets roughly three times the share of one with weight 1.
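One common deterministic way to honour weights is "smooth" weighted round-robin, which interleaves servers instead of sending a burst to the heaviest one. The sketch below is illustrative only; the simulation may pick by weight differently, and `smooth_weighted_sequence` is a hypothetical name.

```python
def smooth_weighted_sequence(weights, n):
    """Return the first n server indices chosen by smooth weighted round-robin."""
    total = sum(weights)
    current = [0] * len(weights)
    picks = []
    for _ in range(n):
        # Every server gains credit equal to its weight.
        for i, w in enumerate(weights):
            current[i] += w
        # The server with the most credit is chosen (lowest index on ties)
        # and pays back the total weight.
        best = max(range(len(weights)), key=lambda i: current[i])
        current[best] -= total
        picks.append(best)
    return picks

print(smooth_weighted_sequence([3, 1], 4))   # → [0, 0, 1, 0]
```

With weights 3 and 1, every block of four requests sends three to the first server and one to the second.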
When is random load balancing acceptable?
Picking a server uniformly at random is stateless and trivial to implement, and over many requests each server's share of traffic evens out, although momentary queue lengths can still differ. The "power of two random choices" variant (pick two servers at random, send to the less loaded one) performs remarkably well.
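A small experiment shows the effect of the second choice. This is a sketch under simplifying assumptions: requests never finish, so "load" is just a running count per server, and all names are hypothetical.

```python
import random

def one_choice(active, rng=random):
    """Plain random: any server with equal probability."""
    return rng.randrange(len(active))

def two_choices(active, rng=random):
    """Sample two distinct servers and return the less loaded one (needs >= 2 servers)."""
    a, b = rng.sample(range(len(active)), 2)
    return a if active[a] <= active[b] else b

def max_load(n_servers, n_requests, pick, seed=0):
    """Assign requests that never finish and return the largest per-server count."""
    rng = random.Random(seed)
    load = [0] * n_servers
    for _ in range(n_requests):
        load[pick(load, rng)] += 1
    return max(load)

print("one choice :", max_load(100, 10_000, one_choice))
print("two choices:", max_load(100, 10_000, two_choices))
```

The average load is 100 requests per server in both runs; the two-choice run should keep its busiest server much closer to that average.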
What are Poisson arrivals?
Poisson arrivals model independent random request times where the gaps between arrivals follow an exponential distribution. This simulation generates them with inter-arrival time −ln(U)/λ for a uniform random U in (0, 1), which produces the irregular clusters and lulls typical of independent request traffic.
Why does latency rise sharply near full utilisation?
Queueing theory shows waiting time grows like 1/(1 − ρ), where ρ is utilisation. As a server approaches 100% busy, even a small traffic increase causes queues and latency to explode.
What is server utilisation?
Utilisation ρ is the fraction of time a server is busy, equal to arrival rate divided by service rate for that server. At ρ = 1 or above the server cannot keep up with random arrivals and its queue grows without bound.
Why can round-robin still cause imbalance?
Round-robin ignores how long each request takes and how fast each server is. If requests vary in cost or servers differ in speed, some servers build up queues while others drain, even though counts are equal.
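A tiny deterministic example makes the imbalance visible. It assumes two servers with fixed per-request service times and evenly spaced arrivals; the function name and numbers are made up for illustration.

```python
def round_robin_backlog(service_times, gap, n_requests):
    """Return each server's unfinished work (seconds) when the last request arrives.

    Requests arrive every `gap` seconds and are dealt out in cyclic order;
    server i needs service_times[i] seconds per request.
    """
    free_at = [0.0] * len(service_times)   # time at which each server goes idle
    now = 0.0
    for k in range(n_requests):
        now = k * gap
        i = k % len(service_times)          # round-robin choice
        start = max(now, free_at[i])        # wait if the server is still busy
        free_at[i] = start + service_times[i]
    return [max(0.0, f - now) for f in free_at]

# A fast server (1 s per request) and a slow one (4 s), one arrival every 1.5 s.
print(round_robin_backlog([1.0, 4.0], 1.5, 100))   # → [0.0, 53.0]
```

Both servers received exactly 50 requests, yet the slow one is 53 seconds behind while the fast one sits idle between arrivals.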
How is average latency measured here?
The simulation records each completed request's total time — waiting in the queue plus being served — and reports the running average across all finished requests under the current policy.
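A running average like this can be kept without storing every sample. The class below is a minimal sketch with hypothetical names; the simulation's own bookkeeping may differ.

```python
class LatencyTracker:
    """Running mean of request latency (queue wait plus service time)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def record(self, arrived_at, finished_at):
        """Fold one completed request into the average and return the new mean."""
        latency = finished_at - arrived_at
        self.count += 1
        self.mean += (latency - self.mean) / self.count   # incremental mean update
        return self.mean

tracker = LatencyTracker()
tracker.record(arrived_at=0.0, finished_at=2.0)   # latency 2.0
tracker.record(arrived_at=1.0, finished_at=5.0)   # latency 4.0
print(tracker.mean)                                # → 3.0
```

Creating a fresh tracker whenever the policy changes keeps the reported average specific to the current policy.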