🎟️ Coupon Collector

Number of coupon types N: 20 Animation speed: 5

Collected0 / 20

Draws so far0

Last drawn—

E[T] = N·H_N—

N·ln N + γN—

Trials done0

Empirical mean—

Coupon Collector
E[T] = N·H_N = N·Σ(1/k), k=1..N
≈ N·ln N + γN, with γ ≈ 0.5772
The last few coupons dominate: the final one alone needs ≈ N draws on average.

Collection grid (filled = accent, missing = muted) and coverage curve flattening

Histogram of completion times over runs vs expected value E[T]

📖 Theory — harmonic numbers

Suppose every draw gives one of N coupon types with equal probability. Once you already hold k distinct coupons, the chance a new draw is one you are missing is (N − k) / N. The number of draws to get the next new coupon is therefore geometric with mean N / (N − k). Summing over k = 0 … N − 1 gives the expected total time E[T] = N·(1/N + 1/(N−1) + … + 1/1) = N·H_N, where H_N is the N-th harmonic number.

📐 The last coupon dominates ~ N·ln N

Because H_N ≈ ln N + γ (with the Euler–Mascheroni constant γ ≈ 0.5772), the expectation is E[T] ≈ N·ln N + γN. The biggest single contribution is the very last coupon: when you are missing only one type, each draw hits it with probability 1/N, so that final coupon alone costs about N draws on average. This is the "diminishing returns" effect — the coverage curve flattens out and the long tail of completion times comes almost entirely from waiting on the last one or two coupons.

🃏 Applications

The same maths governs many real situations: completing a set of trading cards or sticker albums from random packs, opening loot boxes to obtain every item, and estimating how many random test inputs are needed to exercise every code path (test coverage). Anywhere you draw repeatedly at random and want to collect every distinct outcome, the expected effort grows like N·ln N — much faster than the naive guess of N draws.