Collection grid (filled = accent, missing = muted) and coverage
curve flattening
Histogram of completion times over runs vs expected value E[T]
📖 Theory — harmonic numbers
Suppose every draw gives one of
N coupon types with
equal probability. Once you already hold k distinct
coupons, the chance a new draw is one you are missing is
(N − k) / N. The number of draws to get the next new
coupon is therefore geometric with mean
N / (N − k). Summing over k = 0 … N − 1
gives the expected total time
E[T] = N·(1/N + 1/(N−1) + … + 1/1) = N·H_N, where
H_N is the N-th harmonic number.
📐 The last coupon dominates ~ N·ln N
Because
H_N ≈ ln N + γ (with the Euler–Mascheroni
constant γ ≈ 0.5772), the expectation is
E[T] ≈ N·ln N + γN. The biggest single contribution is
the very last coupon: when you are missing only one type, each draw
hits it with probability 1/N, so that final coupon
alone costs about N draws on average. This is the
"diminishing returns" effect — the coverage curve flattens out and
the long tail of completion times comes almost entirely from waiting
on the last one or two coupons.
🃏 Applications
The same maths governs many real situations: completing a set of
trading cards or sticker albums from random packs, opening loot
boxes to obtain every item, and estimating how many random test
inputs are needed to exercise every code path (test coverage).
Anywhere you draw repeatedly at random and want to collect every
distinct outcome, the expected effort grows like
N·ln N — much faster than the naive guess of
N draws.