Collection grid (filled = accent, missing = muted) and coverage curve flattening
Histogram of completion times over runs vs expected value E[T]
📖 Theory — harmonic numbers
Suppose every draw gives one of N coupon types with equal probability. Once you already hold k distinct coupons, the chance a new draw is one you are missing is (N − k) / N. The number of draws to get the next new coupon is therefore geometric with mean N / (N − k). Summing over k = 0 … N − 1 gives the expected total time E[T] = N·(1/N + 1/(N−1) + … + 1/1) = N·H_N, where H_N is the N-th harmonic number.
📐 The last coupon dominates ~ N·ln N
Because H_N ≈ ln N + γ (with the Euler–Mascheroni constant γ ≈ 0.5772), the expectation is E[T] ≈ N·ln N + γN. The biggest single contribution is the very last coupon: when you are missing only one type, each draw hits it with probability 1/N, so that final coupon alone costs about N draws on average. This is the "diminishing returns" effect — the coverage curve flattens out and the long tail of completion times comes almost entirely from waiting on the last one or two coupons.
🃏 Applications
The same maths governs many real situations: completing a set of trading cards or sticker albums from random packs, opening loot boxes to obtain every item, and estimating how many random test inputs are needed to exercise every code path (test coverage). Anywhere you draw repeatedly at random and want to collect every distinct outcome, the expected effort grows like N·ln N — much faster than the naive guess of N draws.