🪣 Reservoir Sampling
Fair samples from a stream
Item 0 / 30
Expected inclusion: k/N
Setup
Controls
Stats
Target k/N
0.167
Runs
0
Max dev.
Status
Ready
Info & Theory

Reservoir sampling picks k items uniformly at random from a stream whose length N you do not know in advance, using only one pass and O(k) memory.

Algorithm R

  • Put the first k items into the reservoir.
  • For each later item i > k, pick a random integer j in 1…i. If j ≤ k, replace slot j with item i; otherwise discard it.

So item i enters the reservoir with probability exactly k/i.

Why it is uniform

Item i survives to the end if it is chosen on step i (prob k/i) and is then never evicted. The probability of not being evicted on a later step m is 1 − 1/m. Multiplying gives

(k/i) · ∏(m=i+1..N) (1 − 1/m) = (k/i)·(i/N) = k/N

so every element ends up in the sample with the same probability k/N. The histogram on the right is the empirical inclusion rate per item over many runs — it converges to the flat dashed line at k/N.