🌸 Bloom Filter
Probabilistic set membership
No false negatives
Bits set: 0 / 64
Item
Parameters
Controls
Stats
Inserted n
0
Bits set
0 / 64
Theory FP
0.0%
Measured FP
Log
Info & Theory

A Bloom filter is a compact probabilistic structure that answers one question: "is this item in the set?" It uses an array of m bits and k hash functions, storing no items at all.

Adding an item

Hash the item with all k functions to get k positions in [0, m), then set those bits to 1. Bits are never cleared.

Querying an item

  • If any of the k bits is 0 → definitely not in the set.
  • If all k bits are 1 → possibly in the set.

Because set bits are never reset, a true member always passes — so there are no false negatives. But collisions can make a non-member look present: a false positive.

The error rate

After inserting n items, the probability a random non-member yields all-ones is approximately (1 − e^(−kn/m))^k. The term e^(−kn/m) estimates the share of bits still 0.

Tuning k and m

The error is minimised at k = (m/n)·ln 2, and a target rate p needs m = −n·ln p / (ln 2)² bits — about 9.6 bits per item for 1%.

Frequently asked questions

What is a Bloom filter?

A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set. It can report "possibly in set" or "definitely not in set", using a bit array and several hash functions instead of storing the items themselves.

Why does a Bloom filter never give false negatives?

When an item is inserted, all k of its hashed bit positions are set to 1 and are never cleared. So if any of an item's k bits is 0, the item was definitely never added — there are no false negatives.

What causes false positives in a Bloom filter?

A false positive happens when all k bits for a queried item are already set to 1 by other inserted items, even though the item itself was never added. The probability rises as the filter fills.

What is the false-positive probability formula?

After inserting n items into m bits with k hash functions, the approximate false-positive rate is (1 − e^(−kn/m))^k. The term e^(−kn/m) estimates the fraction of bits still 0.

How do I choose k, the number of hash functions?

The optimal number is k = (m/n) · ln 2, which minimises the false-positive rate. Too few hashes under-discriminate; too many fill the array quickly.

How big should the bit array m be?

For a target false-positive rate p and n items, the optimal size is m = −(n · ln p) / (ln 2)² bits, roughly 9.6 bits per element for a 1% error rate.

Can you delete items from a Bloom filter?

Not from a standard Bloom filter, because clearing bits could affect other items. A Counting Bloom filter uses small counters instead of single bits to allow deletions.

Where are Bloom filters used in practice?

Databases like Cassandra and HBase use them to skip disk reads for missing keys, browsers use them for malicious-URL checks, and CDNs and caches use them to avoid one-hit-wonder caching.

How do real implementations get k independent hashes?

Rather than k separate functions, double hashing combines two base hashes as h_i(x) = h1(x) + i · h2(x) mod m, which behaves close to k independent hashes for filter purposes.

What is the difference between measured and theoretical error?

The theoretical rate (1 − e^(−kn/m))^k assumes idealised uniform hashing. The measured rate here counts actual false positives over random test queries, so it fluctuates around the theory.