Info & Theory
Union-Find (disjoint set union, DSU) keeps a collection of non-overlapping sets and answers, in near-constant time, whether two elements belong to the same set.
The two operations
-
find(x)follows parent pointers up to the root that represents x's set. -
union(a, b)finds both roots and links one under the other, merging the two sets into one.
Union by rank
Each tree carries a rank — an upper bound on its
height. We attach the smaller-rank root under the larger-rank
root, so the tree stays shallow. On a tie, one root becomes the
child and the survivor's rank rises by 1.
Path compression
During find, after reaching the root we re-point
every node on the path directly at the root. The tree
flattens, so later queries on those nodes are almost
instant. This simulation highlights the traversed path, then
redraws with the compressed pointers.
Why it is almost O(1)
With both tricks, m operations on n
elements cost O(m·α(n)), where α is
the inverse Ackermann function. Since α(n) ≤ 4
for any conceivable input, each operation is effectively
constant time.
Where it is used
- Counting connected components as edges are added.
- Kruskal's minimum spanning tree (cycle detection).
- Randomised maze generation via Kruskal's method.
- Image segmentation, percolation, and dynamic connectivity.
Frequently asked questions
What is a union-find data structure?
Union-find, also called disjoint set union (DSU), maintains a collection of non-overlapping sets. It supports two core operations: find, which returns a representative (root) of the set containing an element, and union, which merges the two sets containing two elements. Two elements are in the same set exactly when they share the same root.
How is union-find represented internally?
Each set is stored as a rooted tree inside a single parent[] array. Every element points to its parent, and a root points to itself. The whole structure is therefore a forest of trees, one tree per disjoint set. To test connectivity you walk each element up to its root and compare the two roots.
What does union by rank do?
Union by rank attaches the shorter tree under the root of the taller tree, keeping trees shallow. Rank is an upper bound on a tree's height. When two roots have equal rank, one becomes the child of the other and the surviving root's rank increases by one. This stops the forest from degenerating into a long chain.
What is path compression?
Path compression is applied during find: after locating the root, every node visited on the way is re-pointed directly at that root. This flattens the tree so future queries on those nodes are almost instant. The simulation animates this by highlighting the traversed path, then redrawing with the compressed pointers.
Why is union-find almost O(1) per operation?
With both union by rank and path compression, a sequence of m operations on n elements runs in O(m·α(n)) time, where α is the inverse Ackermann function. α(n) grows so slowly that it is below 5 for any practical n, so each operation is effectively constant time even though it is not strictly O(1).
What is the inverse Ackermann function α(n)?
The Ackermann function grows astronomically fast, so its inverse α(n) grows astronomically slowly. For every input size that could fit in the observable universe, α(n) is at most 4. That is why the amortised cost of union-find with both optimisations is treated as essentially constant in practice.
How does union-find count connected components?
The number of disjoint sets equals the number of roots in the forest. Start with n singleton sets, so n components. Each successful union of two different sets reduces the component count by exactly one. This makes union-find an efficient way to track connectivity in a graph as edges are added.
How is union-find used in Kruskal's MST algorithm?
Kruskal's minimum spanning tree algorithm sorts edges by weight and adds each edge only if its endpoints are in different sets, which is checked with find. Adding the edge performs a union. Union-find makes this cycle detection nearly constant time, which is why Kruskal runs in O(E log E) dominated by the sort.
Can union-find help generate mazes?
Yes. A randomised version of Kruskal's algorithm builds perfect mazes: start with every cell as its own set and walls everywhere, then repeatedly knock down a random wall only if the two cells it separates are in different sets, unioning them. This guarantees a fully connected maze with no loops.
What happens if you union two elements already in the same set?
If find(a) and find(b) return the same root, the elements are already connected, so union does nothing structurally and the component count is unchanged. A robust implementation detects this early and skips the rank update, avoiding redundant work while keeping the forest correct.