Question 1

What is a union-find data structure?

Accepted Answer

Union-find, also called disjoint set union (DSU), maintains a collection of non-overlapping sets. It supports two core operations: find, which returns a representative (root) of the set containing an element, and union, which merges the two sets containing two elements. Two elements are in the same set exactly when they share the same root.

Question 2

How is union-find represented internally?

Accepted Answer

Each set is stored as a rooted tree inside a single parent[] array. Every element points to its parent, and a root points to itself. The whole structure is therefore a forest of trees, one tree per disjoint set. To test connectivity you walk each element up to its root and compare the two roots.

Question 3

What does union by rank do?

Accepted Answer

Union by rank attaches the shorter tree under the root of the taller tree, keeping trees shallow. Rank is an upper bound on a tree's height. When two roots have equal rank, one becomes the child of the other and the surviving root's rank increases by one. This stops the forest from degenerating into a long chain.

Question 4

What is path compression?

Accepted Answer

Path compression is applied during find: after locating the root, every node visited on the way is re-pointed directly at that root. This flattens the tree so future queries on those nodes are almost instant. The simulation animates this by highlighting the traversed path, then redrawing with the compressed pointers.

Question 5

Why is union-find almost O(1) per operation?

Accepted Answer

With both union by rank and path compression, a sequence of m operations on n elements runs in O(m·α(n)) time, where α is the inverse Ackermann function. α(n) grows so slowly that it is below 5 for any practical n, so each operation is effectively constant time even though it is not strictly O(1).

Question 6

What is the inverse Ackermann function α(n)?

Accepted Answer

The Ackermann function grows astronomically fast, so its inverse α(n) grows astronomically slowly. For every input size that could fit in the observable universe, α(n) is at most 4. That is why the amortised cost of union-find with both optimisations is treated as essentially constant in practice.

Question 7

How does union-find count connected components?

Accepted Answer

The number of disjoint sets equals the number of roots in the forest. Start with n singleton sets, so n components. Each successful union of two different sets reduces the component count by exactly one. This makes union-find an efficient way to track connectivity in a graph as edges are added.

Question 8

How is union-find used in Kruskal's MST algorithm?

Accepted Answer

Kruskal's minimum spanning tree algorithm sorts edges by weight and adds each edge only if its endpoints are in different sets, which is checked with find. Adding the edge performs a union. Union-find makes this cycle detection nearly constant time, which is why Kruskal runs in O(E log E) dominated by the sort.

Question 9

Can union-find help generate mazes?

Accepted Answer

Yes. A randomised version of Kruskal's algorithm builds perfect mazes: start with every cell as its own set and walls everywhere, then repeatedly knock down a random wall only if the two cells it separates are in different sets, unioning them. This guarantees a fully connected maze with no loops.

Question 10

What happens if you union two elements already in the same set?

Accepted Answer

If find(a) and find(b) return the same root, the elements are already connected, so union does nothing structurally and the component count is unchanged. A robust implementation detects this early and skips the rank update, avoiding redundant work while keeping the forest correct.

The two operations

Union by rank

Path compression

Why it is almost O(1)

Where it is used

Frequently asked questions