Eroxl's Notes
Disjoint Set (Data Structure)
aliases
union-find

A disjoint set, is a data structure that keeps track of a partition of a set into disjoint subsets. It defines three primary operations:

  • MakeSet: Which creates a new subset containing a single element.
  • Find: Which determines which subset a particular element is in. This can be used to determine if two elements are in the same subset.
  • Union: Which merges two subsets into a single subset.

Abstract Data Type

template <class T>
class DisjointSet {
    public:
        DisjointSet();

        // Create a new set with a single element
        void makeSet(const T& element);

        // Find the representative of the set containing the element
        T find(const T& element);

        // Union the sets containing element1 and element2
        void unionSets(const T& element1, const T& element2);
}

Implementation

Forest of Trees

A common implementation of the disjoint set data structure is using a forest of trees. Each tree represents a subset, and the root of the tree is the representative of that subset. The Find operation traverses up the tree to find the root, while the Union operation links the roots of two trees together.

Optimizations

To improve the efficiency of the disjoint set operations, two optimizations are commonly used:

  • Path Compression: During the Find operation, the nodes visited on the path to the root are directly connected to the root. This flattens the structure of the tree, speeding up future operations.
  • Union by Height: When performing the Union operation, the tree with the smaller height is made a subtree of the tree with the larger height. This helps keep the trees balanced, reducing the time complexity of the operations.

With these optimizations, the amortized time complexity for each operation is nearly constant, specifically , where is the inverse Ackermann function which grows very slowly (practically constant for all reasonable values of ).