Finding connected components of the graph

3.6k Views Asked by At

suppose that I have the following undirected graph enter image description here

with the following adjacency matrix showing if there is an edge between the nodes: \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}

Now from the above matrix, I want to find the following matrix which shows if there is a path between each node:

\begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\\ 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}

We can find the second matrix by checking each node one by one (using transitivity rule) My question is if there is an algorithm for finding the second matrix faster. The matrix I am working with is a huge matrix and I am looking for a good way to implement an algorithm to find the second matrix. In other words I am looking for connected components of the graph. I searched and found that one way is to use Laplacian matrix. For example, for the above example Laplacian matrix would be
\begin{bmatrix} 1 & -1 & 0 & 0 & 0 \\ -1 & 2 & 0 & -1 & 0\\ 0 & 0 & 1 & 0 & -1\\ 0 & -1 & 0 & 1 & 0\\ 0 & 0 & -1 & 0 & 1 \end{bmatrix}

Now using this matrix how can I find the connected components? Is there a fastest way to find the connected components?

1

There are 1 best solutions below

1
On

Putting the Laplacian matrix into reduced row-echelon form:

$$ \begin{bmatrix} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1& 0 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} $$

allows us to determine the connected components. Take the columns not containing leading ones, and for each of these set the corresponding variable to $1$ and other such variables to zero. For example, setting the variable from the fourth column to $1$ and the variable from the fifth column to zero gives a solution:

$$ \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} $$

which gives the connected component $\{a,b,d\}$ containing the node $d$ corresponding to that fourth column.

Setting the variable corresponding to the fifth column to $1$ and the variable corresponding to the fourth column to zero gives a different solution:

$$ \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 1 \end{bmatrix} $$

which indicates the connected component $\{c,e\}$ containing the node $e$ corresponding to that fifth column.


As @amd suggests in a Comment on the Question, the well-known algorithm (already per Hopcraft and Tarjan, 1973) for finding all connected components (by searching from each not-yet reached vertex for all its reachable nodes) has complexity $O(|V| + |E|)$ where $V$ is the set of vertices and $E$ the set of edges. This gives at worst $O(|V|^2)$ since $|E| \le |V|(|V|-1)/2$.

Any approach based on the $|V|\times |V|$ matrix (whether adjacency or graph-Laplacian) cannot improve on that asymptotically, because scanning such a matrix already takes $O(|V|^2)$ steps. Indeed row-reducing a general $|V|\times |V|$ matrix has $O(|V|^3)$ complexity.

See an earlier SO Question for a discussion of the "path matrix" (as mentioned by amd in a Comment on this answer):

What does it mean by “Path Matrix” and “Transitive Closure” of a graph?