Kruskal's algorithm proof using induction on number of nodes

383 Views Asked by At

I wanted to prove Kruskal's algorithm for simple connected undirected graph having pairwise distinct integral edge weights, produces MST using induction on number of nodes. The base case and induction hypothesis are clear.

Base Case: When there are just 2 nodes, since the graph is simple and connected there is just one edge and the prim's algorithm includes that edge and it is the MST.

Induction Hypothesis: Assume that Kruskal's algorithm correctly computes the MST for all connected undirected positive edge weighted simple graphs with $k$ nodes, where $k < n$.

Induction Step: Consider a connected undirected positive edge weighted simple graph $G$ with n nodes. We have to prove that Kruskal's algorithm for this produces a MST.

I'm confused about how to proceed with the proof. I can see that we should do something like removing a vertex so that number of vertices become $<n$ and we can use the induction hypothesis in some way. But on removing we might get many different connected components. Any help on this would be highly appreciated.

2

There are 2 best solutions below

1
On BEST ANSWER

First, as all weights in graph $G$ are different, the MST is unique. (Proof is everywhere, and notably on Wikipedia's page, I feel it does not add much value to copy it here).

The minimum cost edge $e$ of $G$ is in $G$ MST: take a spanning tree $T$ which does not include $e$. Then add $e$. As it connects two vertices that are already linked in $T$, this creates a cycle. Therefore it is possible to delete another edge of this cycle: this creates a spanning tree which has a lower total cost than $T$. Hence the MST must include $e$.

Now, let's make a graph $G'$ by fusing the two vertices $v_1$ and $v_2$ linked to the minimum cost edge $e$, and modifying the edges in the following way: if a vertex $f$ is connected to both $v_1$ and $v_2$, keep only the edge with the minimum cost. If $G$ MST contains one of these edges, it necessarily contains the one we have kept.

Let's prove that $G$ MST is $G'$ MST plus $e$. We already know that $G$ MST includes $e$, and that all the other edges it includes are in $G'$ with the same weight. Hence when searching all trees in $G$ for the MST, we can reduce the search to the set $S$ of trees which are made of $e$ plus a graph in $G'$, and this graph has to be a tree. So $S$ is in bijection with the set $S'$ of trees in $G'$. The cost of a tree in $S$ is the cost of the corresponding tree in $S'$, plus $e$ cost; hence the minimum trees of $S$ and $S'$ are in correspondence. This proves that $G$ MST is $G'$ MST plus $e$.

$G'$ has $n-1$ vertices, so we can apply the induction hypothesis: $G'$ MST is built with Kruskal algorithm.

The last step is to prove that "classical Kruskal" algorithm on $G$ is equivalent (in terms of chosen edges) to "Kruskal with fusion" algorithm, i.e. selecting the minimum cost edge $e$ in $G$, fusing vertices $v_1$ and $v_2$, then using Kruskal with fusion on $G'$:

  • During classical Kruskal algorithm on $G$, we would never choose an edge that connects vertices that have been fused in Kruskal with fusion: this would create a cycle. (In fact Kruskal algorithm always chooses an edge between two disconnected trees, and in Kruskal with fusion each of those trees is represented by only one vertex).
  • Amongst the other edges, classical Kruskal will choose the minimum cost one. Kruskal with fusion will choose the minimum cost amongst those remaining, but as we have only suppressed edges that have a greater cost than another edge, both algorithms select the same edge.
3
On

Removing a vertex doesn't seem to be a good idea, if this vertex is not a leaf of a MST. But you don't know whether a vertex may be a leaf of a MST until you prove it.

It is much better to consider the last edge $\{\,x, y\,\}$ you add to the tree. It connects two trees $T_1$ and $T_2$ on $|T_1| < n$ and $|T_2| < n$ vertices ($|T_1| + |T_2| = n$). On one hand these trees are MSTs in the corresponding subgraphs $G(V(T_1))$ and $G(V(T_2))$ of the graph $G$. On the other hand you add the most lightweight edge between them, but this edge is heavier than any edge of $T_1 \cup T_2$.

Suppose that some MST contains some other edge $\{\,u, v\,\}$. If $\{\,u, v\,\} \subseteq V(T_1)$, then $T_1$ was not MST of $G(V(T_1))$ that contradicts the induction hypothesis (and the same for $T_2$). If $u \in V(T_1)$ and $v \in V(T_2)$ (or vice versa) then this edge can be replaced either with $\{\,x, y\,\}$ or with some edge of $T_1$ or $T_2$. In any case such substitution is more lightweight, because $\{\,x, y\,\}$ is the most lightweight edge of all edges between vertices of $T_1$ and $T_2$, but at the same time each edge in $T_1$ and $T_2$ is even more lightweight that $\{\,x, y\,\}$.

P. S. Base case should be the graph $K_1$, otherwise you need to consider this case separately anyway and also make induction step proof a bit longer.