Consider the EM algorithm for a Gaussian mixture model $$ p(\mathbf{x})=\sum_{k=1}^{K} \pi_{k} \mathcal{N}\left(\mathbf{x} \mid \mu_{k}, \boldsymbol{\Sigma}_{k}\right) $$ Assume that $\Sigma_{k}=\epsilon I$ for all $k=1, \cdots, K$. Letting $\epsilon \rightarrow 0$, prove that the limiting case is equivalent to $K$-means clustering.
According to several internet resources, to show that the limiting case reduces to $K$-means clustering, we have to use responsibilities. The EM algorithm uses responsibilities to make a soft assignment of each data point to one of the clusters. With the common variance fixed at $\sigma^2$ (so $\Sigma_k = \sigma^2 I$, i.e. $\epsilon = \sigma^2$) and equal mixing coefficients $\pi_k$, which cancel, the responsibility of cluster $k$ for data point $i$ is given by $$ r_{i}^{(k)}=\frac{\exp \left(-\frac{1}{2 \sigma^{2}}\left\|y_{i}-\mu_{k}\right\|^{2}\right)}{\sum_{l=1}^{K} \exp \left(-\frac{1}{2 \sigma^{2}}\left\|y_{i}-\mu_{l}\right\|^{2}\right)} $$
Given these definitions, how can one show that $\sigma \rightarrow 0$ yields $r_{i}^{(k)} \rightarrow 1$ for the cluster $k$ that is closest to $y_{i}$ and $r_{i}^{(k)} \rightarrow 0$ for all other clusters? How can the $K$-means approach be recovered from these responsibilities, and how does using them help with the limiting case?
Note that dividing numerator and denominator by the numerator gives $$ r^{(k)}_i = \frac{\exp(-\frac{1}{2\sigma^2} \|y_i - \mu_k\|^2)}{\sum_{l=1}^K \exp(-\frac{1}{2\sigma^2} \|y_i - \mu_l\|^2)} = \frac{1}{\sum_{l=1}^K \exp(-\frac{1}{2\sigma^2}(\|y_i - \mu_l\|^2 - \|y_i - \mu_k\|^2))} $$
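As a numerical sanity check of this algebraic step, here is a minimal 1-D sketch (the function names `resp_softmax` and `resp_rewritten` are made up for illustration; in one dimension $\|y_i-\mu_k\|^2$ is just a squared difference):

```python
import math

def resp_softmax(y, mus, sigma, k):
    # direct form: normalized Gaussian kernels
    w = [math.exp(-((y - m) ** 2) / (2 * sigma ** 2)) for m in mus]
    return w[k] / sum(w)

def resp_rewritten(y, mus, sigma, k):
    # rewritten form: numerator and denominator divided by the k-th kernel
    dk = (y - mus[k]) ** 2
    return 1.0 / sum(
        math.exp(-(((y - m) ** 2) - dk) / (2 * sigma ** 2)) for m in mus
    )

y, mus, sigma = 0.3, [0.0, 1.0, 2.5], 0.8
for k in range(len(mus)):
    # the two expressions agree up to floating-point rounding
    assert abs(resp_softmax(y, mus, sigma, k) - resp_rewritten(y, mus, sigma, k)) < 1e-12
```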
and that, as $\sigma \to 0$, $$ \exp\left(-\frac{1}{2\sigma^2}(\|y_i - \mu_l\|^2 - \|y_i - \mu_k\|^2)\right) \to \begin{cases} 1 & \|y_i - \mu_l\| = \|y_i - \mu_k\| \\ 0 & \|y_i-\mu_l\| > \|y_i - \mu_k\| \\ \infty & \|y_i - \mu_l \| < \|y_i - \mu_k\| \end{cases} $$
So, for instance, if $1$ is the [unique] closest cluster to $y_i$, then every term in the denominator of $r^{(1)}_i$ tends to $0$, except for the $l=1$ term, which tends to $1$, so the limit is $\frac{1}{1 + 0 + 0 + \cdots + 0 + 0}=1$; the denominator of $r^{(k)}_i$ for $k \ne 1$ has at least one term tending to $\infty$ (the $l=1$ term), making the limit $0$ instead.
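This hardening of the responsibilities can be watched numerically. Below is a small 1-D sketch (the helper name `responsibilities` is made up; it subtracts the smallest squared distance before exponentiating, which is exactly the rewritten form above evaluated for every $k$ at once and avoids overflow as $\sigma$ shrinks):

```python
import math

def responsibilities(y, mus, sigma):
    # stable evaluation: subtract the minimum squared distance, so the
    # largest exponent is 0 and no term can overflow
    d2 = [(y - m) ** 2 for m in mus]
    dmin = min(d2)
    w = [math.exp(-(d - dmin) / (2 * sigma ** 2)) for d in d2]
    s = sum(w)
    return [wk / s for wk in w]

y, mus = 0.3, [0.0, 1.0, 2.5]   # cluster 0 is uniquely closest to y
for sigma in [1.0, 0.3, 0.05]:
    print(sigma, [round(r, 4) for r in responsibilities(y, mus, sigma)])
# as sigma shrinks, the responsibilities harden toward [1, 0, 0]
```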
Note that if there are multiple equally closest clusters to a point $y_i$, then the limiting responsibilities will be uniform among these clusters.
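As for how this recovers $K$-means: a sketch of the remaining step is that with hard $0/1$ responsibilities the M-step of EM collapses into the $K$-means centroid update (this uses the standard EM mean update for a Gaussian mixture, which the question's setup implies but does not write out):

```latex
% E-step in the limit: hard (one-hot) assignments
r_i^{(k)} \;\to\; \mathbb{1}\!\left\{\,k = \arg\min_{l} \|y_i - \mu_l\|^2\,\right\}
% M-step of EM for the means:
\mu_k \;=\; \frac{\sum_i r_i^{(k)} y_i}{\sum_i r_i^{(k)}}
% With 0/1 responsibilities this is the mean of the points assigned to
% cluster k, i.e. exactly the K-means centroid update, while the E-step
% becomes the K-means nearest-centroid assignment.
```

So in the limit each EM iteration performs a nearest-centroid assignment followed by a centroid recomputation, which is precisely one iteration of Lloyd's $K$-means algorithm.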