Let $z\sim\mathcal{N}_N(\mu,I)$, the $N$-dimensional multivariate normal with mean $\mu=(\mu_1,\dots,\mu_N)^T$ and the identity matrix $I$ as the covariance matrix, and let $S=\|z\|^2$. In the textbook that I'm reading, there is a proof for why the James-Stein estimator $\hat{\mu}^{(\text{JS})}=(1-\frac{N-2}{S})z$ outperforms the MLE $\hat{\mu}^{(\text{MLE})}=z$ in terms of total squared error, that is $$\textbf{E}_\mu[\|\hat{\mu}^{(\text{JS})}-\mu\|^2]<\textbf{E}_\mu[\|\hat{\mu}^{(\text{MLE})}-\mu\|^2]$$ for every choice of $\mu$ (with $N\geq 3$). The textbook is well known and I highly doubt that there is an error in the proof. However, I do not understand the following step:
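For concreteness, here is a small Monte Carlo sketch of the claimed inequality (taking $S=\|z\|^2$, the usual choice for the James-Stein estimator); it is only an illustration, not a proof:

```python
# Monte Carlo comparison of JS vs. MLE total squared error
# (assumes S = ||z||^2; mu chosen arbitrarily and nonzero).
import numpy as np

rng = np.random.default_rng(0)
N, reps = 10, 200_000
mu = np.ones(N)

z = rng.normal(loc=mu, size=(reps, N))        # z ~ N_N(mu, I)
S = np.sum(z**2, axis=1, keepdims=True)       # S = ||z||^2
js = (1 - (N - 2) / S) * z                    # James-Stein estimator

risk_mle = np.mean(np.sum((z - mu)**2, axis=1))   # should be close to N
risk_js = np.mean(np.sum((js - mu)**2, axis=1))
print(risk_mle, risk_js)   # risk_js comes out strictly smaller
```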
It obviously holds that $$(\hat{\mu}_i-\mu_i)^2=(\hat{\mu}_i-z_i)^2-(\mu_i-z_i)^2+2(z_i-\mu_i)(\hat{\mu}_i-\mu_i).$$ Now the textbook says that summing over the $N$ coordinates and taking expectations on both sides gives $$\textbf{E}_\mu[\|\hat{\mu}-\mu\|^2]=\textbf{E}_\mu[\|\hat{\mu}-z\|^2]-N+2\sum_{i=1}^N\textbf{Cov}_\mu[z_i,\hat{\mu}_i]$$
$\hat{\mu}$ is not defined in the textbook. Given the context, I assume that $\hat{\mu}$ is supposed to be an unbiased estimator for $z$ ($\textbf{E}_\mu[\hat{\mu}_i]=z_i$). Otherwise, I do not understand where the covariance term comes from.
Later in the proof, $\hat{\mu}$ is replaced by $\hat{\mu}^{\text{(JS)}}$. However, $$\textbf{E}_{\mu}[\hat{\mu}_i^{\text{(JS)}}]=\textbf{E}_{\mu}[(1-\frac{N-2}{S})z_i]=\mu_i-(N-2)\textbf{E}_{\mu}[\frac{z_i}{S}]$$ and $$\textbf{E}_{\mu}[\frac{z_i}{S}]=\textbf{E}_{\mu}[\frac{-(z_i-\mu_i)+\mu_i}{S}]=-\textbf{E}_{\mu}[\frac{z_i}{S}]+2\mu_i\textbf{E}_\mu[\frac{1}{S}]$$ by symmetry. Solving gives $\textbf{E}_{\mu}[\frac{z_i}{S}]=\mu_i\textbf{E}_\mu[\frac{1}{S}]$, and since $\textbf{E}_\mu[\frac{1}{S}]>0$, the estimator $\hat{\mu}^{\text{(JS)}}$ is unbiased only if $\mu=0$, not for every choice of $\mu$.
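A quick simulation (again with $S=\|z\|^2$) does seem to confirm that $\hat{\mu}^{\text{(JS)}}$ is biased whenever $\mu\neq 0$:

```python
# Estimate E[mu_hat_JS] by simulation (assumes S = ||z||^2):
# the shrinkage toward 0 makes the estimator biased for mu != 0.
import numpy as np

rng = np.random.default_rng(1)
N, reps = 10, 500_000
mu = np.full(N, 2.0)

z = rng.normal(loc=mu, size=(reps, N))
S = np.sum(z**2, axis=1, keepdims=True)
js_mean = np.mean((1 - (N - 2) / S) * z, axis=0)
print(js_mean)   # every coordinate lands noticeably below mu_i = 2
```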
I tried to figure out where I went wrong, but I am lost. I would appreciate any help.
For any (square-integrable) estimator $\hat{\mu}$ of $\mu$, \begin{align} \mathsf{E}[\hat{\mu}_i-\mu_i]^2&=\mathsf{E}[\hat{\mu}_i-z_i]^2+\mathsf{E}[z_i-\mu_i]^2+2\mathsf{E}[(z_i-\mu_i)(\hat{\mu}_i-z_i)] \\ &=\mathsf{E}[\hat{\mu}_i-z_i]^2-1+2\operatorname{Cov}(z_i,\hat{\mu}_i) \end{align} because $\operatorname{Var}(z_i)=1$ and \begin{align} \mathsf{E}[(z_i-\mu_i)(\hat{\mu}_i-z_i)]&=\mathsf{E}[(z_i-\mu_i)(\hat{\mu}_i-z_i)]-\underbrace{\mathsf{E}[z_i-\mu_i]}_{=0}\mathsf{E}[\hat{\mu}_i-z_i] \\ &=\mathsf{E}[(z_i-\mu_i)(\hat{\mu}_i-z_i-\mathsf{E}[\hat{\mu}_i-z_i])] \\ &=\operatorname{Cov}(z_i,\hat{\mu}_i-z_i)=\operatorname{Cov}(z_i,\hat{\mu}_i)-\operatorname{Var}(z_i). \end{align} Note that no unbiasedness of $\hat{\mu}$ is needed anywhere; the covariance term arises from this decomposition alone. Summing over $i$ therefore gives $$ \mathsf{E}\|\hat{\mu}-\mu\|^2=\mathsf{E}\|\hat{\mu}-z\|^2-N+2\sum_{i=1}^N\operatorname{Cov}(z_i,\hat{\mu}_i). $$
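A numerical sanity check of this decomposition with $\hat{\mu}=\hat{\mu}^{(\text{JS})}$ (taking $S=\|z\|^2$, which the question leaves implicit):

```python
# Verify E||mu_hat - mu||^2 = E||mu_hat - z||^2 - N + 2*sum_i Cov(z_i, mu_hat_i)
# by Monte Carlo, with mu_hat the James-Stein estimator.
import numpy as np

rng = np.random.default_rng(2)
N, reps = 10, 500_000
mu = np.linspace(-1, 1, N)

z = rng.normal(loc=mu, size=(reps, N))
S = np.sum(z**2, axis=1, keepdims=True)
mu_hat = (1 - (N - 2) / S) * z

lhs = np.mean(np.sum((mu_hat - mu)**2, axis=1))
cov_sum = sum(np.cov(z[:, i], mu_hat[:, i])[0, 1] for i in range(N))
rhs = np.mean(np.sum((mu_hat - z)**2, axis=1)) - N + 2 * cov_sum
print(lhs, rhs)   # the two sides agree up to Monte Carlo error
```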