Regarding the proof of James-Stein estimator


I'm currently struggling to understand the James-Stein estimator.

For $N \ge 3$ and the James-Stein estimator $\hat \mu^{JS} = (1-\frac{N-2}{\sum z_i^2})z$, where $z \sim N_N (\mu , I)$, $$E[\Vert \hat \mu^{JS} - \mu\Vert^2] < E[\Vert \hat \mu^{MLE} - \mu\Vert^2]$$ for every $\mu$
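(Not part of the proof, but a quick Monte Carlo sketch can make the dominance claim concrete before working through it; the choice of $\mu$, the dimension, and the trial count below are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 5, 200_000
mu = np.full(N, 1.0)  # arbitrary true mean vector

# draw z ~ N(mu, I) for many trials at once: shape (trials, N)
z = rng.standard_normal((trials, N)) + mu

s = np.sum(z**2, axis=1, keepdims=True)   # sum_j z_j^2 per trial
mu_js = (1 - (N - 2) / s) * z             # James-Stein estimate
mu_mle = z                                # MLE is z itself

risk_js = np.mean(np.sum((mu_js - mu)**2, axis=1))
risk_mle = np.mean(np.sum((mu_mle - mu)**2, axis=1))
print(risk_js, risk_mle)  # JS risk lands strictly below the MLE risk of about N = 5
```

The JS average squared error comes out visibly below the MLE's, which hovers around $N$, matching the inequality in the theorem.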

The proof starts from the following identity for an arbitrary estimator $\hat \mu$. ($\hat \mu$ is not necessarily an unbiased estimator of $\mu$; also, $\mu_i$ denotes the $i$th element of the vector $\mu$.) Then,

$$(\hat \mu_i - \mu_i)^2 = (z_i - \hat \mu_i)^2 - (z_i - \mu_i)^2 + 2(\hat \mu_i - \mu_i)(z_i - \mu_i)$$ Taking expectations and summing over $i$ (note that $E[(\hat \mu_i - \mu_i)(z_i - \mu_i)] = Cov[\hat \mu_i, z_i]$, since $E[z_i - \mu_i] = 0$), $$E[\Vert \hat \mu - \mu\Vert^2] = E[\Vert z - \hat \mu\Vert^2] - N + 2\sum_{i = 1}^N Cov[\hat \mu_i, z_i]$$ (the same formula appears in this answer: https://math.stackexchange.com/a/4035216/1118138)

(1) The first uncertain point is that the proof states $Cov[\hat \mu_i, z_i] = E[\frac{\partial \hat \mu_i}{\partial z_i}]$ without proof. I have difficulty connecting the two expressions.

(2) Using the property in (1), $$E[\Vert \hat \mu - \mu\Vert^2] = E[\Vert z - \hat \mu\Vert^2] - N + 2\sum_{i = 1}^N E[\frac{\partial \hat \mu_i}{\partial z_i}]$$

and the proof concludes by substituting $\hat \mu^{JS}$ for $\hat \mu$, as follows: $$E[\Vert \hat \mu^{JS} - \mu\Vert^2] = N - E[\frac{(N-2)^2}{\sum z_i^2}]$$

If the above expression is correct, then $\sum _{i=1} ^{N}E[\frac{\partial \hat \mu^{JS}_i}{\partial z_i}] = N - E[\frac{(N-2)^2}{\sum z_i^2}]$ should hold, but I am not sure how to show this.
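(Before proving it, that equality can at least be checked numerically. Below is a sketch that estimates the left side by central finite differences and the right side directly; the mean vector, trial count, and step size are arbitrary choices.)

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials, eps = 5, 50_000, 1e-5
mu = np.full(N, 0.5)                        # arbitrary true mean
z = rng.standard_normal((trials, N)) + mu   # z ~ N(mu, I), one row per sample

def js(z):
    # James-Stein estimator, applied row-wise
    return (1 - (N - 2) / np.sum(z**2, axis=1, keepdims=True)) * z

# sum_i d(mu_hat_i)/dz_i via central finite differences, per sample
div = np.zeros(trials)
for i in range(N):
    e = np.zeros(N); e[i] = eps
    div += (js(z + e)[:, i] - js(z - e)[:, i]) / (2 * eps)

lhs = np.mean(div)                                    # E[sum_i d mu_hat_i / d z_i]
rhs = np.mean(N - (N - 2)**2 / np.sum(z**2, axis=1))  # N - E[(N-2)^2 / sum z_j^2]
print(lhs, rhs)  # the two averages agree up to finite-difference error
```

The agreement is much tighter than Monte Carlo error alone would allow, which hints that the identity actually holds pointwise in $z$, not just in expectation (as the accepted answer's derivative computation confirms).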

(3) (this looks simple, but I find it a little confusing)

Since $E[\frac{(N-2)^2}{\sum z_i^2}] > 0$, the last expression gives $E[\Vert \hat \mu^{JS} - \mu\Vert^2] < N$, but how does this lead to the conclusion $E[\Vert \hat \mu^{JS} - \mu\Vert^2] < E[\Vert \hat \mu^{MLE} - \mu\Vert^2]$? If $E[\Vert \hat \mu^{MLE} - \mu\Vert^2] = N$ is the intention of the proof, how can I show this?

Any help with this proof would be appreciated. Thank you.


On BEST ANSWER

So the first thing is that $\mu_i$ is the $i$th component of $\mu$: it is a function of the parameter rather than any kind of estimator.

As for your other questions:

  1. This is the crux of Stein's unbiased risk estimate. It relies heavily on the fact that $z_i$ is normally distributed. Stein's important observation was that the standard normal density $\varphi$ satisfies $\varphi'(x) = -x\varphi(x)$.

    Since the $z_i$ are independent with densities $\varphi(z_i - \mu_i)$, this means that we can integrate by parts in the following integral (the boundary terms vanish because $\varphi$ decays faster than any polynomial): \begin{align*} \mathrm{Cov}(\hat{\mu}_i(z), z_i) &= \mathbf{E} \hat{\mu}_i(z)(z_i - \mu_i) \\ &= \int \hat{\mu}_i(z)(z_i - \mu_i) \prod_{j=1}^N \varphi(z_j - \mu_j)\; \mathrm{d}z \\ &= \int \hat{\mu}_i(z) \cdot \Bigl[-\frac{\partial}{\partial z_i} \prod_{j=1}^N \varphi(z_j - \mu_j)\Bigr] \mathrm{d}z \\ &= \int \frac{\partial \hat{\mu}_i(z)}{\partial z_i} \cdot \prod_{j=1}^N \varphi(z_j - \mu_j)\; \mathrm{d}z \\ &= \mathbf{E} \frac{\partial \hat{\mu}_i}{\partial z_i}. \end{align*}

    I want to emphasise that we can only do this integration by parts because of the nice property of the normal density, and this doesn't hold at all for other distributions.

  2. That's correct, we can compute the derivative directly: \begin{align*} \frac{\partial \hat{\mu}^{\mathrm{JS}}_i}{\partial z_i} &= \frac{\partial}{\partial z_i} \Bigl(1 - \frac{N-2}{\sum z_j^2}\Bigr) z_i \\ &= \frac{N-2}{(\sum z_j^2)^2} \cdot 2 z_i^2 + \Bigl(1 - \frac{N-2}{\sum z_j^2}\Bigr) \\ &= 1 - (N-2) \frac{\sum z_j^2 - 2z_i^2}{(\sum z_j^2)^2}, \end{align*} so that \begin{align*} \sum_{i=1}^N \frac{\partial \hat{\mu}^{\mathrm{JS}}_i}{\partial z_i} = N - (N-2) \sum_{i=1}^N \frac{\sum z_j^2 - 2z_i^2}{(\sum z_j^2)^2} = N - \frac{(N-2)^2}{\sum z_j^2}. \end{align*}

  3. This last part just comes from the fact that $\hat{\mu}^{\mathrm{MLE}} = Z$ and that each $Z_i - \mu_i$ is standard normal with variance $1$, so \begin{align*} \mathbf{E}\lVert \hat{\mu}^{\mathrm{MLE}} - \mu \rVert^2 = \sum_{i=1}^N \mathbf{E}(Z_i - \mu_i)^2 = N. \end{align*}
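The integration-by-parts identity in step 1 can itself be sanity-checked by simulation: apply it to the James-Stein estimator and compare $\sum_i \mathrm{Cov}(\hat\mu^{\mathrm{JS}}_i, z_i)$ against the expectation of the closed-form divergence from step 2. A sketch (the mean vector and trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 5, 500_000
mu = np.linspace(-1.0, 1.0, N)              # arbitrary true mean
z = rng.standard_normal((trials, N)) + mu   # z ~ N(mu, I)

s = np.sum(z**2, axis=1)
mu_js = (1 - (N - 2) / s[:, None]) * z      # James-Stein estimate per sample

# covariance side: sum_i Cov(mu_hat_i, z_i) = sum_i E[mu_hat_i (z_i - mu_i)]
cov_side = np.mean(np.sum(mu_js * (z - mu), axis=1))

# derivative side: E[sum_i d(mu_hat_i)/dz_i] via the closed form N - (N-2)^2 / sum z_j^2
deriv_side = np.mean(N - (N - 2)**2 / s)

print(cov_side, deriv_side)  # the two sides agree up to Monte Carlo error
```

Note the covariance side needs the true $\mu$, while the derivative side depends only on the data; that trade is exactly what makes Stein's identity useful for building an unbiased risk estimate.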