MSE of an estimator as the sum of squared bias and variance


I am reading about how the MSE of an estimator $\hat{\theta}$ of $\theta$, defined as $E[(\hat{\theta} - \theta)^2]$, can be rewritten as $(E[\hat{\theta}] - \theta)^2 + Var[\hat{\theta}],$ where the first term is the square of the bias.

I am trying to understand what the expectation is taken over. A detailed explanation of what is happening here, or a link to one, would be very welcome.


There are 2 best solutions below


Extended comment: Illustrating MSE, Variance, and Bias

Here is the simplest non-trivial example I can think of immediately. Suppose $X_1, \dots, X_5$ are iid $Unif(0, \theta).$ The MLE of $\theta$ is $\hat \theta = \max(X_i).$ However $E(\hat \theta) = (5/6)\theta,$ so $\hat \theta$ is biased for $\theta.$

In this simple case, you should be able to find $E(\hat \theta - \theta)$ and $Var(\hat \theta)$ analytically to verify the relationship between MSE, bias, and variance. I'm hoping that doing that simple exercise will clarify the issues for you. I'm not quite sure whether my brief Comment above is enough.
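
For readers who want to check their work, here is a sketch of that calculation for general $n$. It uses only the fact that $P(\hat \theta \le t) = (t/\theta)^n$ for $0 \le t \le \theta$; the symbol $t$ is a dummy variable introduced here.

$$\begin{align*}
f_{\hat \theta}(t) &= \frac{n t^{n-1}}{\theta^n}, \quad 0 < t < \theta, \\
E(\hat \theta) &= \int_0^\theta t \, \frac{n t^{n-1}}{\theta^n} \, dt = \frac{n}{n+1}\,\theta, \\
E(\hat \theta^2) &= \int_0^\theta t^2 \, \frac{n t^{n-1}}{\theta^n} \, dt = \frac{n}{n+2}\,\theta^2, \\
Var(\hat \theta) &= \frac{n}{n+2}\,\theta^2 - \frac{n^2}{(n+1)^2}\,\theta^2 = \frac{n\,\theta^2}{(n+1)^2 (n+2)}, \\
(E(\hat \theta) - \theta)^2 &= \frac{\theta^2}{(n+1)^2}, \\
E[(\hat \theta - \theta)^2] &= E(\hat \theta^2) - 2\theta E(\hat \theta) + \theta^2 = \frac{2\,\theta^2}{(n+1)(n+2)}.
\end{align*}$$

Adding the variance and the squared bias gives $\frac{(2n+2)\,\theta^2}{(n+1)^2 (n+2)} = \frac{2\,\theta^2}{(n+1)(n+2)},$ which matches the MSE. With $n = 5$ and $\theta = 10$ the three quantities are $500/252 \approx 1.9841,$ $100/36 \approx 2.7778,$ and $100/21 \approx 4.7619.$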

Here is a brief simulation in R of 100,000 observations of $\hat \theta,$ where $n = 5$ and $\theta = 10.$ (Of course, $\hat \theta$ can be made unbiased by multiplying it by $6/5,$ but we do not include that in the simulation.)

 m = 100000;  th = 10;  n = 5
 x = runif(m*n, 0, th)
 DTA = matrix(x, nrow=m)      # each row a sample of size 5
 th.hat = apply(DTA, 1, max)  # max of each sample
 mean(th.hat)                 # approx E(th.hat)
 ##  8.332366                 #    exact is 8.3333
 var(th.hat)                  # approx V(th.hat)
 ## 1.988645                  #    exact is 500/252 = 1.9841
 mean((th.hat - th)^2)        # approx MSE(th.hat)
 ## 4.769627                  #    exact is 100/21 = 4.7619
 (mean(th.hat) - th)^2        # approx squared bias
 ## 2.781003                  #    exact is 100/36 = 2.7778
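
The decomposition can also be checked numerically on the simulated values. The sketch below reruns the same kind of simulation; the seed and the names `mse`, `bias2`, and `v.m` are arbitrary choices made here, not part of the code above. One detail worth noting: `var()` divides by $m - 1$, so the identity holds exactly only with the divisor-$m$ version of the variance.

```r
set.seed(2024)                               # arbitrary seed, only for reproducibility
m <- 100000; th <- 10; n <- 5
DTA <- matrix(runif(m * n, 0, th), nrow = m) # each row a sample of size n
th.hat <- apply(DTA, 1, max)                 # max of each sample

mse   <- mean((th.hat - th)^2)               # simulated MSE
bias2 <- (mean(th.hat) - th)^2               # simulated squared bias
v.m   <- mean((th.hat - mean(th.hat))^2)     # variance with divisor m

mse - (v.m + bias2)                          # zero up to floating-point error
mse - (var(th.hat) + bias2)                  # small but nonzero: var() uses divisor m - 1
```

The first difference vanishes because the sample version of the identity is pure algebra; the second differs from zero only by `v.m / (m - 1)`.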

Also, notice that double the sample mean of the five observations, call it $\tilde \theta,$ is an unbiased estimator, but with considerably higher MSE than $\hat \theta.$ Speaking to the point of your question, $\tilde \theta$ has a different support than $\hat \theta$: it takes values in $(0, 2\theta),$ whereas $\hat \theta$ takes values in $(0, \theta).$ (However, the PDF of $\tilde \theta$ is messy; you can get $E(\tilde \theta)$ and $V(\tilde \theta)$ using the rules for the mean and variance of $\bar X.$)

 th.unb = 2*rowMeans(DTA)
 mean(th.unb)                 # approx E(th.unb)
 ## 10.00675                  #   exact is 10
 var(th.unb)                  # approx V(th.unb)
 ## 6.695437                  #   exact is 400/60 = 6.6667
 mean((th.unb - th)^2)        # approx MSE
 ## 6.695416                  # for unb est: MSE = Var
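
The exact values quoted in the comments can be recovered from the mean $\theta/2$ and variance $\theta^2/12$ of a single $Unif(0, \theta)$ observation. A short sketch for general $n$:

$$\begin{align*}
E(\tilde \theta) &= 2\,E(\bar X) = 2 \cdot \frac{\theta}{2} = \theta, \\
Var(\tilde \theta) &= 4\,Var(\bar X) = 4 \cdot \frac{\theta^2}{12\,n} = \frac{\theta^2}{3n}, \\
E[(\tilde \theta - \theta)^2] &= Var(\tilde \theta) + 0^2 = \frac{\theta^2}{3n}.
\end{align*}$$

With $n = 5$ and $\theta = 10$ this gives $100/15 \approx 6.6667,$ compared with an exact MSE of $2\theta^2/((n+1)(n+2)) = 100/21 \approx 4.7619$ for the biased estimator $\hat \theta.$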

The figure below plots these simulated estimators using the same scale for easy comparison of their approximate distributions.

[Figure: the simulated distributions of $\hat \theta$ and $\tilde \theta,$ plotted on a common scale.]


Simplification:

$$\begin{align*}
{\rm MSE}[\hat \theta] &= {\rm E}[(\hat\theta - \theta)^2] \\
&= {\rm E}[\hat \theta^2 - 2 \theta \hat \theta + \theta^2] \\
&= {\rm E}[\hat \theta^2] - 2 \theta {\rm E}[\hat \theta] + \theta^2 \\
&= {\rm E}[\hat \theta^2] - ({\rm E}[\hat \theta])^2 + ({\rm E}[\hat \theta])^2 - 2 \theta {\rm E}[\hat \theta] + \theta^2 \\
&= {\rm E}[\hat \theta^2] - ({\rm E}[\hat \theta])^2 + ({\rm E}[\hat \theta] - \theta)^2 \\
&= {\rm Var}[\hat \theta] + ({\rm E}[\hat \theta] - \theta)^2
\end{align*}$$
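
As for what the expectation is taken over: in every line above, $\theta$ is a fixed constant, and the only random quantity is $\hat \theta,$ which is a function of the sample. As a sketch, assume the sample $X = (X_1, \dots, X_n)$ has a joint density $f(x; \theta)$; the symbols $X,$ $x,$ and $f$ are notation introduced here, not taken from the question. Then

$${\rm E}[(\hat\theta - \theta)^2] = \int (\hat\theta(x) - \theta)^2 \, f(x; \theta) \, dx .$$

In words, the expectation averages the squared error over all samples that could have been drawn with the same fixed $\theta.$ The terms ${\rm E}[\hat \theta]$ and ${\rm Var}[\hat \theta]$ are taken over that same sampling distribution.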