Proof of Variance of the Irreducible Error


In Introduction to Statistical Learning, given the general form of a quantitative response between a set of predictor variables and a target variable

$$Y=f(X)+\epsilon$$

and the general form for a prediction over the same predictors and target

$$\hat{Y}=\hat{f}(X)$$

the authors draw what they refer to as a simple conclusion relating the expected squared difference between the predicted and actual values of $Y$ to the variance of the error term $\epsilon$: given an estimate $\hat{f}$ and a set of predictors $X$,

\begin{align*} \mathbb{E}\left[(Y-\hat Y)^2\right] &=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right)^2\right] \\ &=\left[f(X)-\hat{f}(X)\right]^2 + \text{Var}(\epsilon) \end{align*}

I am having trouble completing this proof and am hoping for some assistance filling in the blanks.

My work thus far: \begin{align*} \mathbb{E}\left[(Y-\hat Y)^2\right] &=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right)^2\right] \\ &=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right) \left(f(X)+\epsilon-\hat{f}(X)\right)\right] \\ &=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right) \left(f(X)+\epsilon-\hat{f}(X)\right) +\epsilon \left(f(X)+\epsilon-\hat{f}(X)\right)\right] \\ &=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right)^2 +\epsilon \left(f(X)-\hat{f}(X)\right) +\epsilon \left(f(X)-\hat{f}(X)\right) +\epsilon^2\right] \\ \text{Because the expectation is linear}&\\ &=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right)^2\right] +\mathbb{E}\left[\epsilon^2+ 2\epsilon \left(f(X)-\hat{f}(X)\right)\right] \\ \text{Because $f$ and $\hat{f}$ are constant}&\\ &=\left[f(X)-\hat{f}(X)\right]^2 +\mathbb{E}\left[\epsilon^2+ 2\epsilon \left(f(X)-\hat{f}(X)\right)\right] \\ \end{align*}

And this is as far as I get. According to the final result

$$\text{Var}(\epsilon)=\mathbb{E}\left[\epsilon^2+ 2\epsilon \left(f(X)-\hat{f}(X)\right)\right]$$

but I cannot see how to make this work.

Best answer:

The proof can be greatly simplified if we consider that for a random variable $W$ and a constant $c$, $$\operatorname{E}[(W+c)^2] = \operatorname{E}[W^2 + 2cW + c^2] = \operatorname{E}[W^2] + 2c \operatorname{E}[W] + c^2.$$ Consequently, if $\operatorname{E}[W] = 0$, then $$\operatorname{E}[W^2] = \operatorname{Var}[W],$$ hence $$\operatorname{E}[(W+c)^2] = \operatorname{Var}[W] + c^2.$$ Then with the choice $W = \epsilon$, $c = f(X) - \hat f(X)$, the result follows.
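The identity $\operatorname{E}[(W+c)^2] = \operatorname{Var}[W] + c^2$ for zero-mean $W$ can also be checked numerically. The sketch below is a Monte Carlo sanity check using NumPy; the particular values of `sigma` and `c` are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

# Monte Carlo check of E[(W + c)^2] = Var[W] + c^2 when E[W] = 0.
# sigma and c are arbitrary illustrative values.
rng = np.random.default_rng(0)
sigma, c = 2.0, 3.0

# Draw W with E[W] = 0 and Var[W] = sigma^2.
W = rng.normal(0.0, sigma, size=1_000_000)

lhs = np.mean((W + c) ** 2)   # sample estimate of E[(W + c)^2]
rhs = sigma ** 2 + c ** 2     # Var[W] + c^2

print(lhs, rhs)  # the two agree up to Monte Carlo error
```

With a million draws the sample mean matches $\operatorname{Var}[W] + c^2$ to roughly two decimal places, which is consistent with the claimed identity.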


Alternatively, we can consider the transformed variable $U = W + c$, which clearly satisfies $$\operatorname{Var}[U] = \operatorname{Var}[W].$$ Thus $$\operatorname{E}[U^2] = \operatorname{Var}[U] + \operatorname{E}[U]^2 = \operatorname{Var}[W] + (\operatorname{E}[W] + c)^2,$$ and again, if $\operatorname{E}[W] = 0$, we obtain the identity $$\operatorname{E}[(W+c)^2] = \operatorname{Var}[W] + c^2.$$


I also asked this here. It was not answered there, but the comments point to two facts that lead me to the following solution.

In comments:

  1. Because the mean of $\epsilon$ is zero
  2. Because the variance of $\epsilon$ is $\mathbb{E}(\epsilon^2)$

Completed Proof:

\begin{align*} \mathbb{E}\left[(Y-\hat Y)^2\right] &=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right)^2\right] \\ &=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right) \left(f(X)+\epsilon-\hat{f}(X)\right)\right] \\ &=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right) \left(f(X)+\epsilon-\hat{f}(X)\right) +\epsilon \left(f(X)+\epsilon-\hat{f}(X)\right)\right] \\ &=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right)^2 +\epsilon \left(f(X)-\hat{f}(X)\right) +\epsilon \left(f(X)-\hat{f}(X)\right) +\epsilon^2\right] \\ \text{Because the expectation is linear}&\\ &=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right)^2\right] +\mathbb{E}\left[\epsilon^2\right] +2\mathbb{E}\left[\epsilon \left(f(X)-\hat{f}(X)\right)\right] \\ \text{Because $f$ and $\hat{f}$ are constant}&\\ &=\left[f(X)-\hat{f}(X)\right]^2 +\mathbb{E}\left[\epsilon^2\right] +2\mathbb{E}\left[\epsilon \left(f(X)-\hat{f}(X)\right)\right] \\ \text{Because the mean of $\epsilon$ is zero}&\\ &=\left[f(X)-\hat{f}(X)\right]^2 +\mathbb{E}\left[\epsilon^2\right] \\ \text{Because the variance of $\epsilon$ is $\mathbb{E}(\epsilon^2)$}&\\ &=\left[f(X)-\hat{f}(X)\right]^2 + \text{Var}(\epsilon) \end{align*}
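The completed proof can likewise be verified by simulation at a fixed input $x$: sample $\epsilon$, form $Y = f(x) + \epsilon$ and $\hat Y = \hat f(x)$, and compare the sample mean of $(Y - \hat Y)^2$ against $[f(x) - \hat f(x)]^2 + \operatorname{Var}(\epsilon)$. The functions `f` and `fhat` and the noise scale `sigma` below are hypothetical choices made for the demonstration.

```python
import numpy as np

# Numerical check of E[(Y - Yhat)^2] = [f(x) - fhat(x)]^2 + Var(eps)
# at a fixed input x. f, fhat, and sigma are illustrative assumptions.
rng = np.random.default_rng(1)

f = lambda x: x ** 2            # "true" regression function (assumed)
fhat = lambda x: x ** 2 + 0.5   # an imperfect estimate of f
sigma = 1.5                     # standard deviation of the noise eps

x = 2.0
eps = rng.normal(0.0, sigma, size=1_000_000)  # zero-mean irreducible error
Y = f(x) + eps                                # observed responses
Yhat = fhat(x)                                # prediction (constant given x)

lhs = np.mean((Y - Yhat) ** 2)                # estimate of E[(Y - Yhat)^2]
rhs = (f(x) - fhat(x)) ** 2 + sigma ** 2      # reducible + irreducible error

print(lhs, rhs)  # agree up to Monte Carlo error
```

The first term on the right is the reducible error (it shrinks as $\hat f$ improves), while $\sigma^2 = \operatorname{Var}(\epsilon)$ is the irreducible floor, exactly as the decomposition states.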