In this video lecture they say:
If $Y=f(X) + \epsilon$, where $\epsilon$ is the error for input $X$, then $E[\epsilon]=0$ and $\text{Var}[\epsilon]=\sigma^2$.
Then it is said that
$\text{EPE}(x_0)=E[(Y-\hat{f}(x_0))^2|X=x_0]$ ----------- $(1)$
$\implies \text{EPE}(x_0)=E[(\hat{f}(x_0)-E[\hat{f}(x_0)])^2|X=x_0] + (E[\hat{f}(x_0)]-E[Y])^2+\sigma^2$, where the first term works out to $\frac{\sigma^2}{k}$ -------- $(2)$
The first term on the right-hand side is called the variance and the second term is the squared bias.
$\implies \text{EPE}(x_0) = \frac{\sigma^2}{k} + [f(x_0)-E[\frac{1}{k}\sum_{l=1}^{k}f(x_{(l)})]]^2 + \sigma^2$ ---------- $(3)$
Questions:
How do we get to step $(2)$ from step $(1)$? What does $\hat{f}$ mean?
Why is the variance term $\sigma^2/k$ ($k$ I think is the number of sample data points)?
Why does the bias term equal $[f(x_0)-E[\frac{1}{k}\sum_{l=1}^{k}f(x_{(l)})]]^2$ ?
In step $(1)$ we have $\textrm{EPE}(x_0) = \textrm{E}[(Y-\hat{f}(x_0))^2|X=x_0]$; below I drop the conditioning on $X=x_0$ to keep the notation light. Here $\hat{f}$ is the estimate of $f$ built from the training data. For $k$-nearest neighbours it is $\hat{f}(x_0)=\frac{1}{k}\sum_{l=1}^{k}Y_{(l)}$, the average of the responses at the $k$ training points nearest to $x_0$ (so $k$ is the number of neighbours, not the total number of sample points). By adding and subtracting $\textrm{E}[\hat{f}(x_0)]$ we obtain:
\begin{equation} \begin{split} \textrm{EPE}(x_0) & = \textrm{E}[(Y-\hat{f}(x_0))^2]\\ & = \textrm{E}[(Y-\textrm{E}[\hat{f}(x_0)] + \textrm{E}[\hat{f}(x_0)]- \hat{f}(x_0))^2] \end{split} \end{equation}
Expanding the square gives two squared terms and a cross term:
\begin{equation} \begin{split} \textrm{EPE}(x_0) & = \textrm{E}[(Y-\textrm{E}[\hat{f}(x_0)])^2] + \textrm{E}[(\textrm{E}[\hat{f}(x_0)]- \hat{f}(x_0))^2] + 2\textrm{E}[(Y-\textrm{E}[\hat{f}(x_0)])(\textrm{E}[\hat{f}(x_0)]- \hat{f}(x_0))] \end{split} \end{equation}
The second term is exactly the variance of the estimator, $\textrm{Var}[\hat{f}(x_0)]$. In the first term, $\textrm{E}[\hat{f}(x_0)]$ is a constant, and $Y$ has mean $\textrm{E}[Y]=f(x_0)$ and variance $\textrm{Var}[Y]=\textrm{Var}[\epsilon]=\sigma^2$, so
\begin{equation} \begin{split} \textrm{E}[(Y-\textrm{E}[\hat{f}(x_0)])^2] & = \textrm{Var}[Y] + (\textrm{E}[Y]-\textrm{E}[\hat{f}(x_0)])^2 = \sigma^2 + (f(x_0)-\textrm{E}[\hat{f}(x_0)])^2 \end{split} \end{equation}
So the first term contributes the irreducible error $\sigma^2$ plus the squared bias of the estimate $\hat{f}(x_0)$, the second term is the variance, and the cross term is left to deal with.
If we can show that the cross term is $0$, then we are done.
The test response $Y$ depends only on the noise $\epsilon$ at $x_0$, which is independent of the training data and hence of $\hat{f}(x_0)$, so the expectation of the product factors:
\begin{equation} \begin{split} 2\textrm{E}[(Y-\textrm{E}[\hat{f}(x_0)])(\textrm{E}[\hat{f}(x_0)]- \hat{f}(x_0))] & = 2\,\textrm{E}[Y-\textrm{E}[\hat{f}(x_0)]]\;\textrm{E}[\textrm{E}[\hat{f}(x_0)]- \hat{f}(x_0)]\\ & = 2\,\textrm{E}[Y-\textrm{E}[\hat{f}(x_0)]]\cdot 0 = 0 \end{split} \end{equation}
So the cross term vanishes with no unbiasedness assumption at all, and
\begin{equation} \textrm{EPE}(x_0) = \sigma^2 + (f(x_0)-\textrm{E}[\hat{f}(x_0)])^2 + \textrm{Var}[\hat{f}(x_0)] \end{equation}
Finally, for $k$-NN with the neighbour locations $x_{(l)}$ held fixed and the errors independent,
\begin{equation} \textrm{E}[\hat{f}(x_0)] = \frac{1}{k}\sum_{l=1}^{k}f(x_{(l)}), \qquad \textrm{Var}[\hat{f}(x_0)] = \textrm{Var}\Big[\frac{1}{k}\sum_{l=1}^{k}Y_{(l)}\Big] = \frac{1}{k^2}\sum_{l=1}^{k}\sigma^2 = \frac{\sigma^2}{k}, \end{equation}
which gives exactly the variance term $\sigma^2/k$ and the bias term $[f(x_0)-\frac{1}{k}\sum_{l=1}^{k}f(x_{(l)})]^2$ in step $(3)$.
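You can also check the decomposition numerically. Here is a minimal Monte-Carlo sketch with a setup of my own choosing (not from the lecture): $f(x)=\sin x$, $\sigma = 0.5$, and $k=10$ neighbour locations held fixed near $x_0$, so the only randomness is in the errors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the lecture): f(x) = sin(x), sigma = 0.5,
# and the k nearest neighbours of x0 are treated as fixed locations.
f = np.sin
sigma, k, x0 = 0.5, 10, 1.0
x_nbrs = x0 + np.linspace(-0.1, 0.1, k)   # fixed neighbour inputs x_(l)

n_sims = 200_000
# Training responses at the neighbours, redrawn in each simulation:
Y_train = f(x_nbrs) + sigma * rng.standard_normal((n_sims, k))
f_hat = Y_train.mean(axis=1)              # k-NN estimate of f(x0)
# Independent test response at x0:
Y_test = f(x0) + sigma * rng.standard_normal(n_sims)

epe = np.mean((Y_test - f_hat) ** 2)      # Monte-Carlo EPE(x0)
bias2 = (f(x0) - f(x_nbrs).mean()) ** 2   # squared bias
theory = sigma**2 + bias2 + sigma**2 / k  # sigma^2 + bias^2 + sigma^2/k

print(f"simulated EPE   = {epe:.4f}")
print(f"theoretical EPE = {theory:.4f}")
print(f"Var[f_hat] = {f_hat.var():.4f}  vs  sigma^2/k = {sigma**2 / k:.4f}")
```

The simulated EPE matches $\sigma^2 + \text{bias}^2 + \sigma^2/k$, and the empirical variance of $\hat{f}(x_0)$ matches $\sigma^2/k$, which is the point of questions 2 and 3.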