In Introduction to Statistical Learning, given the general form of the relationship between a set of predictor variables $X$ and a quantitative response $Y$,
$$Y=f(X)+\epsilon$$
and the general form of a prediction of $Y$ based on the same predictors,
$$\hat{Y}=\hat{f}(X)$$
the authors draw what they refer to as a simple conclusion relating the expected squared difference between the predicted and actual values of $Y$ to the variance of the error term $\epsilon$: given an estimate $\hat{f}$ and a set of predictors $X$,
\begin{align*} \mathbb{E}\left[(Y-\hat Y)^2\right] &=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right)^2\right] \\ &=\left[f(X)-\hat{f}(X)\right]^2 + \text{Var}(\epsilon) \end{align*}
I am having trouble completing this proof and am hoping for some assistance filling in the blanks.
My work thus far: \begin{align*} \mathbb{E}\left[(Y-\hat Y)^2\right] &=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right)^2\right] \\ &=\mathbb{E}\left[\left(f(X)+\epsilon-\hat{f}(X)\right) \left(f(X)+\epsilon-\hat{f}(X)\right)\right] \\ &=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right) \left(f(X)+\epsilon-\hat{f}(X)\right) +\epsilon \left(f(X)+\epsilon-\hat{f}(X)\right)\right] \\ &=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right)^2 +\epsilon \left(f(X)-\hat{f}(X)\right) +\epsilon \left(f(X)-\hat{f}(X)\right) +\epsilon^2\right] \\ \text{Because the expectation is linear}&\\ &=\mathbb{E}\left[\left(f(X)-\hat{f}(X)\right)^2\right] +\mathbb{E}\left[\epsilon^2+ 2\epsilon \left(f(X)-\hat{f}(X)\right)\right] \\ \text{Because $f$ and $\hat{f}$ are constant}&\\ &=\left[f(X)-\hat{f}(X)\right]^2 +\mathbb{E}\left[\epsilon^2+ 2\epsilon \left(f(X)-\hat{f}(X)\right)\right] \\ \end{align*}
This is as far as I get. To match the final result, it would have to be the case that
$$\text{Var}(\epsilon)=\mathbb{E}\left[\epsilon^2+ 2\epsilon \left(f(X)-\hat{f}(X)\right)\right]$$
but I cannot see how to make this work.
The proof can be greatly simplified if we note that for a random variable $W$ and a constant $c$, $$\operatorname{E}[(W+c)^2] = \operatorname{E}[W^2 + 2cW + c^2] = \operatorname{E}[W^2] + 2c \operatorname{E}[W] + c^2.$$ Consequently, if $\operatorname{E}[W] = 0$, then $$\operatorname{E}[W^2] = \operatorname{Var}[W],$$ hence $$\operatorname{E}[(W+c)^2] = \operatorname{Var}[W] + c^2.$$ The result then follows with the choice $W = \epsilon$, which has mean zero by assumption, and $c = f(X) - \hat f(X)$, which is a constant once $X$ and $\hat{f}$ are fixed.
Alternatively, we can consider the transformed variable $U = W + c$, which clearly satisfies $$\operatorname{Var}[U] = \operatorname{Var}[W].$$ Thus $$\operatorname{E}[U^2] = \operatorname{Var}[U] + \operatorname{E}[U]^2 = \operatorname{Var}[W] + (\operatorname{E}[W] + c)^2,$$ and again, if $\operatorname{E}[W] = 0$, we obtain the identity $$\operatorname{E}[(W+c)^2] = \operatorname{Var}[W] + c^2.$$
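As a quick sanity check, the identity $\operatorname{E}[(Y-\hat Y)^2] = [f(X)-\hat f(X)]^2 + \operatorname{Var}(\epsilon)$ can be verified numerically at a fixed point $X = x$, where $f(x)$ and $\hat f(x)$ are just constants. The particular values of $f(x)$, $\hat f(x)$, and $\sigma$ below are arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# At a fixed x, f(x) and f_hat(x) are constants; values here are arbitrary.
f_x = 2.0       # true value f(x)
f_hat_x = 1.5   # the model's prediction f_hat(x)
sigma = 0.7     # standard deviation of the noise term epsilon

# Draw many realizations of epsilon with E[eps] = 0 and Var(eps) = sigma^2.
eps = rng.normal(0.0, sigma, size=1_000_000)

# Left-hand side: Monte Carlo estimate of E[(Y - Y_hat)^2]
y = f_x + eps
lhs = np.mean((y - f_hat_x) ** 2)

# Right-hand side: (f(x) - f_hat(x))^2 + Var(eps)
rhs = (f_x - f_hat_x) ** 2 + sigma ** 2

print(lhs, rhs)  # the two quantities agree up to Monte Carlo error
```

The cross term $2\epsilon\,(f(x)-\hat f(x))$ averages out to zero in the simulation, which is exactly why it vanishes from the analytic result.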