I was looking at this derivation of the bias-variance tradeoff. In the last block of the derivation it is claimed that
$\text{E}[f\hat{f}] + \text{E}[\varepsilon\hat{f}] = \text{E}[f\hat{f}] + \text{E}[\varepsilon]\text{E}[\hat{f}]$
since $\hat{f}$ and $\varepsilon$ are independent. Can someone explain why they are independent? This question has been asked before, and as one comment points out, it doesn't make a lot of sense intuitively:
> Intuitively, $\varepsilon$ is the irreducible error due to, e.g., measurement error, which is independent of the model estimate $\hat{f}$. However, in practice, the level of noise in the training data determines the data quality, which in turn has a huge impact on the resulting $\hat{f}$.
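To make my confusion concrete, here is a small simulation I put together (my own sketch, not part of the derivation; the true function, noise level, and polynomial fit are all made up for illustration). My understanding of the setup is that $\hat{f}$ is built from a training sample, while $\varepsilon$ is the noise on a *fresh* test point, so the two averages below should agree if the independence claim holds:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical "true" regression function, just for the simulation
    return np.sin(x)

x0 = 1.0        # fixed test point
n_train = 30    # training-set size
n_reps = 20000  # number of independent train/test repetitions

products, eps_vals, fhat_vals = [], [], []
for _ in range(n_reps):
    # training set: its noise is what determines f_hat
    x_tr = rng.uniform(0.0, 2.0 * np.pi, n_train)
    y_tr = f(x_tr) + rng.normal(0.0, 0.5, n_train)
    coefs = np.polyfit(x_tr, y_tr, 3)     # f_hat: a cubic fit
    fhat = np.polyval(coefs, x0)
    # test-point noise: a fresh draw, independent of the training set
    eps = rng.normal(0.0, 0.5)
    products.append(eps * fhat)
    eps_vals.append(eps)
    fhat_vals.append(fhat)

# If eps and f_hat are independent, E[eps * f_hat] = E[eps] * E[f_hat],
# and both should be near zero here since E[eps] = 0.
print(np.mean(products))
print(np.mean(eps_vals) * np.mean(fhat_vals))
```

In this simulation the two quantities do match (both near zero), but that only seems to work because I drew $\varepsilon$ independently of the training set by construction, which is exactly the part I don't follow in the derivation: is $\varepsilon$ there the test-point noise, or the same noise that was in the training data?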
Thanks a lot!