This question is in reference to Exercise 4.3 in the 'Learning From Data' book. Here is the question, where H is the hypothesis set and f is the target function:
Deterministic noise depends on H, as some models approximate f better
than others.
(a) Assume H is fixed and we increase the complexity of f. Will deterministic noise in general go up or down? Is there a higher or lower tendency to overfit?
(b) Assume f is fixed and we decrease the complexity of H. Will deterministic noise in general go up or down? Is there a higher or lower tendency to overfit?
[Hint: There is a race between two factors that affect overfitting in opposite ways, but one wins.]
This is what I think:
a) By making the target function more complex while keeping H fixed, deterministic noise will increase. Also, H will overfit less than before, since H is even more general relative to f than it previously was.
b) By decreasing the complexity of H while keeping f fixed, deterministic noise will again increase, and again H will overfit less than before, since H is even more general relative to f than it previously was.
The hint suggests to me that there may be a more nuanced answer than what I have. Would you please let me know how to go about thinking about this? The reading was pretty informal with regard to this topic, and in the AML book forum there aren't any posts covering it. Any insights are much appreciated.
I will take a shot at this question. In this exercise, the first concept is that both stochastic noise and deterministic noise are, as the names suggest, noise: the part of the observed data $y$ that we cannot model.
In the case of stochastic noise, this is because noise is added to the true $y$. We would like to learn from $y_n=f(x_n)$, but we observe $y_n=f(x_n) + \textrm{stochastic noise}$.
In the case of deterministic noise (suppose there is no stochastic noise), we have a best hypothesis $h^*$ that is simpler than the true function $f$. Any part of the data that $h^*$ is not able to explain is interpreted as noise by the model. We would like to learn from $y_n=h^*(x_n)$ (this is what we are able to explain, to model), but we observe $y_n=f(x_n)=h^*(x_n)+\textrm{deterministic noise}$. Deterministic noise is therefore related to model bias.
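To make this concrete, here is a small numerical sketch (my own, not from the book): take a nonlinear target with no stochastic noise and a linear hypothesis set, compute the best line $h^*$ by least squares, and look at the residual $f - h^*$, which is exactly the deterministic noise. The particular target $\sin(\pi x)$ and the interval $[-1,1]$ are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical setup: target f(x) = sin(pi * x) on [-1, 1] (no stochastic
# noise), hypothesis set H = lines a*x + b. The best h* in H is the
# least-squares line over a dense grid; the residual f(x) - h*(x) is the
# deterministic noise.
x = np.linspace(-1, 1, 2001)
f = np.sin(np.pi * x)

# Best linear approximation h* (degree-1 polynomial fit)
coeffs = np.polyfit(x, f, deg=1)
h_star = np.polyval(coeffs, x)

det_noise = f - h_star  # deterministic noise at each x
print("mean squared deterministic noise:", np.mean(det_noise ** 2))
```

Even with zero stochastic noise, the residual is nonzero: the line simply cannot express the sine, and that gap is what $h^*$ "sees" as noise.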
In both cases, we cannot model the data $y$ precisely, since there is a noise term. The learning model cannot discern the two types of noise; it only tries to fit the data. With a finite dataset of size $N$, this means we can end up with a bad hypothesis when noise is present. In the case of stochastic noise, this is obvious: we fit the noise and not the real $y$. In the case of deterministic noise, you can think of it as there being more parts of $f$ that we cannot capture, and so more opportunities to capture them in the wrong way. Observing only a finite number of points from a highly complex target makes your simple hypothesis follow them blindly, without accounting for the unknown parts of the domain, and this leads to overfitting.
Answering your questions:
(a) Assume H is fixed and we increase the complexity of f. Will deterministic noise in general go up or down? Is there a higher or lower tendency to overfit?
Here deterministic noise will go up, since there are more parts of $f$ that H cannot explain. The tendency to overfit therefore goes up.
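A quick illustration of (a), again my own sketch rather than the book's: keep H fixed as the set of lines and let the target $f_k(x)=\sin(k\pi x)$ grow more complex as $k$ increases. The mean squared deterministic noise of the best line goes up with $k$.

```python
import numpy as np

# H is fixed (lines a*x + b); the target f_k(x) = sin(k * pi * x) becomes
# more complex as k grows. The deterministic noise is the residual of the
# best least-squares line fitted to f_k on a dense grid.
x = np.linspace(-1, 1, 2001)
for k in (1, 2, 3):
    f = np.sin(k * np.pi * x)
    h_star = np.polyval(np.polyfit(x, f, deg=1), x)  # best line in H
    mse = np.mean((f - h_star) ** 2)
    print(f"k={k}: mean squared deterministic noise = {mse:.3f}")  # increases with k
```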
(b) Assume f is fixed and we decrease the complexity of H. Will deterministic noise in general go up or down? Is there a higher or lower tendency to overfit?
Here deterministic noise will go up, since there are more parts of $f$ that H cannot explain, and this pushes overfitting up. However, since H is less complex, your model has less variance and less ability to fit that noise, which pushes overfitting down. There are therefore two competing factors; in general, the second wins over the first, and so overfitting will go down.
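You can see the race between the two factors in a small Monte Carlo sketch (my own hypothetical experiment, with all choices arbitrary: the target $\sin(8x)$, $N=10$ noiseless points, and degrees 2 vs. 9). The degree-9 set has far less deterministic noise than the degree-2 set, yet its variance on so few points dominates and its out-of-sample error is much worse:

```python
import warnings
import numpy as np

# Fix a complex target f(x) = sin(8x) and compare two hypothesis sets on
# the same small, noiseless samples: degree-2 polynomials (large
# deterministic noise, low variance) vs. degree-9 polynomials (small
# deterministic noise, high variance).
rng = np.random.default_rng(0)
f = lambda x: np.sin(8 * x)
x_test = np.linspace(-1, 1, 2001)   # dense grid to estimate E_out
warnings.simplefilter("ignore")     # polyfit may warn on ill-conditioned fits

def avg_e_out(degree, n_points=10, trials=200):
    """Average out-of-sample squared error of a degree-`degree` fit."""
    errs = []
    for _ in range(trials):
        x = rng.uniform(-1, 1, n_points)          # one small noiseless dataset
        coeffs = np.polyfit(x, f(x), degree)
        errs.append(np.mean((np.polyval(coeffs, x_test) - f(x_test)) ** 2))
    return float(np.mean(errs))

e_simple, e_complex = avg_e_out(2), avg_e_out(9)
print(f"E_out, degree 2: {e_simple:.3f}")
print(f"E_out, degree 9: {e_complex:.3f}")  # the complex H overfits far more
```

In this setup the simpler hypothesis set wins despite its larger deterministic noise, which is exactly the direction the hint points to in part (b).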
The take-home lesson is to match model complexity to the quantity of data, not to the complexity of the target.