For simplicity, assume we are working with simple regression where the predictor $x\in\mathbb{R}$.
First write $y=E[y \mid x]+u$, where $\operatorname{var}(u)$ is constant and $E[u \mid x]=0$. I understand that $E[y \mid x]$ minimizes the mean squared prediction error (MSPE). We call $E[y \mid x]$ the true regression function.
I understand the best (simple) linear prediction is $$ a^*+b^* x=\left(E(y)-\frac{\operatorname{cov}(y, x)}{\operatorname{var}(x)} E(x)\right)+\frac{\operatorname{cov}(y, x)}{\operatorname{var}(x)} x $$
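As a quick numerical sanity check of the cov/var formulas (not part of the question; the data-generating process below is my own choice), one can simulate a truly linear model and recover $a^*$ and $b^*$ from sample moments:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)  # truly linear: E[y|x] = 2 + 3x

# Best linear predictor coefficients from the population formulas,
# estimated by their sample analogues
b_star = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
a_star = y.mean() - b_star * x.mean()
print(a_star, b_star)  # close to 2 and 3
```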
Clearly, $E[y - (a^*+b^*x)] = 0$. Equivalently, if I write $y = a^*+b^*x + \eta$, I can say $E[\eta] = 0$.
My question is whether or not $E[\eta| x] = 0$, which is a stronger statement.
I understand $a^*+b^*x$ is the projection of $y$ onto the linear function space of $x$. I think that, because $E[u \mid x]=0$ (so $E[u]=0$ and $\operatorname{cov}(x,u)=0$), I can also say $a^*+b^*x$ is the projection of $E[y \mid x]$ onto the linear function space of $x$.
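This claim can also be checked numerically (again my own illustrative setup, with a nonlinear truth $E[y \mid x] = e^x$ known by construction): the BLP coefficients of $y$ on $x$ and of $E[y \mid x]$ on $x$ agree up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x = rng.normal(size=n)
m = np.exp(x)                # E[y|x], known here by construction
y = m + rng.normal(size=n)   # u = y - m satisfies E[u|x] = 0

# Slope of the BLP of y on x vs. the BLP of E[y|x] on x
b_y = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b_m = np.cov(x, m)[0, 1] / np.var(x, ddof=1)
print(b_y, b_m)  # approximately equal
```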
Ultimately I'm curious about the following: denote the underlying model as $y=E[y \mid x]+u$ with $E[u \mid x]=0$. If we work with a linear working model and find the best linear predictor $a^*+b^*x$, is it true that $y = a^*+b^*x + \eta$ with $E[\eta \mid x]=0$? (Of course I understand the linear predictor does not minimize the MSPE.)
Update:
When the true regression $E[y \mid x]$ is not linear, the model $y = a^*+b^*x + \eta$ does not have the property $E[\eta \mid x]=0$. Indeed, if $E[\eta \mid x]=0$ held, then $E[y \mid x]=a^*+b^*x$, so the true regression would have to be linear. When one optimizes over models of the form $y = a+bx + \eta$ with $E[\eta \mid x]=0$, one is therefore implicitly assuming that the true regression $E[y \mid x]$ is linear. This can be seen from the proof that the conditional expectation is the unique minimizer of the MSPE.
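A simulation makes the failure concrete (my own illustrative setup): take $E[y \mid x]=x^2$ with $x \sim N(0,1)$, so $\operatorname{cov}(x, x^2)=0$ and the BLP is the constant $a^* \approx 1$. The residual $\eta$ averages to zero unconditionally, but its conditional mean near $x=0$ is clearly nonzero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)  # E[y|x] = x^2, a nonlinear truth

# Best linear predictor of y given x
b_star = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
a_star = y.mean() - b_star * x.mean()
eta = y - (a_star + b_star * x)

print(eta.mean())        # ~ 0: E[eta] = 0 always holds for the BLP
mask = np.abs(x) < 0.1   # condition on x near 0, where E[y|x] ~ 0 but BLP ~ 1
print(eta[mask].mean())  # ~ -1, so E[eta|x] != 0
```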