I recently read the wikipedia article about nonparametric regression. It contains the following quote:
In nonparametric regression, we have random variables $X$ and $Y$ and assume the following relationship:
$$\mathbb{E}[Y \,\big|\, X = x] = m(x)$$
where $m(x)$ is some deterministic function. [...] Some authors use a slightly stronger assumption of additive noise:
$$Y = m(X) + U$$
where the random variable $U$ is the noise term, with mean 0.
I am wondering how the additive noise assumption is slightly stronger. For a model
$$Y = m(X) + U$$
I find
$$\mathbb{E}[Y \,\big|\, X = x] = \mathbb{E}[m(X) + U \,\big|\, X = x] = m(x) + \mathbb{E}[U \,\big|\, X = x]$$
which equals $m(x)$ if and only if the conditional expectation of $U$ given $X$ is zero (for example, if $U$ is independent of $X$). How do we get $\mathbb{E}[U \,\big|\, X] = 0$, and why exactly is the first assumption more general?
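To see the gap concretely, here is a toy simulation (my own construction, using numpy) in which the noise has unconditional mean zero but $\mathbb{E}[U \,\big|\, X] = X \neq 0$, so the conditional mean of $Y$ is not $m(x)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy example: m(x) = x**2 is the "true" deterministic function.
x = rng.normal(size=n)

# U has unconditional mean zero, but E[U | X] = X != 0:
# the noise is correlated with X.
u = x + rng.normal(size=n)
y = x**2 + u

print(np.mean(u))  # close to 0: the unconditional mean of the noise

# Conditional mean of Y near x0 = 1: if E[U | X] were 0 this would
# be m(1) = 1, but here it is m(1) + E[U | X = 1] = 2.
mask = np.abs(x - 1.0) < 0.05
print(np.mean(y[mask]))  # close to 2, not 1
```

So $\mathbb{E}[U] = 0$ alone is not enough to identify $m$; the conditional mean of the noise is what matters.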
EDIT: For two random variables $X, Y$, there always exists the decomposition
$$Y = \underbrace{\mathbb{E}[Y \,\big|\, X]}_{=: m(X)} + \underbrace{(Y - \mathbb{E}[Y \,\big|\, X])}_{=: \epsilon}$$
In this setting we have $\mathbb{E}[\epsilon] = 0$ by the law of iterated expectation. Isn't this already additive noise?
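In fact, conditioning seems to give even more than $\mathbb{E}[\epsilon] = 0$: by the definition of $\epsilon$,
$$\mathbb{E}[\epsilon \,\big|\, X] = \mathbb{E}[Y \,\big|\, X] - \mathbb{E}[Y \,\big|\, X] = 0,$$
and then $\mathbb{E}[\epsilon] = \mathbb{E}\big[\mathbb{E}[\epsilon \,\big|\, X]\big] = 0$ follows by the tower property.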
When you assume $E[Y \, | \, X = x] = m(x)$, you're saying that, after controlling for $X$, all of the systematic variation in $Y$ has to be captured by $m(x)$.
Let's say that you instead assume that $Y = m(X) + U$. Then you're still assuming that the conditional expectation can be described by some function, just like before, but you're also imposing an additional constraint on how $Y$ is generated: it equals that function plus an additive noise term. That's why it's a stronger assumption.
If you don't include a noise term, then $m(x)$ has to explain all variation in $Y$ conditional on $X$, while including the noise term gives you a bit more flexibility since the noise can explain some of the variation in $Y$ conditional on $X$.
Now, the bit about $E[U \, | \, X] = 0$. It's not a consequence; it's an assumption we make about the noise, and it has been around since classical OLS (the strict exogeneity assumption). Without this assumption, there's not really a good reason to posit additive noise in the first place.
Also, you note that $E[U \, | \, X] = 0$ holds if $U$ and $X$ are independent. While that's true, independence is an even stronger assumption than strict exogeneity, and it isn't necessary for much of the asymptotic theory, so many authors are content with just assuming that the conditional expectation is zero rather than assuming full independence.
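To illustrate that strict exogeneity is strictly weaker than independence, here is a quick numpy sketch (my own toy example) of noise with $E[U \, | \, X] = 0$ that is nevertheless not independent of $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

x = rng.normal(size=n)
z = rng.normal(size=n)  # independent of x
u = x * z               # E[U | X] = X * E[Z] = 0

# Strict exogeneity holds: the conditional mean of U is ~0 everywhere.
for x0 in (-1.0, 0.0, 1.0):
    mask = np.abs(x - x0) < 0.1
    print(round(np.mean(u[mask]), 3))  # all close to 0

# ...but U is not independent of X: Var(U | X = x) = x**2,
# so the conditional variance grows with |x|.
var_near_0 = np.var(u[np.abs(x) < 0.1])
var_near_2 = np.var(u[np.abs(x - 2.0) < 0.1])
print(var_near_0, var_near_2)  # the second is far larger
```

This is just classical heteroskedasticity: the conditional mean of the noise is zero, so $m$ is still the regression function, even though the noise distribution depends on $X$.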