Prove that conditional expectation minimizes squared error


I'm a bit confused about the universality of this statement:

Suppose we have real-valued random variables $Y$ and $X$, and a differentiable function $f$ (perhaps some model). Do not assume that $f$ is convex.

$$ \mathbb{E}[Y \mid X] = \text{argmin}_f \mathbb{E}[(Y - f(X))^2] $$

Is this always true? And if so, why? Most proofs of this (e.g., here) rely on reducing the statement above to:

$$ \text{argmin}_f \mathbb{E}[(\mathbb{E}[Y] - f(X))^2]$$

Then they take the derivative and set it to zero to find the minimum, but this seems to require the objective to be convex, so does the statement above always hold?

Best answer:

There's no need to take derivatives or invoke convexity. Having established that $$ E\left[ (Y-f(X))^2\right]=E\left[(Y-E(Y\mid X))^2\right]+E\left[(E(Y\mid X)-f(X))^2\right]=a+b,\tag{$\ast$} $$ we observe that $a$ does not depend on $f$, so the LHS of $(\ast)$ is minimized exactly when $b$ is minimized. Since $b$ is the expectation of a non-negative random variable, it is clear that $b\ge0$. But taking $\hat f(X):=E(Y\mid X)$ leads to $b=0$, hence $\hat f(X)$ is a choice for $f(X)$ that minimizes the LHS.
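(For completeness, one way to obtain $(\ast)$ is to add and subtract $E(Y\mid X)$ and expand; the cross term vanishes by the tower property:
$$
\begin{aligned}
E\left[(Y-f(X))^2\right]
 &= E\left[\big((Y-E(Y\mid X))+(E(Y\mid X)-f(X))\big)^2\right]\\
 &= \underbrace{E\left[(Y-E(Y\mid X))^2\right]}_{a}
  + \underbrace{E\left[(E(Y\mid X)-f(X))^2\right]}_{b}
  + 2\,E\left[(Y-E(Y\mid X))(E(Y\mid X)-f(X))\right],
\end{aligned}
$$
and, conditioning on $X$,
$$
E\left[(Y-E(Y\mid X))(E(Y\mid X)-f(X))\right]
 = E\Big[(E(Y\mid X)-f(X))\,\underbrace{E\left[\,Y-E(Y\mid X)\mid X\,\right]}_{=0}\Big]=0.)
$$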

Now if some other $h(X)$ also minimizes the LHS of $(\ast)$, then we must have $$E\left[(E(Y\mid X)-h(X))^2\right]=0$$ as well. But $(E(Y\mid X)-h(X))^2$ is a non-negative random variable with zero expectation, so it equals zero almost surely, which implies $h(X)=E(Y\mid X)$ almost surely.
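
Not part of the argument, but here is a quick Monte Carlo sanity check of the claim. The model below is purely an illustrative assumption: $X\sim N(0,1)$ and $Y = X^2 + \varepsilon$ with standard normal noise, so $E[Y\mid X]=X^2$; the conditional mean should attain the smallest empirical squared error among the candidates, convex or not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) model: X ~ N(0,1), Y = X^2 + noise, so E[Y | X] = X^2.
n = 200_000
x = rng.standard_normal(n)
y = x**2 + rng.standard_normal(n)

# Candidate predictors f(X), including the conditional mean itself.
candidates = {
    "E[Y|X] = x^2": x**2,
    "f(x) = x": x,
    "f(x) = x^2 + 0.5": x**2 + 0.5,
    "f(x) = |x|": np.abs(x),
    "f(x) = sin(x)": np.sin(x),
}

# Empirical mean squared error E[(Y - f(X))^2] for each candidate;
# the conditional mean should come out smallest (close to the noise variance, 1).
for name, pred in candidates.items():
    mse = np.mean((y - pred) ** 2)
    print(f"{name:>18}: MSE ≈ {mse:.3f}")
```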