Many books, when introducing the regression problem, start with the assertion that any random variable $Y$ can be decomposed into two orthogonal terms $$ Y= E[Y|X]+\epsilon. $$ In classical statistics $E[Y|X]$ is shorthand for $E[Y|X=x]$, where $X$ is some "controlled" (non-random) variable. However, in econometric research $X$ is a random variable, so I guess that $E[Y|X]$ is shorthand for $E[Y|\sigma(X)]$, where $\sigma(X)$ is the sigma-algebra generated by $X$.
- Is this the right interpretation?
Another assertion is that $E[Y|X]$ is an orthogonal projection.
- What space is $Y$ projected onto (onto $\sigma(X)$?)?
I understand this pretty well from the algebraic point of view, when $$ y = \hat{y} + e $$ and $\hat{y} = Hy = X(X'X)^{-1}X'y$. In this case the orthogonality of $e$ w.r.t. $\hat{y}$ has a clear geometric interpretation ($H$ is the orthogonal projection of $y$ onto $C(X)$, and $e \in C(X)^{\perp}$). However, this is a post-hoc approach, applied after we have already observed the data points $\{y_i, x_{1i},...,x_{pi}\}_{i=1}^n$, while I'm interested in the stochastic process that generates them.
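To make the finite-sample picture concrete, here is a minimal numerical sketch of the hat-matrix decomposition described above, on synthetic data (the design and coefficients are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design matrix with an intercept and one regressor
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Hat matrix H = X (X'X)^{-1} X', fitted values, and residuals
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y
e = y - y_hat

# H is an orthogonal projection: symmetric and idempotent
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)

# e lies in C(X)^perp: orthogonal to y_hat and to every column of X
print(e @ y_hat)   # numerically ~0
print(X.T @ e)     # numerically ~0
```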
To sum up, my questions are:
If $X$ is a random variable defined on the same probability space as $Y$, why does an orthogonal decomposition of the kind $$ Y = E[Y|\sigma(X)]+\epsilon=h(X)+\epsilon $$ exist? How can I prove its existence (and uniqueness)? (I know it requires square integrability of $Y$, but I have no intuitive explanation of why that suffices for the decomposition to exist.)
Do the projections $E[Y|\sigma(X)]$ or $E[Y|X=x]$ project onto $\sigma(X)$? If so, does this have any intuitive meaning (as in the linear algebra analogue)?
If $\epsilon$ is defined on the same probability space, what does it mean for it to be orthogonal to $E[Y|\sigma(X)]$?
Would appreciate any help.
Okay, I'll try my best to answer your three questions.
1 - In fact, there is a whole different way to define $E[Y|X]$, but you can prove that this other definition is equivalent to $E[Y|\sigma(X)]$. So a simple answer to your first question is yes: $$E[Y|X]=E[Y|\sigma(X)].$$
2 and 3 - As you said, $Y$ must be in $L^2$, because that is a space with an inner product. In fact, if $f$ and $g$ are functions in $L^2(\Omega,\mathcal{F},P)$, we can define an inner product on this space by: $$\langle f,g\rangle:=E[fg].$$
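As a quick sanity check of this inner product, you can approximate $E[fg]$ by Monte Carlo (the particular $f$, $g$, and distribution below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(size=1_000_000)  # U ~ Uniform(0,1) plays the role of omega

# f(U) = U and g(U) = U^2, so <f,g> = E[U * U^2] = E[U^3] = 1/4
inner = np.mean(u * u**2)
print(inner)  # ~0.25
```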
So that is the inner product with respect to which you must prove the orthogonality of $\epsilon$ and $E[Y|X]$. It's not hard to see that you must prove the following statements:
(i) If $Y\in L^2$ and $X$ is any random variable, then $E[Y|X]\in L^2$;
(ii) $E\big[\epsilon\, E[Y|X]\big]=0$.
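For what it's worth, here is a sketch of (ii) using the tower property (taking (i) for granted so all the expectations below exist). Since $\epsilon = Y - E[Y|X]$, we have $E[\epsilon|X] = E[Y|X]-E[Y|X] = 0$, and therefore $$ E\big[\epsilon\, E[Y|X]\big] = E\Big[E\big[\epsilon\, E[Y|X]\mid X\big]\Big] = E\Big[E[Y|X]\,E[\epsilon\mid X]\Big] = 0, $$ where the middle step pulls the $\sigma(X)$-measurable factor $E[Y|X]$ out of the inner conditional expectation.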
Note that $\epsilon=Y-E[Y|X]$, so (i) shows that $\epsilon\in L^2$ as well. Note also that $E[Y|X]$ is a $\sigma(X)$-measurable function, so this projection is a projection of $Y$ onto the space of square-integrable $\sigma(X)$-measurable functions, i.e. $L^2(\Omega,\sigma(X),P)$. You can think of $E[Y|X]$ as the $\sigma(X)$-measurable function closest to $Y$ in the norm induced by the inner product above.
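A small simulation illustrating both properties (the particular $h$ and the distributions are arbitrary choices): when $Y = h(X) + \epsilon$ with $E[\epsilon|X]=0$, the residual is orthogonal to $E[Y|X]=h(X)$, and $h(X)$ beats any other $\sigma(X)$-measurable candidate $g(X)$ in $L^2$ distance to $Y$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Data-generating process: Y = h(X) + eps with E[eps | X] = 0
x = rng.normal(size=n)
h = np.sin(x) + x**2           # E[Y|X] = h(X), an arbitrary sigma(X)-measurable choice
eps = rng.normal(size=n)       # independent of X, hence E[eps | X] = 0
y = h + eps

# Orthogonality: <eps, E[Y|X]> = E[eps * h(X)] ~ 0
print(np.mean(eps * h))

# E[Y|X] is the closest sigma(X)-measurable function to Y in L2:
# any other candidate g(X) gives a larger mean squared distance
for g in [x, np.cos(x), x**2]:
    assert np.mean((y - h)**2) < np.mean((y - g)**2)
```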
I don't know whether you will need a lot of mathematical tools to prove (i) and (ii), but I hope my answer clarifies your doubts. If you have any problems with the calculations or proofs, let me know.