A question of conditional expectation.


Suppose $Y$ and $X$ are one-dimensional random variables and $\varepsilon$ is an error term such that $Y = X + \varepsilon$ with $\mathrm{E}(\varepsilon \, | \, X) = 0$. Then it is obvious that $$\mathrm{E}(Y \, | \, X) = \mathrm{E}(X + \varepsilon \, | \, X) = X.$$ But what I want to know is: what is $\mathrm{E}(X \, | \, Y)$? I am stuck here, since in general $\mathrm{E} (\varepsilon \, | \, Y) \neq 0$, so we cannot conclude that $\mathrm{E}(X \, | \, Y) = Y$.
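A quick numerical illustration of the asymmetry, under the extra (made-up) assumption that $X$ and $\varepsilon$ are independent standard normals: then $(X, Y)$ is bivariate normal, $\mathrm{E}(X \mid Y) = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(Y)} Y = Y/2 \neq Y$, and likewise $\mathrm{E}(\varepsilon \mid Y) = Y/2 \neq 0$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.standard_normal(n)
eps = rng.standard_normal(n)   # independent of X, so E(eps | X) = 0
Y = X + eps

# Best linear predictor slopes; for bivariate normal data these coincide
# with the conditional expectations E(X | Y) and E(eps | Y).
slope_X = np.cov(X, Y)[0, 1] / np.var(Y)      # should be near 0.5, not 1
slope_eps = np.cov(eps, Y)[0, 1] / np.var(Y)  # should be near 0.5, not 0
```

So under these assumptions $\mathrm{E}(X \mid Y) = Y/2$: conditioning on $Y$ does not return $Y$, precisely because $\varepsilon$ is correlated with $Y$.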

Could anyone help me? Thanks in advance!

PS: Here is one of the reasons why I want to study this question. Consider a regression $$Y_i = \theta X_i + \varepsilon_i, \qquad i = 1, \ldots, n$$ with $Y_i \in \mathbb{R}^p$, $\theta \in \mathbb{R}^p$, and $X_i$ a scalar. The matrix representation is $$Y_{n \times p} = X_{n \times 1} \theta_{1 \times p}^{\top} + \varepsilon_{n \times p},$$ where $\mathrm{E} (\varepsilon_i \, | \, X_i ) = 0$. What I want to know is $$\big[ \mathrm{E} (X \, | \, Y) \big]^{\top} = \frac{\theta^{\top} }{\theta^{\top} \theta} \mathrm{E} (\theta X^{\top} \, | \, Y) = \frac{\theta^{\top} }{\theta^{\top} \theta} \big[ Y^{\top} - \mathrm{E} (\varepsilon^{\top} \, | \, Y)\big].$$ What remains is to deal with $\mathrm{E} (\varepsilon \, | \, Y)$.
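The algebraic step above, left-multiplying $\theta X^{\top}$ by $\theta^{\top}/(\theta^{\top}\theta)$ to recover $X^{\top}$, can be checked numerically (the values of $\theta$ and $X$ below are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 5
theta = rng.standard_normal(p)   # theta in R^p, nonzero
X = rng.standard_normal(n)       # X_1, ..., X_n scalars

M = np.outer(theta, X)                    # theta X^T, a p x n matrix
recovered = theta @ M / (theta @ theta)   # (theta^T / theta^T theta) theta X^T = X^T
```

Since $\theta^{\top}(\theta X^{\top}) = (\theta^{\top}\theta)\, X^{\top}$, the row vector `recovered` equals `X` exactly.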



Accepted answer

Too long for a comment.

By replacing $X$ and $Y$ with $X - \mu_X$ and $Y - \mu_X$, we may assume that $Y = X + \varepsilon$ with all random variables having zero expectation (note that $\mathrm{E}(\varepsilon) = \mathrm{E}\big(\mathrm{E}(\varepsilon \mid X)\big) = 0$ already). We can always write $$ Y = E(Y \mid X) + \big(Y - E(Y \mid X)\big). $$ It is well known that $Y \mapsto E(Y \mid X)$ is the orthogonal projection from $\mathscr{L}^2(\Omega, \mathscr{F}, \mathbf{P})$ onto its closed subspace $\mathscr{L}^2(\Omega, X^{-1}(\mathscr{B}_\mathbf{R}), \mathbf{P})$, and the orthogonal decomposition with respect to that subspace is unique.

Now you also have $Y = X + \varepsilon$ with $E(\varepsilon \mid X) = 0$: here $X$ lies in the subspace and $\varepsilon$ is orthogonal to it, so by uniqueness, $X = E(Y \mid X)$ and $\varepsilon = Y - E(Y \mid X)$.

Since the decomposition $$ Y = E(Y \mid X) + \big(Y - E(Y \mid X)\big) $$ holds regardless of the distributional assumptions on $(X, Y)$, there is nothing you can say about $E(X \mid Y)$ in full generality. Of course, you are assuming that $E(Y \mid X) = X$, and you already know that bivariate normal $(X, Y)$ can satisfy this. Other models that satisfy it are of the form $Y = X + \varepsilon$ with $X \perp \varepsilon$ and $\mathrm{E}(\varepsilon) = 0$ (in fact, the bivariate normal model is of this type).
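The projection argument says $\varepsilon = Y - E(Y \mid X)$ is orthogonal in $\mathscr{L}^2$ to every square-integrable function of $X$. A small Monte Carlo sketch of that orthogonality, again under the illustrative assumption that $X$ and $\varepsilon$ are independent standard normals:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
X = rng.standard_normal(n)
eps = rng.standard_normal(n)   # independent of X => E(eps | X) = 0

# E[eps * g(X)] should vanish for square-integrable g, since eps is
# orthogonal to the subspace of sigma(X)-measurable L^2 functions.
test_functions = (lambda x: x, np.square, np.sin)
inner_products = [np.mean(eps * g(X)) for g in test_functions]
```

All three sample inner products come out near zero, consistent with $\varepsilon \perp \mathscr{L}^2(\Omega, X^{-1}(\mathscr{B}_\mathbf{R}), \mathbf{P})$.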

Another answer

If $Y = X + \epsilon$, then $X = Y - \epsilon$, hence $$\operatorname{E}[X \mid Y] = \operatorname{E}[Y - \epsilon \mid Y] = Y - \operatorname{E}[\epsilon \mid Y].$$ But beyond this, there is not much else to be said.
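The identity $\operatorname{E}[X \mid Y] = Y - \operatorname{E}[\epsilon \mid Y]$ can be seen empirically by estimating conditional means within bins of $Y$. The uniform noise below (a made-up choice, purely for illustration) stresses that no Gaussian assumption is needed:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
X = rng.standard_normal(n)
eps = rng.uniform(-1.0, 1.0, n)   # independent of X, mean zero, non-Gaussian
Y = X + eps

# Estimate E[X | Y], E[eps | Y], E[Y | Y] by averaging within bins of Y.
edges = np.linspace(-2, 2, 21)
bins = np.digitize(Y, edges)
EX_given_Y = np.array([X[bins == b].mean() for b in range(1, 21)])
Eeps_given_Y = np.array([eps[bins == b].mean() for b in range(1, 21)])
EY_given_Y = np.array([Y[bins == b].mean() for b in range(1, 21)])
# Since X = Y - eps holds pointwise, the binned means inherit the identity
# E[X | Y] = Y - E[eps | Y] exactly.
```

The binned identity holds to floating-point precision, whatever the distributions of $X$ and $\epsilon$; what changes with the model is the shape of $\operatorname{E}[\epsilon \mid Y]$ itself.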