Let $C$ be an $n\times n$ matrix, $X$ an $n \times k$ matrix, and $\epsilon$ an $n \times 1$ vector.
This is taken from a simple proof of strict exogeneity in an econometrics textbook by Hayashi. The explanation he gives is that "Since C is nonsingular, $X$ and $C\cdot X = \tilde{X}$ contain the same information".
The explanation does not really help me: frankly, I don't even know how to attempt thinking about or proving this.
I am not looking for a proof, necessarily, just some good reasoning will do (a proof is okay, though).
Intuitively, I can sort of understand that multiplying $X$ by $C$ wouldn't change the "information" contained in $X$, since we are just scaling what is known. (I guess perhaps I should be thinking in terms of sigma fields or something?) But what does nonsingularity have to do with this?
So my two questions are: what is a better explanation for why $$E(C\cdot \epsilon \;\vert\; C\cdot X) = E(C\cdot \epsilon\; \vert\; X),$$ and what does nonsingularity have to do with it?
Thanks.
You are interested in the expectation of "$C\epsilon$ given $CX$". If $C$ is nonsingular, then given $CX$ I can calculate $X=C^{-1}CX$; hence "given $CX$" and "given $X$" are the same thing. In other words, $X$ is fully determined as soon as $CX$ is known, and $CX$ is of course determined by $X$.
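To tie this to the sigma-field intuition in the question, here is a short sketch. It assumes $C$ is a fixed (non-random) matrix and writes $\sigma(\cdot)$ for the sigma-field generated by a random matrix; that notation is mine, not Hayashi's.

$$\sigma(CX) \subseteq \sigma(X) \quad \text{because } CX \text{ is a measurable function of } X,$$
$$\sigma(X) \subseteq \sigma(CX) \quad \text{because } X = C^{-1}(CX) \text{ is a measurable function of } CX.$$

Hence $\sigma(CX) = \sigma(X)$, and therefore
$$E(C\epsilon \mid CX) = E\big(C\epsilon \mid \sigma(CX)\big) = E\big(C\epsilon \mid \sigma(X)\big) = E(C\epsilon \mid X).$$

Only the second inclusion uses nonsingularity; the first holds for any $C$.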
On the other hand, when $C$ is singular, you lose the following information: $\Pi_{\operatorname{null}(C)}(X)$, the projection of (each column of) $X$ onto the null space of $C$.
So, while the projection of $X$ onto the range space of $C^T$ can be deduced from $CX$, the component $\Pi_{\operatorname{null}(C)}(X)$ is mapped to $0$ by $C$, and that information is lost. In the extreme case $C=0$, you lose everything.
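A small numerical illustration of both cases may help. This is only a sketch using NumPy: the matrices `C` and `C_sing` and the dimensions are arbitrary choices of mine, not anything from the textbook.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 2
X = rng.normal(size=(n, k))  # an arbitrary realization of X

# Nonsingular C (its determinant is 7): X is recoverable from CX.
C = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
X_recovered = np.linalg.solve(C, C @ X)  # computes C^{-1}(CX)
recovered_ok = np.allclose(X_recovered, X)

# Singular C: keeps the first two coordinates, sends the third to 0,
# so its null space is spanned by the third standard basis vector.
C_sing = np.diag([1.0, 1.0, 0.0])

# Change X only along the null space of C_sing.
X_other = X.copy()
X_other[2, :] += 5.0

same_image = np.allclose(C_sing @ X, C_sing @ X_other)  # CX cannot tell them apart
different_X = not np.allclose(X, X_other)               # yet the X's differ

print(recovered_ok, same_image, different_X)
```

In the singular case two different values of $X$ produce the same $CX$, so knowing $CX$ is strictly less informative than knowing $X$.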