Proof of Theorem 4.2 in Elements of Causal Inference


I'm a novice in the field of causal inference and was going through the material on structure identifiability in the textbook 'Elements of Causal Inference'. I am unable to get an intuition for the following theorem:

Theorem 4.2 (Identifiability of linear non-Gaussian models): Assume that $P_{X, Y}$ admits the linear model $$ Y=\alpha X+N_Y, \quad N_Y \perp\kern-5pt\perp X, $$ with continuous random variables $X, N_Y$, and $Y$. Then there exist $\beta \in \mathbb{R}$ and a random variable $N_X$ such that $$ X=\beta Y+N_X, \quad N_X \perp\kern-5pt\perp Y, $$ if and only if $N_Y$ and $X$ are Gaussian.

Here's the proof of this theorem, as also given in the textbook:

Proof of Theorem $4.2$

We first state a lemma.

Lemma C.1: Let $X$ and $N$ be independent random variables and assume that $N$ is non-deterministic. Then $N \not\perp\kern-5pt\perp (X+N)$.

Proof of Theorem 4.2. If $X$ and $N_Y$ are normally distributed, we set $$ \beta:=\frac{\operatorname{cov}[X, Y]}{\operatorname{var}[Y]}=\frac{\alpha \operatorname{var}[X]}{\alpha^2 \operatorname{var}[X]+\operatorname{var}\left[N_Y\right]} $$ and define $N_X:=X-\beta Y$. Then $N_X$ and $Y$ are uncorrelated by construction, and since $N_X$ and $Y$ are jointly Gaussian, they are also independent. To prove the "only if" statement, we assume that $Y=\alpha X+N_Y$ and $N_X = X - \beta Y = (1-\alpha \beta) X-\beta N_Y$ are independent, and distinguish the following cases:

(i) $(1-\alpha \beta) \neq 0$ and $\beta \neq 0$. Here, Theorem $4.3$ implies that $X$ and $N_Y$, and thus also $Y$ and $N_X$, are normally distributed. Hence, $P_{X, Y}$ is bivariate Gaussian, too.

(ii) $\beta=0$. This implies $$ X \perp\kern-5pt\perp \alpha X+N_Y, $$ which contradicts Lemma C.1.

(iii) $(1-\alpha \beta)=0$. Then $N_X = -\beta N_Y$ with $\beta \neq 0$ (since $\alpha \beta = 1$), so $-\beta N_Y \perp\kern-5pt\perp \alpha X+N_Y$. Thus $$ N_Y \perp\kern-5pt\perp \alpha X+N_Y, $$ which again contradicts Lemma C.1. This concludes the proof.
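As a side note (my own calculation, not from the book): in the "if" direction, $\beta$ is exactly the least-squares coefficient of $X$ on $Y$, which makes the uncorrelatedness of $N_X$ and $Y$ immediate:

$$ \operatorname{cov}[N_X, Y] = \operatorname{cov}[X-\beta Y, Y] = \operatorname{cov}[X, Y] - \beta \operatorname{var}[Y] = \operatorname{cov}[X, Y] - \frac{\operatorname{cov}[X, Y]}{\operatorname{var}[Y]} \operatorname{var}[Y] = 0. $$

Joint Gaussianity of $(N_X, Y)$ then upgrades uncorrelatedness to independence.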

where Theorem 4.3 is the Darmois-Skitovič theorem:
Let $X_1, \ldots, X_d$ be independent, non-degenerate random variables. If there exist non-vanishing coefficients $a_1, \ldots, a_d$ and $b_1, \ldots, b_d$ (that is, for all $i, a_i \neq 0 \neq b_i$ ) such that the two linear combinations $$ \begin{array}{l} l_1=a_1 X_1+\ldots+a_d X_d, \\ l_2=b_1 X_1+\ldots+b_d X_d \end{array} $$ are independent, then each $X_i$ is normally distributed.
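If I read case (i) correctly, the two linear combinations fed into Theorem 4.3 are (with $X_1 = X$, $X_2 = N_Y$):

$$ \begin{array}{l} Y = \alpha X + 1 \cdot N_Y, \\ N_X = (1-\alpha \beta) X + (-\beta) N_Y, \end{array} $$

and the theorem requires all the coefficients to be non-vanishing; the conditions $\beta \neq 0$ and $(1-\alpha \beta) \neq 0$ in case (i) seem to be exactly this requirement, with cases (ii) and (iii) covering the two ways a coefficient of $N_X$ can vanish.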


My question: where do the three conditions for independence of $Y$ and $N_X$ come from? Can anyone shed more light on the intuition that guides this proof? I'm getting so lost in the maths part of it that I keep missing the point. Thanks!
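To try to build intuition, I wrote a quick numerical sketch (my own code, not from the book). It fits the backward regression $X = \beta Y + N_X$ and checks a simple nonlinear correlation as a crude proxy for dependence: with Gaussian inputs the backward residual behaves as independent of $Y$, while with uniform (non-Gaussian) inputs it is uncorrelated with $Y$ but clearly not independent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
alpha = 1.0  # coefficient of the forward model Y = alpha*X + N_Y

def backward_residual(x, n_y):
    """Generate Y from the forward model, then regress X on Y and
    return the backward residual N_X = X - beta*Y together with Y."""
    y = alpha * x + n_y
    c = np.cov(x, y)                 # sample covariance matrix
    beta = c[0, 1] / c[1, 1]         # beta = cov[X,Y] / var[Y]
    return x - beta * y, y

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Gaussian case: N_X and Y are jointly Gaussian and uncorrelated,
# hence independent -- even corr(N_X^2, Y^2) should be near 0.
nx_g, y_g = backward_residual(rng.normal(size=n), rng.normal(size=n))

# Uniform case: N_X and Y are uncorrelated by construction,
# but NOT independent -- corr(N_X^2, Y^2) is far from 0.
nx_u, y_u = backward_residual(rng.uniform(-1, 1, n), rng.uniform(-1, 1, n))

print(corr(nx_g, y_g), corr(nx_g**2, y_g**2))  # both near 0
print(corr(nx_u, y_u), corr(nx_u**2, y_u**2))  # first near 0, second clearly nonzero
```

Squaring is of course only one possible test of dependence; a proper check would use something like HSIC, but even this crude probe shows the asymmetry the theorem is about.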