Suppose that $X \sim N_2 \left(\begin{pmatrix}1 \\2 \end{pmatrix}, \begin{pmatrix} 2 & 1\\1&2 \end{pmatrix}\right)$ and $Y\mid X \sim N_2 \left(\begin{pmatrix}X_1 \\X_1+X_2 \end{pmatrix}, \begin{pmatrix} 1 & 0\\0&1 \end{pmatrix}\right)$.
Show that $\begin{pmatrix} X\\Y \end{pmatrix}\sim N_4 \left(\begin{pmatrix} 1\\2\\1\\3 \end{pmatrix}, \begin{pmatrix} 2&1&2&3\\1&2&1&3\\2&1&3&3\\3 &3&3&7 \end{pmatrix}\right)$.
My thoughts:
I knew a rule stating that if you have $\begin{pmatrix} X\\Y \end{pmatrix}$, then you can calculate $X\mid Y$ by partitioning the mean vector and covariance matrix of $\begin{pmatrix} X\\Y \end{pmatrix}$. However, this example is about $Y\mid X$, so I first reordered $\begin{pmatrix} X\\Y \end{pmatrix}$ into $\begin{pmatrix} Y\\X \end{pmatrix}$. Doing this, I can show that the conditional distribution $Y\mid X$ is indeed the one given, assuming the multivariate distribution that I actually need to show. In doing so, I use the following:
$\text{Mean} = \boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2-\boldsymbol{\mu}_2)$ and the analogous formula $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ for the covariance matrix.
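To illustrate this approach, here is a NumPy sketch (variable names are my own) that partitions the joint distribution claimed in the exercise, with the blocks reordered to $(Y, X)$, and applies the two formulas above:

```python
import numpy as np

# Joint parameters claimed in the exercise, in the order (X1, X2, Y1, Y2).
mu = np.array([1.0, 2.0, 1.0, 3.0])
Sigma = np.array([[2.0, 1.0, 2.0, 3.0],
                  [1.0, 2.0, 1.0, 3.0],
                  [2.0, 1.0, 3.0, 3.0],
                  [3.0, 3.0, 3.0, 7.0]])

# Reorder to (Y, X) so that block 1 = Y and block 2 = X.
perm = [2, 3, 0, 1]
Sigma_p = Sigma[np.ix_(perm, perm)]

S11, S12 = Sigma_p[:2, :2], Sigma_p[:2, 2:]
S21, S22 = Sigma_p[2:, :2], Sigma_p[2:, 2:]

# Coefficient matrix Sigma_12 Sigma_22^{-1} from the conditional-mean
# formula, and the conditional covariance Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21.
coef = S12 @ np.linalg.inv(S22)
cond_cov = S11 - coef @ S21

print(coef)      # matrix multiplying (x - mu_X) in the conditional mean
print(cond_cov)  # conditional covariance of Y given X
```

Up to floating-point rounding, `coef` comes out as $\begin{pmatrix}1&0\\1&1\end{pmatrix}$ and `cond_cov` as the identity, which is exactly the verification described above.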
My question
Is this rule a biconditional (so that verifying the conditional distribution from the claimed joint is sufficient)? Or is there a different way to show how $\begin{pmatrix} X\\Y \end{pmatrix}$ behaves?
You have $Y\mid X \sim N(AX, I),$ where $A = \begin{pmatrix} 1 & 0\\ 1 & 1 \end{pmatrix},$ so that $AX = \begin{pmatrix} X_1 \\ X_1 + X_2 \end{pmatrix}.$
Therefore $(Y-AX)\mid X \sim N(0,I).$ That is because when you condition on $X$, you treat $X$ (and hence $AX$) as constant rather than random, and when you subtract a constant from a random vector, you do not change the variance; the only effect on the expected value is that you subtract that same constant from it.
The fact that the conditional distribution of $Y-AX$ given $X$ does not depend on $X$ enables you to draw two conclusions: first, $Y-AX$ is independent of $X$; second, the marginal (unconditional) distribution of $Y-AX$ is that same $N(0,I)$ distribution.
Thus $Y = AX + (Y-AX)$ is the sum of two independent random vectors, $AX$ and $Y-AX,$ and you know their distributions, and they're both normal.
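As a quick numerical check of the independence claim (a sketch in my own notation, with $A$ the coefficient matrix of the conditional mean): the cross-covariance $\operatorname{Cov}(X, Y-AX) = \operatorname{Cov}(X,Y) - \operatorname{var}(X)\,A^\top$ should vanish, where $\operatorname{Cov}(X,Y)$ is the top-right block of the claimed $4\times 4$ covariance matrix.

```python
import numpy as np

# var(X) and the top-right 2x2 block Cov(X, Y) of the claimed joint.
Sigma_X = np.array([[2.0, 1.0],
                    [1.0, 2.0]])
Cov_XY = np.array([[2.0, 3.0],
                   [1.0, 3.0]])

# A is the coefficient matrix of the conditional mean E(Y | X) = A X
# (my notation; it sends (x1, x2) to (x1, x1 + x2)).
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])

# Cov(X, Y - A X) = Cov(X, Y) - var(X) A^T
cross = Cov_XY - Sigma_X @ A.T
print(cross)  # should be the 2x2 zero matrix
```

For jointly normal vectors, zero cross-covariance is equivalent to independence, so this is consistent with the decomposition above.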
Now ask for what matrix $M$ do we have $M\begin{bmatrix} X \\ Y-AX \end{bmatrix} = \begin{bmatrix} X \\ Y \end{bmatrix}.$
Recall that if $M$ is a constant matrix then $\operatorname E(MW) = M\operatorname E(W)$ and $\operatorname{var}(MW) = M\Big(\operatorname{var}(W)\Big) M^\top.$
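Putting the pieces together numerically (a sketch, assuming $E(Y\mid X)=AX$ with $A=\begin{pmatrix}1&0\\1&1\end{pmatrix}$): the vector $W = \begin{pmatrix} X \\ Y-AX \end{pmatrix}$ has mean $(1,2,0,0)^\top$ and block-diagonal covariance $\operatorname{diag}(\Sigma, I)$ by independence, and $M = \begin{pmatrix} I & 0 \\ A & I \end{pmatrix}$ maps $W$ to $\begin{pmatrix} X \\ Y \end{pmatrix}$ since $Y = AX + (Y-AX)$.

```python
import numpy as np

Sigma_X = np.array([[2.0, 1.0],
                    [1.0, 2.0]])
A = np.array([[1.0, 0.0],   # coefficient matrix of the conditional mean,
              [1.0, 1.0]])  # E(Y | X) = A X (my notation)

# W = (X, Y - AX): the two blocks are independent, so the
# covariance of W is block diagonal with blocks var(X) and I.
mu_W = np.array([1.0, 2.0, 0.0, 0.0])
var_W = np.block([[Sigma_X, np.zeros((2, 2))],
                  [np.zeros((2, 2)), np.eye(2)]])

# (X, Y) = M W with M = [[I, 0], [A, I]], since Y = A X + (Y - A X).
M = np.block([[np.eye(2), np.zeros((2, 2))],
              [A, np.eye(2)]])

# E(MW) = M E(W) and var(MW) = M var(W) M^T for constant M.
mean_joint = M @ mu_W
var_joint = M @ var_W @ M.T
print(mean_joint)  # (1, 2, 1, 3)
print(var_joint)
```

The printed mean and covariance match the $N_4$ parameters that the exercise asks you to establish.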