Let $(x_1, x_2)^{T}$ be a multivariate normal random vector with mean $(\mu_1,\mu_2)^{T}$ and covariance matrix $$ \Sigma= \begin{pmatrix} \Sigma_{1,1}&\Sigma_{1,2}\\ \Sigma_{2,1}&\Sigma_{2,2} \end{pmatrix}. $$ Then the conditional distribution $p(x_1|x_2)$ is $N(x_1| \mu_{1|2},\Sigma_{1|2})$, where $$ \mu_{1|2}=\mu_{1}+\Sigma_{1,2}\Sigma_{2,2}^{-1}(x_2-\mu_2) $$ and $$ \Sigma_{1|2}=\Sigma_{1,1}-\Sigma_{1,2}\,\Sigma_{2,2}^{-1}\,\Sigma_{2,1}=[(\Sigma^{-1})_{1,1}]^{-1}. $$
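As a quick sanity check on these formulas, the two expressions for $\Sigma_{1|2}$ (the Schur complement and the inverse of the $(1,1)$ block of the precision matrix) can be compared numerically. A minimal NumPy sketch, with arbitrary made-up values for $\mu$, $\Sigma$, and the observed $x_2$:

```python
import numpy as np

# Arbitrary example parameters (not from the question itself)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])

x2 = 0.5  # observed value of the conditioning variable

# Conditional mean: mu_1 + Sigma_{1,2} Sigma_{2,2}^{-1} (x_2 - mu_2)
mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])

# Conditional variance, computed two ways:
# (a) Schur complement: Sigma_{1,1} - Sigma_{1,2} Sigma_{2,2}^{-1} Sigma_{2,1}
var_a = Sigma[0, 0] - Sigma[0, 1] * Sigma[1, 0] / Sigma[1, 1]
# (b) inverse of the (1,1) entry of the precision matrix Sigma^{-1}
var_b = 1.0 / np.linalg.inv(Sigma)[0, 0]

assert np.isclose(var_a, var_b)
```

Note that the conditional variance does not depend on the observed $x_2$ at all, only on $\Sigma$.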
Are there conceptual ways of viewing these formulas that make them look natural? For example, the mean formula almost makes sense to me: we first measure the deviation of $x_2$ from its mean, and then the term $\Sigma_{1,2}\Sigma_{2,2}^{-1}$ makes me think of converting the covariance scaling of the $x_2$ variable into the appropriate scaling for the $x_1$ variable. Once we have the appropriate deviation, we add it to $\mu_1$, and that gives us the conditional mean. This makes some sense, because if we picture a tilted ellipse, the conditional mean of $x_1$ should depend linearly on the observed value of $x_2$.
Is this correct, and can similar reasoning be applied to the covariance formula? Do these formulas fall out when we think of projections in the right way?
Put $X_1, X_2$ into a column vector $\bf X$, and $\mu_1, \mu_2$ into $\bf \mu$. Note that for any constant vector $\bf t$, ${\bf t}^T \bf X$ is normal with mean ${\bf t}^T {\bf \mu}$ and variance ${\bf t}^T \Sigma {\bf t}$, so $\bf X$ has MGF $$\eqalign{M_{\bf X}({\bf t}) &= \mathbb E [\exp({\bf t}^T {\bf X})] = \exp({\bf t}^T {\bf \mu} + {\bf t}^T \Sigma {\bf t}/2)\cr &= \exp(\mu_1 t_1 + \mu_2 t_2 + \Sigma_{11} t_1^2/2 + \Sigma_{12} t_1 t_2 + \Sigma_{22}t_2^2/2),\cr }$$ where the second line uses the symmetry $\Sigma_{12} = \Sigma_{21}$.
On the other hand, by the law of total expectation
$$ \mathbb E[ \exp({\bf t}^T {\bf X})] = \mathbb E [\mathbb E[\exp({\bf t}^T {\bf X}) \mid X_2]] $$ If we assume that the conditional distribution of $X_1$ given $X_2$ is normal with mean $\alpha_0 + \alpha_1 X_2$ and variance $\beta^2$, then applying the univariate normal MGF first to $X_1$ (with $X_2$ held fixed) and then to $X_2$ gives $$ \eqalign{\mathbb E[\exp({\bf t}^T {\bf X}) \mid X_2 ] &= \exp(t_1 \alpha_0 + (t_1 \alpha_1 + t_2) X_2 + t_1^2 \beta^2/2)\cr \mathbb E[\mathbb E[\exp({\bf t}^T {\bf X}) \mid X_2 ]] &= \exp(t_1 \alpha_0 + t_1^2 \beta^2/2 + (t_1 \alpha_1 + t_2) \mu_2 + (t_1 \alpha_1 + t_2)^2 \Sigma_{22}/2)}$$
Comparing this to the MGF above, the two expressions agree exactly when $$\frac12\left( \Sigma_{11}-\beta^2-\alpha_1^2\Sigma_{22} \right) t_1^2 + \left( \Sigma_{12}-\alpha_1\Sigma_{22} \right) t_1 t_2 + \left( \mu_1-\alpha_0-\alpha_1\mu_2 \right) t_1 = 0 $$ This must hold for all $t_1, t_2$, so the coefficients of $t_1^2$, $t_1 t_2$, and $t_1$ must all be $0$, and solving those equations yields $$ \eqalign{\alpha_0 &= \mu_1 - \frac{\Sigma_{12}\,\mu_2}{\Sigma_{22}}\cr \alpha_1 &= \frac{\Sigma_{12}}{\Sigma_{22}}\cr \beta^2 &= \Sigma_{11} - \frac{\Sigma_{12}^2}{\Sigma_{22}}} $$
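To double-check the algebra, one can substitute the solved $\alpha_0$, $\alpha_1$, and $\beta^2$ back into the three coefficient expressions and confirm each vanishes. A short numeric sketch (the parameter values below are arbitrary, chosen only for illustration):

```python
# Arbitrary example parameters
mu1, mu2 = 1.0, -2.0
S11, S12, S22 = 2.0, 0.8, 1.5

# The solved coefficients from the derivation
alpha1 = S12 / S22
alpha0 = mu1 - S12 * mu2 / S22
beta2 = S11 - S12**2 / S22

# The three coefficients that must all vanish
coef_t1sq = 0.5 * (S11 - beta2 - alpha1**2 * S22)  # coefficient of t1^2
coef_t1t2 = S12 - alpha1 * S22                     # coefficient of t1*t2
coef_t1 = mu1 - alpha0 - alpha1 * mu2              # coefficient of t1

assert abs(coef_t1sq) < 1e-12
assert abs(coef_t1t2) < 1e-12
assert abs(coef_t1) < 1e-12
```

Note that $\alpha_0 + \alpha_1 x_2$ and $\beta^2$ are exactly the $\mu_{1|2}$ and $\Sigma_{1|2}$ from the question, specialized to the scalar case.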