I found a very weird formula for the conditional distribution of bivariate normals in a paper that I am reading. All of the results in the paper rely on it and I think it is incorrect.
Let
$$ y\equiv\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} , \Omega_y\right), \qquad \text{and} \qquad x\equiv\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim N\left( \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} , \Omega_x\right).$$
The paper writes that it follows that
$$ y | x \sim N\left( \left(\Omega_x^{-1}+\Omega_y^{-1}\right)^{-1}\left(\Omega_y^{-1}\mu+ \Omega_x^{-1}x\right) , \left(\Omega_x^{-1}+\Omega_y^{-1}\right)^{-1}\right), $$
where $\mu\equiv[\mu_1,\mu_2]'$.
This looks completely wrong to me. The covariance matrix is
$$ \begin{bmatrix} \operatorname{Var}(x) & \operatorname{Cov}(x,y) \\ \operatorname{Cov}(y,x) & \operatorname{Var}(y) \end{bmatrix} = \begin{bmatrix} \Omega_x +\Omega_y& \Omega_y \\ \Omega_y & \Omega_y \end{bmatrix}$$
and, so, the conditional distribution should be
$$ y | x \sim N\left(\mu + \Omega_y\left(\Omega_x + \Omega_y\right)^{-1}\left(x-\mu\right) , \Omega_y -\Omega_y\left(\Omega_x + \Omega_y\right)^{-1}\Omega_y \right). $$
I haven't been able to reconcile these two formulas. Can anyone verify that the formula in the paper is indeed incorrect or let me know what I am doing wrong?
Use the following identities, suppose $\mathbf{y}$ has a marginal Gaussian distribution (note that it comes out a little cleaner in terms of the precision $\Lambda = \Sigma^{-1}$), $$ p(\mathbf{y}) = \mathcal{N}\left( \textbf{y} | \mathbf{\mu},\Lambda^{-1}\right) $$ and that conditional on some linear transformation of $\mathbf{y}$ we have $$ p(\mathbf{x}|\mathbf{y}) = \mathcal{N}\left(\mathbf{x} |\mathbf{A}\mathbf{y} + \mathbf{b , \mathbf{L}^{-1}} \right), $$ then the conditional distribution of $\textbf{y}$ given $\textbf{x}$ is $$ p(\mathbf{y}|\mathbf{x}) = \mathcal{N}\left(\mathbf{y} | \mathbf{\Sigma}\left(\mathbf{A}^T L(\mathbf{x} - \mathbf{b})+\mathbf{\Lambda}\mathbf{\mu} \right), \mathbf{\Sigma} \right) $$ where $$ \mathbf{\Sigma} = \left(\mathbf{\Lambda} + \mathbf{A}^T \mathbf{L} \mathbf{A} \right)^{-1}. $$ Applying this with $\mathbf{A} = \mathbf{I}$, $\mathbf{b} = \mathbf{0}$, $\mathbf{\Lambda}^{-1} = \Omega_y$, and $\mathbf{L}^{-1} = \Omega_x$ then you get the result stated.