Conditional expectation from joint distribution


I am new to probability and trying to convince myself of the correctness of the equations in this paper on factor analysis. There is a step I am missing. I'll give my understanding so far and then highlight the question below.

Given a $p$-dimensional vector $\textbf{x}$ modeled using a $k$-dimensional factor $\textbf{z}$ where typically $k < p$, the model for factor analysis is:

$$ \textbf{x} = \Lambda \textbf{z} + \textbf{u} $$

Where $\Lambda$ is a $p \times k$ matrix of factor loadings, $\textbf{u} \sim \mathcal{N}(0, \Psi)$, $\textbf{z} \sim \mathcal{N}(0, I)$, and $\textbf{z}$ and $\textbf{u}$ are independent. This means $\textbf{x} \sim \mathcal{N}(0, \Lambda \Lambda^{\top} + \Psi)$ because (abusing notation slightly):

$$ \begin{align} \textbf{x} &= \Lambda \textbf{z} + \textbf{u} \\ &= \Lambda \mathcal{N}(0, I_k) + \mathcal{N}(0, \Psi) \\ &= \mathcal{N}(0, \Lambda \Lambda^{\top} + \Psi) \end{align} $$
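This marginal covariance is easy to sanity-check numerically by sampling from the model. The following sketch uses NumPy with arbitrary (hypothetical) choices of $\Lambda$ and $\Psi$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 4, 2, 200_000  # arbitrary dimensions and sample size

# Hypothetical loading matrix and diagonal noise covariance.
Lam = rng.normal(size=(p, k))
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))

# Sample z ~ N(0, I_k) and u ~ N(0, Psi) independently, then form x.
z = rng.normal(size=(n, k))
u = rng.multivariate_normal(np.zeros(p), Psi, size=n)
x = z @ Lam.T + u

# The sample covariance of x should approach Lam @ Lam.T + Psi.
emp = np.cov(x, rowvar=False)
theo = Lam @ Lam.T + Psi
print(np.max(np.abs(emp - theo)))  # small, shrinking as n grows
```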

Now, we have the joint distribution:

$$ P\bigg( \begin{bmatrix} \textbf{x} \\ \textbf{z} \end{bmatrix} \bigg) = \mathcal{N}\bigg( \begin{bmatrix} 0 \\ 0 \end{bmatrix} , \begin{bmatrix} \Lambda \Lambda^{\top} + \Psi & \Lambda \\ \Lambda^{\top} & I \end{bmatrix} \bigg) $$

I can convince myself that this is correct fairly easily. $\text{Var}(\textbf{x})$ and $\text{Var}(\textbf{z})$ come from their definitions, while $\text{Cov}(\textbf{x}, \textbf{z})$ and $\text{Cov}(\textbf{z}, \textbf{x})$ are easy enough to compute, e.g.:

$$ \begin{align} \text{Cov}(\textbf{x}, \textbf{z}) &= \mathbb{E}[(\textbf{x} - \mathbb{E}[\textbf{x}])(\textbf{z} - \mathbb{E}[\textbf{z}])^{\top}] \\ &= \mathbb{E}[(\textbf{x} - 0)(\textbf{z} - 0)^{\top}] \\ &= \mathbb{E}[(\Lambda \textbf{z} + \textbf{u})(\textbf{z})^{\top}] \\ &= \mathbb{E}[\Lambda \textbf{z} \textbf{z}^{\top} + \textbf{u} \textbf{z}^{\top}] \\ &= \Lambda \mathbb{E}[\textbf{z} \textbf{z}^{\top}] + \mathbb{E}[\textbf{u} \textbf{z}^{\top}] \\ &= \Lambda \end{align} $$

Where $\mathbb{E}[\textbf{u} \textbf{z}^{\top}] = \mathbb{E}[\textbf{u}]\mathbb{E}[\textbf{z}]^{\top} = 0$ by the independence of $\textbf{u}$ and $\textbf{z}$, and $\mathbb{E}[\textbf{z}\textbf{z}^{\top}] = I_k$ because:

$$ \begin{align} \text{Var}(\textbf{z}) &= \mathbb{E}[\textbf{z}\textbf{z}^{\top}] - \mathbb{E}[\textbf{z}] \mathbb{E}[\textbf{z}]^{\top} \\ I_k &= \mathbb{E}[\textbf{z}\textbf{z}^{\top}] - 0 \end{align} $$
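The cross-covariance calculation can be checked the same way by sampling: since both means are zero, $\text{Cov}(\textbf{x}, \textbf{z}) = \mathbb{E}[\textbf{x}\textbf{z}^{\top}]$, which a sample average should drive toward $\Lambda$. A short NumPy sketch (again with arbitrary $\Lambda$ and $\Psi$):

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 4, 2, 200_000  # arbitrary dimensions and sample size

Lam = rng.normal(size=(p, k))            # hypothetical loadings
Psi = np.diag(rng.uniform(0.5, 1.5, p))  # hypothetical noise covariance

z = rng.normal(size=(n, k))
u = rng.multivariate_normal(np.zeros(p), Psi, size=n)
x = z @ Lam.T + u

# Both means are zero, so Cov(x, z) = E[x z^T] ~ (1/n) X^T Z.
cov_xz = x.T @ z / n
print(np.max(np.abs(cov_xz - Lam)))  # small, shrinking as n grows
```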

So far so good.

Question

The authors then claim that the conditional expectation of the first and second moments of the factors are:

$$ \begin{align} \mathbb{E}[\textbf{z} \mid \textbf{x}] &= \Lambda^{\top} (\Psi + \Lambda \Lambda^{\top})^{-1} \textbf{x} \\ \\ \mathbb{E}[\textbf{z} \textbf{z}^{\top} \mid \textbf{x}] &= I_k - \Lambda^{\top} (\Psi + \Lambda \Lambda^{\top})^{-1} \Lambda + \Lambda^{\top} (\Psi + \Lambda \Lambda^{\top})^{-1} \textbf{x} \textbf{x}^{\top} ((\Psi + \Lambda \Lambda^{\top})^{-1})^{\top} \Lambda \end{align} $$

The authors claim that this comes from "the joint normality of data and factors". How was this computed? I've gone through the Wikipedia page on conditional expectation, but I don't see anything that defines it in terms of the joint distribution or conditional distribution.

Best Answer

This is an important property of the Gaussian distribution, and it is used frequently. Suppose

$$ P\bigg( \begin{bmatrix} {x}_1 \\ {x}_2 \end{bmatrix} \bigg) = \mathcal{N}\bigg( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} , \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \bigg). $$

Then the conditional distribution of $x_1$ given $x_2$ is

$$ P(x_1 \mid x_2) = \mathcal{N}\big(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big). $$

This gives exactly the conditional expectation you quote: conditioning $\textbf{z}$ on $\textbf{x}$ means taking $x_1 = \textbf{z}$ and $x_2 = \textbf{x}$, so in this case

$$ \Sigma_{11} = I, \quad \Sigma_{12} = \Lambda^{\top}, \quad \Sigma_{22} = \Lambda\Lambda^{\top} + \Psi, \quad \mu_1 = \mu_2 = 0. $$

For the second quantity, just use the fact that

$$ \mathbb{E}[\textbf{z} \textbf{z}^{\top} \mid \textbf{x}] = \mathbb{E}[z|x](\mathbb{E}[z|x])^{\top} + {\rm Var}(z|x). $$
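These conditional formulas can also be verified numerically. For zero-mean jointly Gaussian vectors, $\mathbb{E}[\textbf{z} \mid \textbf{x}]$ is the linear least-squares predictor of $\textbf{z}$ from $\textbf{x}$, so an ordinary regression of sampled $\textbf{z}$ on sampled $\textbf{x}$ should recover the coefficient $\Lambda^{\top}(\Lambda\Lambda^{\top} + \Psi)^{-1}$, and the residual covariance should match $\text{Var}(\textbf{z} \mid \textbf{x}) = I - \Lambda^{\top}(\Lambda\Lambda^{\top} + \Psi)^{-1}\Lambda$. A NumPy sketch (dimensions and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
p, k, n = 4, 2, 500_000  # arbitrary dimensions and sample size

Lam = rng.normal(size=(p, k))            # hypothetical loadings
Psi = np.diag(rng.uniform(0.5, 1.5, p))  # hypothetical noise covariance
Sigma = Lam @ Lam.T + Psi                # Var(x)

z = rng.normal(size=(n, k))
u = rng.multivariate_normal(np.zeros(p), Psi, size=n)
x = z @ Lam.T + u

# Closed-form conditional-mean coefficient: E[z|x] = beta @ x.
beta = Lam.T @ np.linalg.inv(Sigma)

# For jointly Gaussian zero-mean vectors, E[z|x] is the linear
# least-squares predictor, so regressing z on x recovers beta.
beta_hat = np.linalg.lstsq(x, z, rcond=None)[0].T
print(np.max(np.abs(beta_hat - beta)))  # small, shrinking as n grows

# The covariance of the regression residuals should match
# Var(z|x) = I_k - beta @ Lam.
resid = z - x @ beta.T
cond_var = np.eye(k) - beta @ Lam
print(np.max(np.abs(np.cov(resid, rowvar=False) - cond_var)))
```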