Fast shortcut to get the Cholesky factor of a submatrix


Assumptions

Let's start with the following assumptions. Assume the matrices $\Sigma$, $L$, and $S$ defined below are known in advance.

  • $\Sigma$ is a symmetric positive definite $n \times n$ matrix.
  • $\Sigma$ has Cholesky factorization $L L^T$ where $L$ is lower triangular.
  • Matrix $S$ consists of a subset of distinct rows of the $n \times n$ identity matrix.
    • $S$ has $m$ rows and $n$ columns, where $m < n$.
    • Each row of $S$ has a single 1 and all the other elements equal to 0.
    • The rows of $S$ are orthonormal.
  • By construction, the product $S \Sigma$ is an $m \times n$ matrix whose rows are a unique subset of the rows of $\Sigma$.
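As a concrete sketch of this setup (NumPy, with a hypothetical $4 \times 4$ $\Sigma$ and an arbitrary choice of selected indices), multiplying by $S$ on the left and $S^T$ on the right just extracts the submatrix of $\Sigma$ at the selected rows and columns:

```python
import numpy as np

rng = np.random.default_rng(0)
n, idx = 4, [1, 2]            # illustrative: select rows 1 and 2

# Build a random symmetric positive definite Sigma
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)

# S is the selected rows of the identity matrix
S = np.eye(n)[idx]

# S @ Sigma @ S.T equals the submatrix Sigma[np.ix_(idx, idx)]
assert np.allclose(S @ Sigma @ S.T, Sigma[np.ix_(idx, idx)])
```

In practice one would index directly with `np.ix_` rather than forming $S$ explicitly; $S$ is useful mainly for the algebra.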

Question

Is there a fast shortcut to get the Cholesky factor of $S \Sigma S^T$ using $L$ and $S$? I am looking for something much quicker and easier than running a standard Cholesky decomposition algorithm from scratch; ideally, it would reuse the existing Cholesky factor $L$ of $\Sigma$.

Context

I am asking because I want to compute the marginal density of a multivariate normal random variable using https://statproofbook.github.io/P/mvn-marg.html. I already have the Cholesky factor of the covariance of the full multivariate normal distribution, and my Stan code will run much faster if I do not need to compute an additional Cholesky factor for the marginal distribution.

Answer

EDIT 2023-04-17

The answer below is wrong because $R \Sigma R$ is not lower-triangular in the general case. I leave it here for posterity. Down-votes may help ensure people know not to trust it.

Original post (which is incorrect)

Thanks for the nudges to explain more about my issue. The overarching problem is a statistical issue regarding multivariate normal distributions. Suppose random variable $y$ is a vector of length $n$ with a multivariate normal distribution with mean vector $\mu$ and positive-definite symmetric covariance matrix $\Sigma$. We denote this by:

$$ y \sim \text{MVN}(\mu, \Sigma) $$

My goal is to efficiently work with the marginal distribution of a subset of the components of $y$:

$$ S R y \sim \text{MVN} ( SR\mu, SR\Sigma R^TS^T) $$

In this problem, we assume we already know the Cholesky factorization $\Sigma = L L^T$. In order to quickly evaluate the density of the marginal distribution above, we want a shortcut to get the Cholesky factor of $SR\Sigma R^TS^T$ without having to run the standard factorization algorithm from scratch. This is because multi_normal_cholesky_lpdf() is much faster than multi_normal_lpdf() in Stan. Fortunately, a shortcut exists, as I prove below.

Here, $R$ is an $n \times n$ permutation matrix. In my case, $Ry$ is a column vector that puts the desired elements of $y$ at the front of the vector while preserving their relative order. For example, suppose $y = [a \ \ b \ \ c \ \ d]^T$ and I want the marginal distribution of $[b \ \ c]^T$. Then, $Ry = [b \ \ c \ \ a \ \ d]^T$, with $b$ and $c$ at the front and $b$ before $c$.
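A small NumPy sketch of this construction (the component choices are just for illustration, using stand-in numeric values for $a, b, c, d$), including the $S = [I_m \ \ 0]$ selector defined next:

```python
import numpy as np

n, idx = 4, [1, 2]                  # want the marginal of (b, c) from y = (a, b, c, d)
rest = [i for i in range(n) if i not in idx]
order = idx + rest                  # [1, 2, 0, 3]: selected components first

# R is the permutation matrix that reorders y as [b, c, a, d]
R = np.eye(n)[order]

y = np.array([1.0, 2.0, 3.0, 4.0])  # stand-ins for a, b, c, d
assert np.allclose(R @ y, [2.0, 3.0, 1.0, 4.0])

# S = [I_m  0] then keeps the first m components of R y
m = len(idx)
S = np.hstack([np.eye(m), np.zeros((m, n - m))])
assert np.allclose(S @ R @ y, [2.0, 3.0])
```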

Also, $S$ is an $m \times n$ matrix, where $m < n$ is the cardinality of the desired subvector $S R y$. Here, $S = [I_m \ \ 0_{m \times (n - m)}]$, where $I_m$ is the $m \times m$ identity matrix and $0_{m \times (n - m)}$ is the appropriately sized matrix of zeroes.

Now, let $L$ be the lower-triangular Cholesky factor of $\Sigma$ (with $\Sigma = L L^T$). We know $R L R^T$ is lower triangular because $R$ permutes the rows of $L$ in the exact same way as $R^T$ permutes the columns of $L$. (The previous sentence is definitely wrong, and it is straightforward to come up with small counterexamples.) In addition, $R^T = R^{-1}$ because $R$ is a permutation matrix and thus orthonormal. Thus:
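The correction note above is easy to verify numerically. This NumPy counterexample (with an arbitrary $3 \times 3$ lower-triangular $L$ and a 3-cycle permutation) shows $R L R^T$ picking up a nonzero entry above the diagonal:

```python
import numpy as np

# An arbitrary lower-triangular L
L = np.array([[1.0, 0.0, 0.0],
              [2.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])

order = [1, 2, 0]          # a 3-cycle permutation
R = np.eye(3)[order]

# (R L R^T)[i, j] = L[order[i], order[j]], which is not lower triangular:
# here M[0, 2] = L[1, 0] = 2 sits above the diagonal.
M = R @ L @ R.T
assert not np.allclose(M, np.tril(M))
```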

$$ R L R^T(R L R^T)^T = R L R^T R L^T R^T = R L L^T R^T = R \Sigma R^T $$

That means $R L R^T$ is the Cholesky factor of $R \Sigma R^T$. Finally, the matrix $S$ is constructed such that $S R \Sigma R^T S^T$ is the upper-left $m \times m$ block of $R \Sigma R^T$ and $S R L R^T S^T$ is the upper-left $m \times m$ block of $R L R^T$. By the lemma below, this makes $S R L R^T S^T$ the lower-triangular Cholesky factor of $S R \Sigma R^T S^T$. Best of all, $S R L R^T S^T$ is super fast to compute because it is just a subset of the rows and columns of $L$.

Lemma

Define $\Sigma$ to be an $n \times n$ positive definite symmetric matrix with lower-triangular Cholesky factor $L$ (so $\Sigma = L L^T$). Write these matrices in block notation:

$$ \Sigma = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \qquad L = \begin{bmatrix} X & O \\ Y & Z \end{bmatrix} $$

where $A$ and $X$ are $m \times m$ blocks $(m < n)$ and $O$ is the appropriately sized matrix of zeroes. Then, $X$ is the $m \times m$ lower-triangular Cholesky factor of $m \times m$ matrix $A$.

Proof of the Lemma

By construction, matrix $X$ is already lower-triangular. From matrix multiplication and the definition of the Cholesky factorization, we find:

$$ \begin{aligned} \Sigma &= L L^T \\ \begin{bmatrix} A & B \\ C & D \end{bmatrix} &= \begin{bmatrix} X & O \\ Y & Z \end{bmatrix} \begin{bmatrix} X^T & Y^T \\ O^T & Z^T \end{bmatrix} = \begin{bmatrix} XX^T + OO^T & XY^T + OZ^T \\ YX^T + ZO^T & YY^T + ZZ^T \end{bmatrix} = \begin{bmatrix} XX^T & XY^T + OZ^T \\ YX^T + ZO^T & YY^T + ZZ^T \end{bmatrix} \end{aligned} $$

Thus, $A = XX^T$. Since $X$ is lower triangular and inherits the positive diagonal of $L$, it is the unique lower-triangular Cholesky factor of $A$. $\blacksquare$
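The lemma (which, unlike the derivation above, is correct) is easy to check numerically. A minimal NumPy sketch with an arbitrary random SPD $\Sigma$ and illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3

# Random symmetric positive definite Sigma and its Cholesky factor
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)
L = np.linalg.cholesky(Sigma)

# Lemma: the leading m x m block of L is the Cholesky factor
# of the leading m x m block of Sigma.
X = L[:m, :m]
assert np.allclose(X @ X.T, Sigma[:m, :m])
assert np.allclose(X, np.linalg.cholesky(Sigma[:m, :m]))
```

Note this covers only the *leading* principal submatrix; for an arbitrary subset of indices, the corresponding rows and columns of $L$ do not generally form a triangular factor, which is exactly why the answer above fails.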