Assumptions
Let's start with the following assumptions, where the matrices $\Sigma$, $L$, and $S$ defined below are all known in advance (a small numerical sketch of this setup follows the list).
- $\Sigma$ is a symmetric positive definite $n \times n$ matrix.
- $\Sigma$ has Cholesky factorization $L L^T$ where $L$ is lower triangular.
- Matrix $S$ consists of a subset of distinct rows of the $n \times n$ identity matrix.
- $S$ has $m$ rows and $n$ columns, where $m < n$.
- Each row of $S$ has a single 1 and all the other elements equal to 0.
- The rows of $S$ are orthonormal.
- By construction, the product $S \Sigma$ is an $m \times n$ matrix whose rows are a unique subset of the rows of $\Sigma$.
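For concreteness, here is a minimal numpy sketch of this setup (the covariance values and the selected indices are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
idx = [0, 2]                       # which rows of the identity go into S
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)    # symmetric positive definite
L = np.linalg.cholesky(Sigma)      # Sigma = L @ L.T with L lower triangular

S = np.eye(n)[idx]                 # m x n: a subset of rows of the identity
assert np.allclose(S @ Sigma, Sigma[idx])       # rows of S Sigma are rows of Sigma
assert np.allclose(S @ S.T, np.eye(len(idx)))   # rows of S are orthonormal
```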
Question
Is there a fast shortcut to get the Cholesky factor of $S \Sigma S^T$ using $L$ and $S$? I am looking for something much quicker and easier to compute than the standard Cholesky decomposition algorithms: a shortcut that reuses the existing Cholesky factor of $\Sigma$.
Context
I am asking because I want to compute the marginal density of a multivariate normal random variable using the marginalization property at https://statproofbook.github.io/P/mvn-marg.html. I already have the Cholesky factor of the covariance of the full multivariate normal distribution, and my Stan code will run much faster if I do not need to compute an additional Cholesky factor for the marginal distribution.
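To illustrate the computation I want to speed up, here is a rough numpy/scipy sketch (not my actual Stan model; all names and values are illustrative). Per the linked marginalization property, the marginal keeps the selected entries of $\mu$ and the selected rows and columns of $\Sigma$:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative stand-ins for the full model (not the actual Stan program).
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)       # full covariance, SPD
mu = rng.standard_normal(n)
L = np.linalg.cholesky(Sigma)         # already available

idx = [1, 3]                          # components whose marginal we want
y_obs = rng.standard_normal(len(idx))

# Marginal of a MVN: selected entries of mu, selected rows/columns of Sigma.
mu_m = mu[idx]
Sigma_m = Sigma[np.ix_(idx, idx)]

# This logpdf call factorizes Sigma_m internally from scratch -- the
# extra work I would like to avoid by reusing L.
logp = multivariate_normal(mu_m, Sigma_m).logpdf(y_obs)
```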
EDIT 2023-04-17
The answer below is wrong because $R L R^T$ is not lower triangular in the general case. I leave it here for posterity. Down-votes may help ensure people know not to trust it.
Original post (which is incorrect)
Thanks for the nudges to explain more about my issue. The overarching problem is a statistical issue regarding multivariate normal distributions. Suppose random variable $y$ is a vector of length $n$ with a multivariate normal distribution with mean vector $\mu$ and positive-definite symmetric covariance matrix $\Sigma$. We denote this by:
$$ y \sim \text{MVN}(\mu, \Sigma) $$
My goal is to efficiently work with the marginal distribution of a subset of the components of $y$:
$$ S R y \sim \text{MVN} ( SR\mu, SR\Sigma R^TS^T) $$
In this problem, we assume we already know the Cholesky factorization $\Sigma = L L^T$. In order to quickly evaluate the density of the marginal distribution above, we want a shortcut to get the Cholesky factor of $SR\Sigma R^TS^T$ without having to run the standard factorization algorithm from scratch. This is because `multi_normal_cholesky_lpdf()` is much faster than `multi_normal_lpdf()` in Stan. Fortunately, a shortcut exists, as I prove below.

Here, $R$ is an $n \times n$ permutation matrix. In my case, $Ry$ is a column vector that puts the desired elements of the subset of $y$ at the front of the vector, while preserving the relative order of the desired components. To explain what I mean, suppose $y = [a \ \ b \ \ c \ \ d]^T$ and I want the marginal distribution of $[b \ \ c]^T$. Then, $Ry = [b \ \ c \ \ a \ \ d]^T$, with $b$ and $c$ at the front and $b$ before $c$.
Also, $S$ is an $m \times n$ matrix, where $m < n$ is the cardinality of the desired subvector $S R y$. Here, $S = [I_m \ \ 0_{m \times (n - m)}]$, where $I_m$ is the $m \times m$ identity matrix and $0_{m \times (n - m)}$ is the appropriately sized matrix of zeroes.
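As a concrete sketch (in numpy, using the example above), $R$ and $S$ for $y = [a \ \ b \ \ c \ \ d]^T$ with target subvector $[b \ \ c]^T$ would look like this:

```python
import numpy as np

n, m = 4, 2
order = [1, 2, 0, 3]                  # b and c first, then the rest in order
R = np.eye(n)[order]                  # permutation matrix: R @ y == y[order]
S = np.hstack([np.eye(m), np.zeros((m, n - m))])   # S = [I_m  0]

y = np.array(["a", "b", "c", "d"])
assert list(y[order]) == ["b", "c", "a", "d"]      # what R y looks like
assert np.allclose(S @ R, np.eye(n)[[1, 2]])       # S R selects b and c
```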
Now, let $L$ be the lower-triangular Cholesky factor of $\Sigma$ (with $\Sigma = L L^T$).
We know $R L R^T$ is lower triangular because $R$ permutes the rows of $L$ in the exact same way as $R^T$ permutes the columns of $L$. (The previous sentence is definitely wrong, and it is straightforward to come up with small counterexamples.) In addition, $R^T = R^{-1}$ because $R$ is a permutation matrix and thus orthogonal. Thus:

$$ R L R^T(R L R^T)^T = R L R^T R L^T R^T = R L L^T R^T = R \Sigma R^T $$
That means $R L R^T$ is the Cholesky factor of $R \Sigma R^T$. Finally, the matrix $S$ is constructed such that $S R \Sigma R^T S^T$ is the upper-left $m \times m$ block of $R \Sigma R^T$ and $S R L R^T S^T$ is the upper-left $m \times m$ block of $R L R^T$. By the lemma below, this makes $S R L R^T S^T$ the lower-triangular Cholesky factor of $S R \Sigma R^T S^T$. Best of all, $S R L R^T S^T$ is super fast to compute because it is just a subset of the rows and columns of $L$.
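As the edit at the top notes, this is where the argument breaks down: $R L R^T$ need not be lower triangular. A quick numerical counterexample, sketched in numpy with an arbitrary SPD matrix and the permutation from the running example:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)       # arbitrary SPD matrix
L = np.linalg.cholesky(Sigma)

R = np.eye(n)[[1, 2, 0, 3]]           # permutation from the example above
M = R @ L @ R.T
print(np.allclose(M, np.tril(M)))     # False: R L R^T is not lower triangular
# So M is not the Cholesky factor of R Sigma R^T, even though M M^T equals it:
print(np.allclose(M, np.linalg.cholesky(R @ Sigma @ R.T)))   # False in general
```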
Lemma
Define $\Sigma$ to be an $n \times n$ positive definite symmetric matrix with lower-triangular Cholesky factor $L$. Write these matrices in block notation:
$$ \Sigma = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \qquad L = \begin{bmatrix} X & O \\ Y & Z \end{bmatrix} $$
where $A$ and $X$ are $m \times m$ blocks $(m < n)$ and $O$ is the appropriately sized matrix of zeroes. Then, $X$ is the $m \times m$ lower-triangular Cholesky factor of $m \times m$ matrix $A$.
Proof of the Lemma
By construction, matrix $X$ is already lower-triangular. From matrix multiplication and the definition of the Cholesky factorization, we find:
$$ \begin{aligned} \Sigma &= L L^T \\ \begin{bmatrix} A & B \\ C & D \end{bmatrix} &= \begin{bmatrix} X & O \\ Y & Z \end{bmatrix} \begin{bmatrix} X^T & Y^T \\ O^T & Z^T \end{bmatrix} = \begin{bmatrix} XX^T + OO^T & XY^T + OZ^T \\ YX^T + ZO^T & YY^T + ZZ^T \end{bmatrix} = \begin{bmatrix} XX^T & XY^T + OZ^T \\ YX^T + ZO^T & YY^T + ZZ^T \end{bmatrix} \end{aligned} $$
Thus, $A = XX^T$. Since $X$ is lower triangular with positive diagonal entries (inherited from $L$), it is the Cholesky factor of $A$. $\blacksquare$
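The lemma is also easy to check numerically; a quick numpy sketch with an arbitrary SPD matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 3
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)       # arbitrary SPD matrix
L = np.linalg.cholesky(Sigma)

X = L[:m, :m]                         # leading m x m block of L
assert np.allclose(X, np.linalg.cholesky(Sigma[:m, :m]))   # X = chol(A)
```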