Solving the linear system $XL + L^TX = M$ efficiently

85 Views Asked by At

I'm wondering of an efficient way to solve the following system for the symmetric matrix $X$, given a positive semi-definite matrix $S$ and any matrix $M$: $$ LL^T = S $$ $$ XL + L^TX = M $$ $$ (XL) + (XL)^T = M$$

The problem:

You have parameters $a_i \in \mathbb{R}^n$ $i=1 \dots N$ and $\mu$. You have a random variable $x \sim \mathcal{N}(\mu,\Sigma)$, where $\Sigma = \sum a_i a_i^T$. You are also given a function to optimise $f(x)$. Thus given a final set of samples you want to calculate $\nabla \mu$ and $\nabla a_i$. You also know $\nabla_x f$. Thus you can do the following: $$ LL^T = \Sigma $$ $$ x = \mu+ L \epsilon$$ $$ \nabla \mu = \nabla_x f $$ $$ \nabla L = \nabla_x f \epsilon^T $$ $$ \nabla a_i = 2 \nabla \Sigma a_i$$

Now unfortunately, there I can't still calculate $\nabla \Sigma$. So I use the following relation: $$ \nabla L = \nabla \Sigma L + L^T \nabla \Sigma $$ Thus note the initial equation. However, obviously $\nabla L$ is not symmetric, which means that $\nabla \Sigma$ should also not be symmetric or ...? Where is the mistake in this?

1

There are 1 best solutions below

0
On

Disregarding the problem in itself, just answering the question as posed above "The problem":

You can vectorize and use the Kronecker product, using some of the formula found here. Also after vectorization the transposition operation can be represented by a permutation matrix as explained here.

You will end up with linear equation system $n^2\times n^2$ sparse matrix with a maximum of $2n$ non-zero values each row. If efficiency is important you can strip it down using the symmetric property of $X$ and instead of a vectorization using a half-vectorization although at the cost that maybe the formula can become more difficult to express with the extra Kronecker product abstraction layer.

How feasible this will be completely depends on the size of the matrices. As mentioned in the comments solving with a naiive method having $O(N^3)$ complexity, $N$ matrix side could give $O(n^6)$ where $n$ the side of $X$ matrix. But since the equation system has a very special structure and also is sparse, that would be a very pessimistic estimate. The fact that only square root of the side is non-zero elements for each row could for example invite for usage of Krylov subspace methods.