I've read the paper "Structured Uncertainty Prediction Networks", and I don't understand how to sample from a multivariate Gaussian in the paper.
Here is a sampling method used in the paper.
Suppose that $x \sim \mathcal{N}(\mu, \Lambda^{-1})$, where $LL^\top = \Lambda$ and $\Lambda^{-1} = \Sigma$.
Let $\mu,L$ be given and $y = L^T(x - \mu)$.
Then sampling proceeds as follows:
1. Sample $\epsilon$ from $\mathcal{N}(0, \Sigma)$.
2. Add $\epsilon$ to $\mu$; the result is $x$.
About the first step, the paper says:
"Sampling from $\Sigma$ involves solving the triangular system of equations $L^Ty=u$ with backwards substitution,"
I don't understand why solving $L^\top y = u$ is related to sampling from $\Sigma$. Could someone please elaborate on this?
To sample $\epsilon \sim \mathcal{N}(0, \Sigma)$ where $\Sigma = MM^\top$ you can generate $u \sim \mathcal{N}(0, I)$ and let $\epsilon = Mu$. (This is how the paper defines $u$, but you forgot to mention this in your post.)
However, if you instead have a decomposition of the precision matrix $\Sigma^{-1} = \Lambda = LL^\top$, so that you have $L$ but not $M$, then you can generate $\epsilon$ as $\epsilon = (L^\top)^{-1} u$, i.e. by solving the system $L^\top \epsilon = u$ for $\epsilon$. Since $L^\top$ is upper triangular, this solve is exactly back substitution. This works because $\operatorname{Cov}(\epsilon) = (L^\top)^{-1} \operatorname{Cov}(u)\, L^{-1} = (LL^\top)^{-1} = \Lambda^{-1} = \Sigma$, so $\epsilon \sim \mathcal{N}(0, \Sigma)$ as desired. (I'm not sure why they wrote $y$ there; I think it is a dummy variable, and not the same $y = L^\top(x - \mu)$ that was defined earlier.)
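Here is a minimal numerical sketch of this idea, assuming NumPy (the precision matrix `Lam` below is made up for illustration): we build a Cholesky factor $L$ of $\Lambda$, draw standard-normal vectors $u$, solve $L^\top \epsilon = u$, and check that the empirical covariance of the resulting $\epsilon$ approximates $\Sigma = \Lambda^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary SPD precision matrix Lambda (made up for illustration).
A = rng.standard_normal((3, 3))
Lam = A @ A.T + 3.0 * np.eye(3)   # precision matrix Lambda
Sigma = np.linalg.inv(Lam)        # covariance Sigma = Lambda^{-1}
L = np.linalg.cholesky(Lam)       # lower-triangular L with L @ L.T == Lam

# Draw many vectors u ~ N(0, I) and solve L^T eps = u for eps.
# Since L.T is upper triangular, this solve is back substitution.
n = 200_000
U = rng.standard_normal((3, n))
Eps = np.linalg.solve(L.T, U)     # eps = (L^T)^{-1} u, column by column

# The empirical covariance of eps should approximate Sigma.
emp_cov = Eps @ Eps.T / n
print(np.max(np.abs(emp_cov - Sigma)))
```

The printed maximum deviation shrinks as `n` grows, which is what the identity $\operatorname{Cov}((L^\top)^{-1}u) = \Sigma$ predicts. In practice you would use a dedicated triangular solver (e.g. `scipy.linalg.solve_triangular`) rather than a general solve, but the result is the same.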