Generating multivariate normal samples - why Cholesky?


Hello everyone and happy new year! May all your hopes and aspirations come true and the forces of evil be confused and disoriented on the way to your house.

With that out of the way...

I am trying to write a program that takes a vector $\mu \in \mathbb R^n$ and a matrix $\Sigma \in \mathbb R^{n \times n}$ and generates random samples from the multivariate normal distribution with mean $\mu$ and covariance $\Sigma$.

The problem: I am only allowed to sample from the univariate normal distribution with mean $0$ and variance $1$, i.e. $N(0, 1)$.

The proposed solution: Define a vector $v \in \mathbb R^n$ and, for each $i$ from $1$ to $n$, draw independently from the univariate normal distribution: $v_i \sim N(0, 1)$.

Now compute a Cholesky decomposition of $\Sigma$: $\Sigma = LL^T$.

Finally, the random vector we want, distributed according to the multivariate Gaussian, is $Lv + \mu$.
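As a sketch of this procedure in Python (assuming NumPy is available; `rng.standard_normal` plays the role of the allowed $N(0,1)$ sampler, and the names here are illustrative, not part of the question):

```python
import numpy as np

def sample_mvn(mu, Sigma, rng=None):
    """Draw one sample from N(mu, Sigma) using only standard-normal draws."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(mu)
    v = rng.standard_normal(n)        # v_i ~ N(0, 1), independent
    L = np.linalg.cholesky(Sigma)     # Sigma = L @ L.T (Sigma must be s.p.d.)
    return L @ v + mu

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = sample_mvn(mu, Sigma)             # one draw from N(mu, Sigma)
```

Averaging many such draws, the empirical mean approaches $\mu$ and the empirical covariance approaches $\Sigma$.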

My question is: why? I don't understand the intuition. In the one-dimensional case $N(\mu, \sigma^2)$ I understand why $\sigma v + \mu$ is a good idea, so why Cholesky? Wouldn't we want $\Sigma v + \mu$?

There are 2 answers below.

BEST ANSWER

After the comment of Rahul you understood that in any parametrization $x = Av + \mu$ you will need that $$ \Sigma = \mathbb E\bigl[(x-\mu)(x-\mu)^T\bigr] = A\,\mathbb E[vv^T]\,A^T = AA^T. $$ There are infinitely many possibilities to choose $A$: for any orthogonal matrix $Q$, the matrix $\tilde A = AQ$ also satisfies that condition, since $\tilde A\tilde A^T = AQQ^TA^T = AA^T$.

One could even choose the square root of $\Sigma$ (which exists and is unique among the symmetric positive definite matrices).

The advantage of using the Cholesky factorization is that there is a cheap and simple algorithm to compute it.
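This non-uniqueness is easy to check numerically. A minimal sketch assuming NumPy: the Cholesky factor, the symmetric square root, and a rotated Cholesky factor all reproduce the same $\Sigma$ (the example matrix and rotation angle are arbitrary):

```python
import numpy as np

Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])

# Cholesky factor: lower-triangular L with L @ L.T == Sigma
L = np.linalg.cholesky(Sigma)

# Symmetric square root via the eigendecomposition Sigma = U diag(w) U.T
w, U = np.linalg.eigh(Sigma)
S = U @ np.diag(np.sqrt(w)) @ U.T     # S is s.p.d. and S @ S.T == Sigma

# Any orthogonal Q gives another valid factor A = L @ Q
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = L @ Q

for M in (L, S, A):
    print(np.allclose(M @ M.T, Sigma))   # each factor reproduces Sigma
```

Any of $L$, $S$, or $A$ would work in $x = Av + \mu$; Cholesky is simply the cheapest to compute.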

ANSWER

I know I am a bit late to the party, but maybe this still helps somebody. I thought of this question in the following way:

From the transformation properties of the Gaussian distribution it is known that if $X \sim \mathcal N (\tau, \Lambda)$, with $\tau\in \mathbb R^n$ and $\Lambda\in \mathbb R^{n\times n}$ then the affine transformation $Y=BX + b$ is distributed as $Y\sim \mathcal N (B\tau+ b, B\Lambda B^T)$.

Of course $B$ and $b$ must be of suitable dimensions. Now sample $X$ from a standard $n$-dimensional normal distribution to get $X\sim \mathcal N (0,\boldsymbol 1)$, where $\boldsymbol 1$ is the $n\times n$ identity matrix; then $Y = BX+b$ is distributed as $Y\sim \mathcal N(b, BB^T)$, which is just plugging in $0$ and $\boldsymbol 1$ for $\tau$ and $\Lambda$ in the relation above.
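This transformation rule can be verified empirically. A quick sketch assuming NumPy, where the matrix $B$ and vector $b$ are arbitrary illustrative choices: after transforming standard-normal samples, the empirical mean should be close to $b$ and the empirical covariance close to $BB^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
B = rng.standard_normal((m, n))       # arbitrary transformation matrix
b = rng.standard_normal(m)            # arbitrary shift vector

# X ~ N(0, I_n): each coordinate an independent standard normal
X = rng.standard_normal((100_000, n))
Y = X @ B.T + b                       # row-wise affine transform Y = B X + b

print(np.round(Y.mean(axis=0) - b, 2))           # ~ zero vector
print(np.round(np.cov(Y.T) - B @ B.T, 2))        # ~ zero matrix
```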

If you now want $Y$ to be distributed as $Y\sim \mathcal N(\mu, \Sigma)$, then you need to choose a transformation matrix $A$ such that $AA^T = \Sigma$. The Cholesky decomposition is a good choice because it is explicitly structured to fulfill this condition, and many libraries provide standard functions to compute it for you.