multivariate normal computation in matlab

316 Views Asked by At

I have a 6 dimensional normal distribution with mean zero and co-variance matrix given as follows:

$S_{ij} = 1 for i==j$

$S_{1j} = sqrt(0.5) for j=2,3,4,5,6$

$S_{i1} = sqrt(0.5) for i=2,3,4,5,6$

$S_{ij} = 0.5 for i\neq j, i \neq 1 or j \neq 1 $

i.e. the diagonals are all 1. the first row and the first cols is sqrt(0.5) and the rest of the entries are 0.5

Now I have the following code.

n=6;
mu = zeros(1,n);
X = ones(1,n);
X=X*0.;
Y = mvncdf(X,mu,S);

Y gives me the cumulative prob for the point (0,0,0,0,0,0). I get Y=0.164.

Now I can also compute the probability by transforming the co-variance matrix

Sinv = inv(S);
cholS = chol(Sinv,'lower');
cholSinv = inv(cholS);
Z = cholSinv*X';
phi = normcdf(Z,0,1);
Yhat=prod(phi);

I get Yhat = 0.0156.

What I am trying to do theoretically is as follows: For the multivariate case I have

$ \int_{-\infty}^{A} exp(-0.5 * Y'\Sigma Y) dY$ (I am ignoring the normalization)

I have decomposed $\Sigma = C C'$ and also transformed $Y = CZ$

This makes the above integral

$ \int_{-\infty}^{A} exp(-0.5 * Y'\Sigma Y) dY = \int_{-\infty}^{C^{-1} A} exp(-0.5 * Z' Z) dZ$

The integral on the left is computed using my first code snippet.

The integral on the right is separable and can be computed using my second code snippet.

I guess I am making some mistake but I am unable to figure out what. It would be very kind if someone can point out the mistake. Thanks in advance

1

There are 1 best solutions below

1
On BEST ANSWER

To see where you're going wrong consider the simpler example of a bivariate normal $(X,Y)$ with covariance matrix $$ \begin{pmatrix}1 & \rho\\\rho & 1\end{pmatrix}$$ for some correlation $-1\le \rho \le 1.$ By Cholesky or otherwise we can represent $X$ and $Y$ as $$ X = Z_1 \\ Y = \rho Z_1+\sqrt{1-\rho^2}Z_2 $$ where $Z_1$ and $Z_2$ are independent standard normals.

Intuitively we can see that $P(X \le 0,Y \le 0)$ (i.e. the joint CDF evaluated at the zero vector) is $1/4$ for $\rho=0,$ $1/2$ for $\rho=1,$ and $0$ for $\rho = -1.$ So we expect this probability to interpolate between $0$ and $1/2$ as $\rho$ goes between $-1$ and $1.$

However, your second method would say to compute this as $$ P(X\le 0,Y\le 0) = P(Z_1 \le 0, Z_2 \le 0) =P(Z_1\le0)P(Z_2\le 0) = \frac{1}{2}\frac{1}{2} = \frac{1}{4} $$ which has to be wrong since it doesn't appear to depend on the correlation.

As I mentioned in the comments, your error is that you are taking the probability of the wrong region of $Z$-space. If $X\le 0$ and $Y\le 0,$ then we have $$ Z_1 \le 0 \\ \rho Z_1 + \sqrt{1-\rho^2}Z_2 \le 0.$$ So we don't want $P(Z_1\le 0, Z_2\le 0).$ What we want is $$ P(X\le 0,Y\le 0) = P\left(Z_1\le 0, Z_2 \le -\frac{\rho}{\sqrt{1-\rho^2}} Z_1\right).$$

To compute this we need to integrate the density of $Z$ over the appropriate region. If we denote $\phi(x)$ the std normal PDF and $\Phi(x)$ the std normal CDF, we get $$ P(X\le 0, Y\le 0) = \int_{-\infty}^0 \phi(z_1)dz_1\int_{-\infty}^{-\frac{\rho z_1}{\sqrt{1-\rho^2}}}\phi(z_2) dz_2 = \int_{-\infty}^0 \phi(z_1)\Phi\left(-\frac{\rho z_1}{\sqrt{1-\rho^2}}\right)dz_1 \\= \int_0^\infty \phi(z) \Phi\left(\frac{\rho z}{\sqrt{1-\rho^2}}\right)dz.$$

This integral can be done exactly, but I can't figure out how and don't want to distract from the point. So let's just see what it gets for a positive correlation $\rho = \frac{1}{\sqrt 2},$ cherry picked to have the integral be easy. Then the quantity reduces to $$ P(X\le 0, Y\le 0) \int_0^\infty \Phi(x)\phi(x)dx = \int_0^\infty \Phi(x)\Phi'(x)dx = \int_{1/2}^1 \Phi d\Phi = \frac{3}{8}.$$ A value between $1/4$ and $1/2$ just like we'd expect.

Side note: Using some combination of Mathematica and trial and error, I get the expression $$ P(X\le 0, Y\le 0) = \frac{1}{2}H(\rho) - \frac{1}{2\pi}\cot^{-1}\left(\frac{\rho}{\sqrt{1-\rho^2}}\right)$$ where $H(x)$ is the step function. But this is somewhat beside the point cause your full $6$-variable question probably won't have a nice analytic answer, but you can do it by numerical integration. (But to that point, why not just stick with the built in multivariate normal cumulative someone was nice enough to implement for you.)