Random sampling from a conditional bivariate normal distribution


How does one draw a random sample $\begin{bmatrix} X_i \\ Y_i\end{bmatrix}$, $i=1,\ldots,n$ from the conditional distribution of a bivariate normal distribution, given specified values of the sample means, the sample variances, and the sample covariance?

If one draws a sample from the bivariate normal distribution $N_2\left( \begin{bmatrix} \mu \\ \nu \end{bmatrix}, \begin{bmatrix} \sigma^2 & \rho\sigma\tau \\ \rho\sigma\tau & \tau^2 \end{bmatrix} \right)$, then with probability $1$, the sample means, the sample variances, and the sample correlation do not match the corresponding population values exactly. The idea is to draw a random sample in which they do match.

Part of the problem has a solution that probably every mathematician knows by reflex: subtract the sample mean of the $X$ values from each $X$ value, then divide each $X$ value by the sample standard deviation of the $X$ values, and then multiply by $\sigma$ and finally add $\mu$. Do a similar thing with $Y$.
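As a minimal sketch of that one-variable rescaling, assuming NumPy and the $n-1$ divisor for the sample standard deviation (the name `rescale_exact` is made up for illustration):

```python
import numpy as np

def rescale_exact(x, mean, sd):
    """Shift and scale x so that its sample mean and sample SD hit the targets.

    Assumes the n-1 divisor (ddof=1) for the sample standard deviation.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)  # sample mean 0, sample SD 1
    return sd * z + mean                # sample mean `mean`, sample SD `sd`

rng = np.random.default_rng(0)
x = rescale_exact(rng.standard_normal(50), mean=3.0, sd=2.0)
print(x.mean(), x.std(ddof=1))
```

The printed values agree with the targets up to floating-point rounding.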

But how does one deal similarly with the correlation $\rho$?

I will post my own answer to this. It's not the only way to do it, so add your own if so inspired.


Here's one way.

First draw two independent random samples of size $n$ from the $N_1(0,1)$ distribution and pair them up, getting $\begin{bmatrix} X_i \\ Y_i\end{bmatrix}$, $i=1,\ldots,n$.

Then for each $i$ replace $Y_i$ with the $i$th residual from the least-squares regression (with an intercept) of the original $Y$ values on the $X$ values. The effect of this is that $(1)$ the mean of the new $Y$ values is exactly $0$ and $(2)$ the sample correlation between the $X$s and the $Y$s is exactly $0$.

Then subtract the average of the $X$s from each $X$; then divide each $X$ by the standard deviation of the $X$s and each $Y$ by the standard deviation of the $Y$s.

Now we have both sample means exactly $0$, both sample standard deviations exactly $1$, and the sample correlation exactly $0$.
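The steps so far might be sketched as follows. This assumes NumPy, an ordinary least-squares fit with an intercept, and the $n-1$ divisor for the standard deviations; the function name `standardized_uncorrelated` is hypothetical.

```python
import numpy as np

def standardized_uncorrelated(n, rng):
    """Return an n-by-2 array whose columns have sample mean 0, sample SD 1
    and sample correlation 0 (all up to rounding error)."""
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)

    # Residuals from regressing y on x (with intercept): mean 0, uncorrelated with x.
    slope, intercept = np.polyfit(x, y, 1)
    y = y - (intercept + slope * x)

    # Center x, then scale both columns to unit sample SD (n-1 divisor).
    x = x - x.mean()
    x = x / x.std(ddof=1)
    y = y / y.std(ddof=1)
    return np.column_stack([x, y])

z = standardized_uncorrelated(30, np.random.default_rng(1))
print(z.mean(axis=0))            # both entries approximately 0
print(np.cov(z, rowvar=False))   # approximately the 2x2 identity
```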

Now let $$ M = \frac12 \begin{bmatrix} \sqrt{1+\rho}+\sqrt{1-\rho} & \sqrt{1+\rho}-\sqrt{1-\rho} \\ \sqrt{1+\rho}-\sqrt{1-\rho} & \sqrt{1+\rho}+\sqrt{1-\rho} \end{bmatrix}. $$ Then, for $|\rho|<1$, $M$ is a positive-definite symmetric square root of the desired correlation matrix $\begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$.
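A quick numerical check of that claim, as a sketch assuming NumPy; `sqrt_corr` is an illustrative name and $\rho = 0.6$ is an arbitrary test value:

```python
import numpy as np

def sqrt_corr(rho):
    """Symmetric square root of [[1, rho], [rho, 1]], using the formula for M."""
    a, b = np.sqrt(1 + rho), np.sqrt(1 - rho)
    return 0.5 * np.array([[a + b, a - b],
                           [a - b, a + b]])

rho = 0.6
M = sqrt_corr(rho)
R = np.array([[1.0, rho], [rho, 1.0]])

print(np.allclose(M @ M, R))     # M squared reproduces the correlation matrix
print(np.linalg.eigvalsh(M))     # eigenvalues of M
```

The eigenvalues of $M$ are $\sqrt{1-\rho}$ and $\sqrt{1+\rho}$, with eigenvectors $(1,-1)$ and $(1,1)$, so both are positive whenever $|\rho|<1$.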

Now replace the $n\times 2$ matrix of $X$s and $Y$s with this matrix: $$ \begin{bmatrix} X_1 & Y_1 \\ \vdots & \vdots \\ X_n & Y_n \end{bmatrix} M. $$

Now the sample correlation is exactly $\rho$, and the two means and two standard deviations are still what they were.
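To spell out why, write $Z$ for the standardized $n\times 2$ matrix before multiplication by $M$ and $\mathbf 1$ for the column of $n$ ones (both symbols are introduced here for convenience). With the $n-1$ divisor, the preceding steps give $\mathbf 1^T Z = 0$ and $\frac{1}{n-1} Z^T Z = I$, so
$$ \mathbf 1^T (ZM) = 0, \qquad \frac{1}{n-1}(ZM)^T(ZM) = M^T\left(\frac{1}{n-1}Z^T Z\right)M = M^2 = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}. $$
The diagonal entries say that both sample variances are still $1$, and the off-diagonal entry is then both the sample covariance and the sample correlation. The same argument goes through with the divisor $n$ in place of $n-1$.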

Finally, multiply the $X$s by $\sigma$ and then add $\mu$ and do similarly with the $Y$s; this does not affect the correlation.
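Putting the whole recipe together, here is one possible end-to-end sketch. It assumes NumPy, least-squares residuals with an intercept, the $n-1$ divisor, $n \ge 3$ and $|\rho| < 1$; the function name `exact_bivariate_sample` and its signature are illustrative choices, not part of the method itself.

```python
import numpy as np

def exact_bivariate_sample(n, mu, nu, sigma, tau, rho, rng=None):
    """Draw (x, y) whose sample means, sample SDs and sample correlation
    equal (mu, nu), (sigma, tau) and rho up to rounding error."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)

    # Replace y by its residuals from the regression on x (with intercept).
    slope, intercept = np.polyfit(x, y, 1)
    y = y - (intercept + slope * x)

    # Standardize: means 0, SDs 1 (n-1 divisor), correlation 0.
    x = x - x.mean()
    z = np.column_stack([x / x.std(ddof=1), y / y.std(ddof=1)])

    # Multiply by the symmetric square root M of [[1, rho], [rho, 1]].
    a, b = np.sqrt(1 + rho), np.sqrt(1 - rho)
    M = 0.5 * np.array([[a + b, a - b],
                        [a - b, a + b]])
    w = z @ M

    # Rescale and shift each column; this leaves the correlation unchanged.
    return mu + sigma * w[:, 0], nu + tau * w[:, 1]

x, y = exact_bivariate_sample(40, mu=10.0, nu=-2.0, sigma=3.0, tau=0.5,
                              rho=0.7, rng=np.random.default_rng(2))
print(x.mean(), y.mean(), x.std(ddof=1), y.std(ddof=1), np.corrcoef(x, y)[0, 1])
```

The printed statistics match the requested values up to floating-point rounding, which is what makes the output convenient for scatterplots with prescribed descriptive statistics.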

The use I have made of this is simply to construct scatterplots with specified values of descriptive statistics for pedagogical purposes.