Generating correlated random variables with discrete distribution


I would like to find a simple way to generate two correlated random variables under the condition that each r.v. has the same discrete distribution (for example, a Bernoulli distribution). This link provides a solution, but I wonder if there is something more straightforward.

Thanks!


Simplest example I can think of:

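Let $X_1, X_2, X_3$ be iid Bernoulli with success probability $\theta = 1/2.$ Then $Y_1 = X_1 + X_2$ and $Y_2 = X_2 + X_3$ are both $\mathsf{Binom}(2, 1/2),$ but they are correlated: they share the term $X_2$, so $\operatorname{Cov}(Y_1, Y_2) = \operatorname{Var}(X_2) = 1/4$, while $\operatorname{Var}(Y_1) = \operatorname{Var}(Y_2) = 1/2$, giving $\rho = 1/2$.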

Because you mention 'generating', here is a simple simulation in R:

 x1 = rbinom(10^6, 1, 1/2)
 x2 = rbinom(10^6, 1, 1/2)
 x3 = rbinom(10^6, 1, 1/2)
 y1 = x1 + x2;  y2 = x2 + x3
 cor(y1,y2)
 ## 0.5007598

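Many variations are possible, such as $W_1 = X_1 + aX_2$ and $W_2 = aX_2 + X_3$ for various values of $a.$ Here $W_1$ and $W_2$ still have the same distribution, and their correlation is $a^2/(1 + a^2)$, so varying $a \ge 0$ moves the correlation anywhere in $[0, 1)$.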
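For readers working outside R, here is a minimal Python sketch (standard library only) of the $W_1, W_2$ variation. The function names `simulate_w` and `sample_corr` are illustrative, not from the original answer. It compares the simulated correlation with $a^2/(1 + a^2)$, which follows from $\operatorname{Cov}(W_1, W_2) = a^2\operatorname{Var}(X_2)$ and $\operatorname{Var}(W_1) = \operatorname{Var}(W_2) = (1 + a^2)\operatorname{Var}(X_1)$.

```python
import random

def simulate_w(a, n=100_000, theta=0.5, seed=1):
    """Simulate W1 = X1 + a*X2 and W2 = a*X2 + X3 with X1, X2, X3 iid Bernoulli(theta)."""
    rng = random.Random(seed)
    w1, w2 = [], []
    for _ in range(n):
        # Three independent Bernoulli(theta) draws; X2 is shared by W1 and W2.
        x1, x2, x3 = (1 if rng.random() < theta else 0 for _ in range(3))
        w1.append(x1 + a * x2)
        w2.append(a * x2 + x3)
    return w1, w2

def sample_corr(u, v):
    """Pearson sample correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = sum((p - mu) ** 2 for p in u) ** 0.5
    sv = sum((q - mv) ** 2 for q in v) ** 0.5
    return cov / (su * sv)

# Simulated correlation next to the theoretical value a^2 / (1 + a^2).
for a in (0.5, 1.0, 2.0):
    w1, w2 = simulate_w(a)
    print(a, round(sample_corr(w1, w2), 3), round(a ** 2 / (1 + a ** 2), 3))
```

With $a = 1$ this reduces to the $Y_1, Y_2$ example, whose correlation is $1/2$.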


When you are dealing with two discrete random variables, you can calculate the conditional probabilities and simulate directly from them. The following doesn't require that the two variables $X$ and $Y$ have the same distribution, just that they both have $n$ outcomes (for simplicity of exposition).

Whatever your target correlation is, use it (together with the marginal distributions) to determine the joint density of the two random variables. This is an array of numbers $$P(X = x, Y = y) = p_{xy}$$

Now calculate the marginal density of $X$ and the conditional density of $Y$: $$P(X = x) = \sum_y p_{xy}$$ $$P(Y = y | X = x) = p_{xy} / P(X = x)$$

To simulate, first choose a value for $X$ using the marginal distribution $P(X = x)$. Then, to find $Y$, draw from the conditional distribution $P(Y = y | X = x)$ for the outcome $x$ you just saw.
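The two-step procedure can be sketched in Python as follows, assuming the joint table is given as a list of rows indexed by outcome number, so that `joint[i][j]` $= P(X = i, Y = j)$. The helper names and the example table are hypothetical, chosen only to illustrate the method.

```python
import random
from itertools import accumulate

def marginal_and_conditionals(joint):
    """joint[i][j] = P(X = i, Y = j); returns P(X = i) and the rows P(Y = j | X = i)."""
    px = [sum(row) for row in joint]
    cond = [[p / px_i for p in row] if px_i > 0 else [0.0] * len(row)
            for row, px_i in zip(joint, px)]
    return px, cond

def sample_pairs(joint, n, seed=0):
    """Draw n pairs: X from its marginal, then Y from P(Y | X = x) for the x just drawn."""
    rng = random.Random(seed)
    px, cond = marginal_and_conditionals(joint)
    # Precompute cumulative weights once per conditional row.
    cum_cond = [list(accumulate(row)) for row in cond]
    xs = rng.choices(range(len(px)), weights=px, k=n)
    ys = [rng.choices(range(len(cum_cond[x])), cum_weights=cum_cond[x])[0] for x in xs]
    return list(zip(xs, ys))

# Hypothetical 3-outcome joint table (rows: X = 0, 1, 2; columns: Y = 0, 1, 2).
joint = [[0.20, 0.10, 0.00],
         [0.05, 0.30, 0.05],
         [0.00, 0.10, 0.20]]
pairs = sample_pairs(joint, 100_000)
print(sum(1 for p in pairs if p == (1, 1)) / len(pairs))  # should be close to 0.30
```

Sampling $X$ first and then $Y$ given $X$ reproduces the whole joint table, so any correlation built into the table carries over to the simulated pairs.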

If your discrete distribution is Bernoulli, then your correlation will directly define the joint distribution as follows:

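Suppose $P(X = 1) = P(Y = 1) = p$ and $P(X = 0) = P(Y = 0) = 1-p$. Since $E[X] = E[Y] = E[X^2] = E[Y^2] = p$ and $E[XY] = P(X = 1, Y = 1)$, $$\rho = \frac{E[XY] - E[X]E[Y]}{\sqrt{E[X^2] - E[X]^2}\sqrt{E[Y^2] - E[Y]^2}} = \frac{P(X = 1, Y = 1) - p^2}{p - p^2}$$ Now we can solve for $P(X = 1, Y = 1)$ to find: $$P(X = 1, Y = 1) = p^2 + \rho p(1 - p)$$ The remaining cells follow from the marginals: $P(X = 1, Y = 0) = P(X = 0, Y = 1) = p - P(X = 1, Y = 1)$ and $P(X = 0, Y = 0) = 1 - 2p + P(X = 1, Y = 1)$. All four must be nonnegative, which limits the attainable $\rho$ (for $p \ne 1/2$, not every $\rho \in [-1, 1]$ is possible).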
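A minimal Python sketch of the Bernoulli case, assuming both variables are Bernoulli($p$) with $0 < p < 1$. It builds the $2 \times 2$ joint table from $p$ and $\rho$ using $P(X = 1, Y = 1) = p^2 + \rho p(1 - p)$, fills the other cells from the marginals, rejects $\rho$ values that would make a cell negative, and then samples $X$ first and $Y$ from its conditional. The function names are hypothetical.

```python
import random

def bernoulli_joint(p, rho):
    """Joint pmf {(x, y): P(X = x, Y = y)} of two Bernoulli(p) variables with correlation rho."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    p11 = p * p + rho * p * (1 - p)
    joint = {(1, 1): p11, (1, 0): p - p11, (0, 1): p - p11, (0, 0): 1 - 2 * p + p11}
    # A negative cell means this rho cannot be attained for this p.
    if any(v < -1e-12 for v in joint.values()):
        raise ValueError(f"rho = {rho} is not attainable when p = {p}")
    return joint

def sample_correlated_bernoulli(p, rho, n, seed=0):
    """Draw X ~ Bernoulli(p), then Y with P(Y = 1 | X = x) = P(X = x, Y = 1) / P(X = x)."""
    joint = bernoulli_joint(p, rho)
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = 1 if rng.random() < p else 0
        p_y1_given_x = joint[(x, 1)] / (p if x == 1 else 1 - p)
        y = 1 if rng.random() < p_y1_given_x else 0
        pairs.append((x, y))
    return pairs

pairs = sample_correlated_bernoulli(p=0.3, rho=0.4, n=100_000)
# Fraction of (1, 1) pairs; should be close to p^2 + rho*p*(1 - p) = 0.174.
print(sum(x * y for x, y in pairs) / len(pairs))
```

Working through the algebra, the conditionals simplify to $P(Y = 1 | X = 1) = p + \rho(1 - p)$ and $P(Y = 1 | X = 0) = p(1 - \rho)$, so the sketch could equally hard-code those two numbers.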

For a general distribution, you can use the above definition of correlation to fix the cross moment $E[XY]$ (assuming, as in the question, that $X$ and $Y$ share the same distribution, so $E[Y] = E[X]$ and $E[Y^2] = E[X^2]$):
$$E[XY] = E[X]^2 + \rho\left(E[X^2] - E[X]^2\right)$$
Beyond that, if there are more than two outcomes, you will have many degrees of freedom in the joint density; any choice that keeps every entry nonnegative and matches the marginals will give you the correlated outcomes you are looking for.
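As one concrete (but by no means unique) choice for the general case, assuming $0 \le \rho \le 1$: with probability $\rho$ set $Y = X$, otherwise draw $Y$ independently from the same marginal. This mixture keeps both marginals intact and gives $E[XY] = (1 - \rho)E[X]^2 + \rho E[X^2]$, which is exactly the cross moment $E[X]^2 + \rho\left(E[X^2] - E[X]^2\right)$ required for correlation $\rho$. The Python sketch below (hypothetical helper names) builds that joint table and checks the correlation exactly; negative correlations need a different construction and are not covered.

```python
def mixture_joint(values, probs, rho):
    """Joint pmf table for two variables sharing the marginal `probs` on `values`.

    With probability rho, Y = X; otherwise Y is drawn independently of X.
    """
    if not 0 <= rho <= 1:
        raise ValueError("this construction only covers 0 <= rho <= 1")
    k = len(values)
    return [[(1 - rho) * probs[i] * probs[j] + (rho * probs[i] if i == j else 0.0)
             for j in range(k)] for i in range(k)]

def correlation_from_joint(values, joint):
    """Exact Pearson correlation of (X, Y) computed from a joint pmf table."""
    k = len(values)
    px = [sum(joint[i]) for i in range(k)]
    py = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    ex = sum(v * p for v, p in zip(values, px))
    ey = sum(v * p for v, p in zip(values, py))
    vx = sum(v * v * p for v, p in zip(values, px)) - ex ** 2
    vy = sum(v * v * p for v, p in zip(values, py)) - ey ** 2
    exy = sum(values[i] * values[j] * joint[i][j] for i in range(k) for j in range(k))
    return (exy - ex * ey) / (vx * vy) ** 0.5

values, probs = [0, 1, 2], [0.25, 0.5, 0.25]  # example marginal: Binom(2, 1/2)
joint = mixture_joint(values, probs, rho=0.3)
print(correlation_from_joint(values, joint))  # 0.3 up to floating-point rounding
```

The resulting table can then be sampled with the marginal-then-conditional method described earlier.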