My ultimate goal is to generate a vector of size $N$ of correlated Bernoulli random variables. One way I am doing this is to use the Gaussian copula approach. However, the Gaussian copula approach just leaves me with a vector:
$$ (p_1, \ldots, p_N) \in [0,1]^N $$
Suppose that I have generated $(p_1, \ldots, p_N)$ such that the common correlation between them is $\rho$. Now, how can I transform these into a new vector of $0$s and $1$s? In other words, I would like:
$$ (X_1, \ldots, X_N) \in \{0,1\}^N $$
but with the same correlation $\rho$.
One approach I thought of was to assign a hard cutoff rule such that if $p_i < 0.5$, then let $X_i = 0$ and if $p_i \geq 0.5$, then let $X_i = 1$.
This seems to work well in simulations, in that it retains the correlation structure, but the choice of cutoff seems arbitrary to me: why $0.5$ rather than any other value?
Another way is to treat each $X_i$ as a Bernoulli random variable with success probability $p_i$ and sample from it. However, this approach seems to cause a loss of correlation: instead of $\rho$, I may get roughly $\frac{\rho}{2}$ or $\frac{\rho}{3}$.
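To make the attenuation concrete, here is a small simulation sketch. The shared-factor recipe for the equicorrelated Gaussians and the helper name `phi` are my own choices for illustration, not part of any prescribed method:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def phi(z):
    # Standard normal CDF built from math.erf (stdlib-only helper; the name is mine).
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

rho, n, size = 0.5, 2, 200_000

# Equicorrelated Gaussians through a shared factor W: corr(Z_i, Z_j) = rho.
W = rng.standard_normal((size, 1))
E = rng.standard_normal((size, n))
Z = math.sqrt(rho) * W + math.sqrt(1.0 - rho) * E

P = phi(Z)               # the copula output (p_1, ..., p_N), Uniform[0,1] marginals
X = rng.binomial(1, P)   # treat each p_i as a Bernoulli success probability

print(np.corrcoef(Z[:, 0], Z[:, 1])[0, 1])  # ≈ 0.5
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])  # well below 0.5
```

In this run the Bernoulli correlation comes out near $\rho/3$, consistent with the attenuation described above: the extra Bernoulli sampling step adds independent noise on top of the correlated $p_i$.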
Does anyone have any thoughts or inputs into this? Thank you.
There are a lot of simple ways to make correlated Bernoulli variables with the same marginal distribution and the same correlation parameter. For example, you could define $\{Y_0, Y_1, \ldots, Y_n\}$ as i.i.d. Bernoulli with $P[Y_i=1]=m$. Define an independent variable $H \in\{0,1\}$ with $P[H=1]=\theta$. Then define $$ X_i = \left\{ \begin{array}{ll} Y_0 &\mbox{ if $H=0$} \\ Y_i & \mbox{ if $H=1$} \end{array} \right.$$ so $\{X_i\}_{i=1}^n$ are correlated Bernoulli with the same mean $E[X_i]=m$ and the same correlation parameter $E[X_iX_j] = m(1-\theta) + m^2\theta$ for $i \neq j$. You can slide $\theta \in [0,1]$ to slide $E[X_iX_j] \in [m^2, m]$.
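A minimal simulation of this switch construction (the function name `correlated_bernoulli` is illustrative, not from any library):

```python
import numpy as np

rng = np.random.default_rng(0)

def correlated_bernoulli(n, m, theta, size):
    """Switch construction: per sample, H ~ Bern(theta) selects either the
    shared Y_0 (H = 0) for every coordinate or the independent Y_i (H = 1)."""
    Y0 = rng.binomial(1, m, size=(size, 1))     # shared Bernoulli(m)
    Yi = rng.binomial(1, m, size=(size, n))     # i.i.d. Bernoulli(m)
    H = rng.binomial(1, theta, size=(size, 1))  # independent switch
    return np.where(H == 0, Y0, Yi)

m, theta = 0.3, 0.4
X = correlated_bernoulli(5, m, theta, 200_000)
print(X[:, 0].mean())              # ≈ m = 0.3
print((X[:, 0] * X[:, 1]).mean())  # ≈ m(1-θ) + m²θ = 0.216
```

The empirical mean and pairwise product moment should match $m$ and $m(1-\theta)+m^2\theta$ up to sampling noise.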
Notice that $X_i X_j \leq X_i \implies E[X_iX_j] \leq E[X_i] = m$. So the largest possible $E[X_iX_j]$ value for any method is $m$, which can be achieved by this particular method. However, the lowest $E[X_iX_j]$ that this particular method can provide is $m^2$.
It is possible to get $E[X_iX_j]<m^2$ with alternative methods. For example, if $m \leq 1/n$, then we can design to have $E[X_iX_j] = 0$ simply by defining $U$ as a random variable uniform over $[0,1]$ and defining $$ X_i = 1_{\{U \in \mathcal{A}_i\}} $$ where $\mathcal{A}_i$ are disjoint subsets of $[0,1]$, each with measure $m$.
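A quick sketch of this disjoint-interval construction, taking $\mathcal{A}_i = [(i-1)m, im)$ (one concrete choice of the disjoint sets; any sets of measure $m$ would do):

```python
import numpy as np

rng = np.random.default_rng(1)

# With m <= 1/n, take A_i = [(i-1)m, i*m) and X_i = 1{U in A_i}.
# At most one X_i can be 1, so E[X_i X_j] = 0 exactly for i != j,
# while each X_i is still Bernoulli(m).
n, m, size = 4, 0.2, 100_000
U = rng.uniform(0.0, 1.0, size)
X = np.stack([(U >= i * m) & (U < (i + 1) * m) for i in range(n)],
             axis=1).astype(int)

print(X[:, 0].mean())                  # ≈ m = 0.2
print(int((X[:, 0] * X[:, 1]).sum()))  # 0: the events are disjoint
```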
An interesting and non-obvious question is, for a given $m \in [0,1]$, what is the set of all $E[X_iX_j]$ values that can be achieved, assuming $E[X_i]=m$ for all $i$, and $E[X_iX_j]$ is the same for all $i \neq j$. (The above shows that we can achieve $E[X_iX_j] \in [m^2, m]$, and it is impossible to make it larger than $m$, so it remains only to consider the interval $[0,m^2)$.)
It is also easy to see that if we can achieve $E[X_iX_j]=\rho$ for all $i\neq j$ for some $\rho < m^2$, then we can achieve any point in the interval $[\rho, m]$: use an independent switch to randomize between the construction achieving $\rho$ and the one achieving $m$, which preserves $E[X_i]=m$ and interpolates $E[X_iX_j]$ linearly between the two values. So the achievable set is always an interval, and it suffices to consider, for each $m \in [0,1]$, the smallest $E[X_iX_j]$ value that can be achieved.