What is the covariance of these Bernoulli variables?


Consider the two Bernoulli variables X and Y, where \begin{align} P(X) = \begin{cases} p_{1} & X = 1\\ 1-p_{1} & X = 0\\ \end{cases} &\qquad\;\;& P(Y) = \begin{cases} p_{2} & Y = 1\\ 1-p_{2} & Y = 0\\ \end{cases} \end{align}

with covariance $cov(X, Y) = k$.

Now, there are two sets of data, $S_{1}$ and $S_{2}$, where

\begin{equation} S_{1} = \{x_{1}, x_{2}, \cdots,x_{n}\} \end{equation}

\begin{equation} S_{2} = \{y_{1}, y_{2}, \cdots,y_{n}\} \end{equation}

Here $S_{1}$ is sampled from $B_{1} \sim B(N, p_{1})$ and $S_{2}$ is sampled from $B_{2} \sim B(N, p_{2})$, where $B_{1}$ and $B_{2}$ are generated from $N$ trials of $X$ and $Y$, respectively. Furthermore, the sampling is done such that each pair $(x_{i}, y_{i})$ is generated together.

Does the value of $N$ make any difference? That is, when the sample size is large enough, will the covariance between $S_{1}$ and $S_{2}$ approach the covariance between X and Y?


There are 2 best solutions below

Best answer:

Partial answer:

A binomial $B(N,p)$ variable is the sum of $N$ Bernoulli variables $\beta_k$ (assumed independent across trials), so:

$$cov(B_1,B_2)=cov(\beta_1+\beta_2+...+\beta_N,\beta'_1+\beta'_2+...+\beta'_N)$$

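By bilinearity of the covariance operator:

$$cov(B_1,B_2)=\sum_{i,j=1}^{N} cov(\beta_i,\beta'_j)$$

Since $\beta_i$ and $\beta'_i$ come from the same trial, $cov(\beta_i,\beta'_i)=k$, while independence across trials gives $cov(\beta_i,\beta'_j)=0$ for $i \neq j$. Only the $N$ diagonal terms survive:

$$cov(B_1,B_2)=\sum_{i=1}^{N} cov(\beta_i,\beta'_i)=Nk$$

Therefore: yes, the theoretical covariance depends on the size $N$.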

Related: See [this question](https://stats.stackexchange.com/questions/417360/covariance-between-two-binomial-random-variables/417367)
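The dependence on $N$ can be checked exactly, without simulation. The sketch below is only an illustration under stated assumptions: the $N$ trials are independent of each other, each trial produces one $(X, Y)$ pair with covariance $k$, and the function names `joint_pmf` and `binomial_pair_cov` are hypothetical helpers, not part of any library. It builds the joint distribution of $(B_1, B_2)$ by repeated convolution and computes the covariance from it.

```python
def joint_pmf(p1, p2, k):
    """Joint pmf of a single (X, Y) pair with marginals p1, p2 and covariance k."""
    # cov(X, Y) = E[XY] - p1*p2, and E[XY] = P(X=1, Y=1) for 0/1 variables.
    p11 = k + p1 * p2
    pmf = {
        (1, 1): p11,
        (1, 0): p1 - p11,
        (0, 1): p2 - p11,
        (0, 0): 1 - p1 - p2 + p11,
    }
    if min(pmf.values()) < 0:
        raise ValueError("k is not attainable for these marginals")
    return pmf


def binomial_pair_cov(N, p1, p2, k):
    """Exact cov(B1, B2), assuming (B1, B2) is the sum of N independent (X, Y) pairs."""
    single = joint_pmf(p1, p2, k)
    dist = {(0, 0): 1.0}  # distribution of the running sums before any trial
    for _ in range(N):
        new = {}
        for (a, b), pa in dist.items():
            for (x, y), px in single.items():
                key = (a + x, b + y)
                new[key] = new.get(key, 0.0) + pa * px
        dist = new
    e1 = sum(a * p for (a, _), p in dist.items())
    e2 = sum(b * p for (_, b), p in dist.items())
    e12 = sum(a * b * p for (a, b), p in dist.items())
    return e12 - e1 * e2


if __name__ == "__main__":
    for N in (1, 5, 20):
        print(N, binomial_pair_cov(N, 0.3, 0.6, 0.05))
```

Under these assumptions the covariance of the two binomial counts grows linearly in $N$: each trial contributes $k$, and cross-trial terms contribute nothing.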


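As the number of sampled pairs $n$ gets larger, the empirical joint distribution of the pairs $(x_i, y_i)$ in $S_1$ and $S_2$ converges to the joint distribution of $(B_1, B_2)$. That implies that the empirical covariance between $S_1$ and $S_2$ converges to the theoretical covariance $cov(B_1, B_2)$.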
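A small simulation can show what the empirical covariance settles to as the number of sampled pairs grows. This is a rough sketch, not a definitive implementation: it assumes the $N$ trials behind each pair are independent of each other, that each trial draws $X$ and $Y$ jointly with covariance $k$, and it uses hypothetical helper names (`sample_pair`, `sample_cov`, `empirical_cov`) and illustrative parameter values that do not come from the question.

```python
import random


def sample_pair(rng, N, p1, p2, k):
    """Draw one (B1, B2) pair as the sum of N jointly drawn (X, Y) trials."""
    p11 = k + p1 * p2  # P(X=1, Y=1) implied by cov(X, Y) = k
    b1 = b2 = 0
    for _ in range(N):
        u = rng.random()
        if u < p11:                  # outcome (1, 1)
            b1 += 1
            b2 += 1
        elif u < p1:                 # outcome (1, 0), probability p1 - p11
            b1 += 1
        elif u < p1 + p2 - p11:      # outcome (0, 1), probability p2 - p11
            b2 += 1
        # otherwise outcome (0, 0)
    return b1, b2


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def empirical_cov(n, N, p1, p2, k, seed=0):
    """Sample covariance between S1 and S2, each holding n jointly drawn values."""
    rng = random.Random(seed)
    pairs = [sample_pair(rng, N, p1, p2, k) for _ in range(n)]
    s1, s2 = zip(*pairs)
    return sample_cov(s1, s2)


if __name__ == "__main__":
    for n in (100, 1000, 20000):
        print(n, empirical_cov(n, N=10, p1=0.3, p2=0.6, k=0.05))
```

With these assumptions the printed values should drift toward $cov(B_1, B_2) = Nk$ as $n$ increases, and toward $k$ itself only when $N = 1$.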