What is the Covariance of Two Bernoulli Random Variables coming from the same sample, under design-based approach?

1.1k Views Asked by Bumbble Comm At 28 Mar 2026 - 1:06

The scenario for the question is as follows: We draw a sample of size $n$ from a population of size $N$. Assume a design-based approach, meaning the variable of interest $y_i$ (where $i$ represents the $i^{th}$ unit in the population) has a fixed but unknown value. The random variables $Z_1, Z_2, Z_3, \ldots, Z_i, \ldots, Z_N$ indicate whether the $i^{th}$ unit in the population is in the sample or not.

In other words: $Z_i = 1$ if unit $i$ is in the sample, and $Z_i = 0$ otherwise.

We choose a simple random sample (SRS) of size $n$ out of the $N$ population units, so the $Z_i$'s are identically distributed Bernoulli random variables with

$p_i=P(Z_i=1)=P(\text{select unit } i \text{ in the sample})=\frac{n}{N}$

and

$P(Z_i=0)=P(\text{unit } i \text{ is not in the sample})=1-\frac{n}{N}$
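As a sanity check on these inclusion probabilities, the Python sketch below enumerates every equally likely SRS of size $n$ from a small hypothetical population ($N = 5$, $n = 2$, chosen only for illustration) and counts how often a given unit appears. The helper `inclusion_probability` is a hypothetical name introduced here, not something from the textbook.

```python
from fractions import Fraction
from itertools import combinations


def inclusion_probability(N: int, n: int, unit: int) -> Fraction:
    """Exact P(Z_unit = 1) under SRS without replacement (hypothetical helper).

    Every one of the C(N, n) subsets of size n is equally likely, so the
    probability is (#samples containing `unit`) / (#samples).
    """
    samples = list(combinations(range(N), n))
    hits = sum(1 for s in samples if unit in s)
    return Fraction(hits, len(samples))


# Hypothetical small population, for illustration only.
N, n = 5, 2
for i in range(N):
    p = inclusion_probability(N, n, i)
    # p should equal n/N and 1 - p should equal 1 - n/N for every unit
    print(i, p, 1 - p)
```

Brute-force enumeration is only practical for tiny $N$, but it makes the "each unit is equally likely to be drawn" argument concrete.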

The question asks for $E[Z_iZ_j]$ and $Cov(Z_i, Z_j)$ for two distinct units $i \neq j$.

My questions are the following:

1.) What is the meaning of the random variable $Z_iZ_j$? From what I understand, it is a function that maps the set of all possible outcomes to the real numbers. There are 4 possible outcomes for the pair $Z_i$ and $Z_j$: 1.) both units are in the sample, 2.) neither unit is in the sample, 3.) $i$ is in the sample and $j$ is not, 4.) $i$ is not in the sample and $j$ is. The calculation of $E[Z_iZ_j]$ given in the textbook (Sampling: Design and Analysis, by Sharon Lohr, second edition, page 52, if you happen to have the book) considers only the probability that both units are in the sample, computed via a conditional probability. I can find the probability of each of the 4 scenarios listed above, but I do not understand what value $Z_iZ_j$ should take in each of them. The calculation in the textbook is as follows:

$E[Z_iZ_j]=P(Z_i=1 \text{ and } Z_j=1) = P(Z_j=1 \mid Z_i=1)\cdot P(Z_i=1) = \left(\frac{n-1}{N-1}\right)\left(\frac{n}{N}\right)$

Why are the other scenarios not considered? And why is the value of $Z_iZ_j$ in this scenario set to 1?
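One way to see why only the "both in sample" scenario matters is to list all four outcomes together with the value the product $Z_iZ_j$ takes in each. The Python sketch below does this by brute-force enumeration for hypothetical values $N = 5$, $n = 2$ and units $i = 0$, $j = 1$; `joint_table` is an illustrative helper, not the textbook's method. Because each indicator is 0 or 1, the product is 1 only when both indicators are 1 and is 0 in the other three outcomes, so those outcomes contribute $0 \times (\text{their probability})$ to the expectation. The sketch also evaluates the covariance expression quoted in the second question.

```python
from fractions import Fraction
from itertools import combinations


def joint_table(N: int, n: int, i: int, j: int) -> dict:
    """Exact P(Z_i = a, Z_j = b) for a, b in {0, 1} under SRS of size n from N.

    Hypothetical helper for illustration; enumerates all C(N, n) samples.
    """
    samples = list(combinations(range(N), n))
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for s in samples:
        counts[(int(i in s), int(j in s))] += 1
    return {k: Fraction(v, len(samples)) for k, v in counts.items()}


N, n, i, j = 5, 2, 0, 1  # hypothetical values, for illustration only
table = joint_table(N, n, i, j)

# E[Z_i Z_j] = sum over the four outcomes of (value of Z_i * Z_j) * probability.
# The product a * b is 0 whenever either indicator is 0, so only (1, 1) contributes.
e_zizj = sum(a * b * p for (a, b), p in table.items())
for (a, b), p in sorted(table.items()):
    print(f"Z_i={a}, Z_j={b}: Z_i*Z_j={a * b}, probability={p}")
print("E[Z_i Z_j] =", e_zizj)  # should equal (n-1)/(N-1) * n/N

# Cov(Z_i, Z_j) = E[Z_i Z_j] - E[Z_i] E[Z_j], with E[Z_i] = E[Z_j] = n/N
cov = e_zizj - Fraction(n, N) ** 2
print("Cov(Z_i, Z_j) =", cov)
```

With these values the product is 1 only in the $(1, 1)$ row, whose probability $1/10$ matches $\frac{n-1}{N-1}\cdot\frac{n}{N} = \frac{1}{4}\cdot\frac{2}{5}$.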

2.) The calculation for the covariance is given as follows: $Cov(Z_i, Z_j)=E[Z_i Z_j]-E[Z_i]E[Z_j]=\left(\frac{n-1}{N-1}\right)\left(\frac{n}{N}\right) - \left(\frac{n}{N}\right)^2=-\left(\frac{1}{N-1}\right)\left(1-\frac{n}{N}\right)\frac{n}{N}$

How does the derivation get from the second-to-last expression to the last one?
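A worked algebra sketch of that last step (one possible route, not necessarily how the textbook presents it): factor out $\frac{n}{N}$, combine the bracket over the common denominator $N(N-1)$, and use $N(n-1) - n(N-1) = Nn - N - nN + n = n - N$.

```latex
\begin{aligned}
Cov(Z_i, Z_j)
  &= \frac{n-1}{N-1}\cdot\frac{n}{N} - \left(\frac{n}{N}\right)^2
   = \frac{n}{N}\left[\frac{n-1}{N-1} - \frac{n}{N}\right] \\
  &= \frac{n}{N}\cdot\frac{N(n-1) - n(N-1)}{N(N-1)}
   = \frac{n}{N}\cdot\frac{n - N}{N(N-1)} \\
  &= -\frac{1}{N-1}\cdot\frac{N-n}{N}\cdot\frac{n}{N}
   = -\left(\frac{1}{N-1}\right)\left(1-\frac{n}{N}\right)\frac{n}{N}
\end{aligned}
```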

Thank you.
