Variance of sample mean difference

I encountered a variation of this problem as a test task for a job interview. It has been a long time since I last worked with probability theory, so I couldn't solve it. Still, it has deprived me of inner peace, which is why I seek your help.

Consider a standard normal distribution D with mean $0$ and variance $1$. There were several subproblems to be solved.

1) Take two samples from D, 20 elements each: $\{X_1, X_2, ..., X_{20}\}$ and $\{Y_1,Y_2,...,Y_{20}\}$. Compute sample means $\bar{X}$ and $\bar{Y}$. Find the variance of $W = \bar{X} - \bar{Y}$.

Well, this is something I seem to have managed to do. I assumed all $X$'s and $Y$'s to be independent and identically distributed as $N(0, 1)$, so $\bar{X}$ and $\bar{Y}$ are also independent, and

$$Var(W) = Var(\bar{X} - \bar{Y}) = Var(\bar{X})+Var(\bar{Y})=\frac{\sigma^2}{n}+\frac{\sigma^2}{n}=\frac{1}{20}+\frac{1}{20}=\frac{1}{10}.$$

However, what's next is a complete puzzle to me:

2) Take two samples from D, 40 elements each. From each of these two samples respectively, randomly pick 20 elements without repetition, creating subsamples $X^*_1 ... X^*_{20}$ and $Y^*_1 ... Y^*_{20}$. In these subsamples, compute means $\bar{X^*}$ and $\bar{Y^*}$. The task is to find $Var(\bar{X^*}-\bar{Y^*})$.

3) Take two samples from D, 40 elements each. From each of these two samples respectively, randomly pick 20 elements, repetition allowed, creating subsamples $X^*_1 ... X^*_{20}$ and $Y^*_1 ... Y^*_{20}$. In these subsamples, compute means $\bar{X^*}$ and $\bar{Y^*}$. The task is to find $Var(\bar{X^*}-\bar{Y^*})$.

These "layers" of sampling drove me crazy. Please give me a meaningful piece of explanation!

There are 1 best solutions below

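For $(2)$, the process of collecting $40$ observations and then throwing $20$ of them away at random is really the same experiment as just taking $20$ in the first place, exactly as in $(1)$: which elements are kept does not depend on their values, so the kept ones are still independent $N(0, 1)$ draws. So the answer is $1/10$.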
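As a cross-check on the $1/10$ for $(2)$, the same number falls out of the law of total variance. This is only a sketch and not part of the original argument: $\bar{X}_{40}$ and $S^2$ (the mean and the $1/(N-1)$ sample variance of the full sample of $N = 40$) are notation introduced here, and the standard finite-population formula for the variance of a without-replacement sample mean, $\frac{S^2}{n}\left(1 - \frac{n}{N}\right)$ with $n = 20$, is assumed.

$$Var(\bar{X^*}) = E\left[Var(\bar{X^*} \mid X_1, \ldots, X_{40})\right] + Var\left(E\left[\bar{X^*} \mid X_1, \ldots, X_{40}\right]\right) = E\left[\frac{S^2}{20}\left(1 - \frac{20}{40}\right)\right] + Var(\bar{X}_{40}) = \frac{1}{40} + \frac{1}{40} = \frac{1}{20}.$$

The same holds for $\bar{Y^*}$, and the two subsample means are independent, so $Var(\bar{X^*} - \bar{Y^*}) = \frac{1}{20} + \frac{1}{20} = \frac{1}{10}$.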

Sampling with repetition isn't quite so easy. Here is some R code; the simulated variance looks like it is converging to about $0.147735$.

    n_sims <- 1000000
    diffs <- numeric(n_sims)  # one difference of subsample means per run

    for (i in seq_len(n_sims)) {
      # two independent samples of 40 from N(0, 1)
      sample1 <- rnorm(40, 0, 1)
      sample2 <- rnorm(40, 0, 1)

      # draw 20 elements from each sample, repetition allowed
      subsample1 <- sample1[sample(1:40, 20, replace = TRUE)]
      subsample2 <- sample2[sample(1:40, 20, replace = TRUE)]

      diffs[i] <- mean(subsample2) - mean(subsample1)
    }

    # empirical variance of the simulated differences
    var(diffs)
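The simulated value for $(3)$ can also be compared with a closed form obtained from the law of total variance. The following is a sketch under stated assumptions rather than part of the original answer: $\bar{X}_{40}$ denotes the mean of the full sample of $40$ (notation introduced here), and, given that sample, the $20$ draws with repetition are treated as independent and uniform over its $40$ values, so each draw has conditional variance $\frac{1}{40}\sum_{i=1}^{40}(X_i - \bar{X}_{40})^2$, whose expectation is $\frac{39}{40}$.

$$Var(\bar{X^*}) = E\left[\frac{1}{20} \cdot \frac{1}{40}\sum_{i=1}^{40}(X_i - \bar{X}_{40})^2\right] + Var(\bar{X}_{40}) = \frac{1}{20} \cdot \frac{39}{40} + \frac{1}{40} = \frac{59}{800}.$$

By independence of the two samples, $Var(\bar{X^*} - \bar{Y^*}) = 2 \cdot \frac{59}{800} = \frac{59}{400} = 0.1475$, which is close to the value the simulation appears to converge to.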