Variance of difference in X and Y problem


I toss 3 coins 400 times. If I get 3 heads I gain a point, if I get 3 tails I lose a point, if I get neither then I gain/lose no points. I am struggling to find the variance of my total score S.

I know that $S = X - Y$ where $X$ denotes the number of 'triple heads' I get, and $Y$ denotes the number of 'triple tails' I get. Both variables follow a binomial distribution with parameters $n=400$ and $p=1/8$ so it is easy enough to find the expectation $E(S) = E(X-Y) = E(X) - E(Y) = 0$.

But how am I meant to find the variance?

Since $E(S) = 0$, I have $Var(S) = Var(X-Y) = E[(X-Y)^2] = E(X^2) + E(Y^2) - 2E(XY),$ and I can find both $E(X^2)$ and $E(Y^2)$, but I am not sure how to find $E(XY)$ because $X$ and $Y$ are not independent.



Let $X_i$ denote the number of points obtained on the $i^{\rm th}$ trial, $i \in \{1, 2, \ldots, 400\}$, so that $S = X_1 + \cdots + X_{400}$. Assuming the coins are fair, $$\Pr[X_i = 1] = 1/8, \quad \Pr[X_i = 0] = 3/4, \quad \Pr[X_i = -1] = 1/8.$$ By symmetry, the expected value of a single trial is $\operatorname{E}[X_i] = 0$ (consistent with your $E(S) = 0$), so the variance of a single trial is $$\operatorname{Var}[X_i] = \operatorname{E}[X_i^2] - \operatorname{E}[X_i]^2 = \operatorname{E}[X_i^2]$$ as the second term is $0$. Then $$\operatorname{E}[X_i^2] = 1^2 \Pr[X_i = 1] + 0^2 \Pr[X_i = 0] + (-1)^2 \Pr[X_i = -1] = 1/8 + 1/8 = 1/4.$$

Since the $X_i$ are independent ("ind") and identically distributed ("id"), the variance of the total score $S = X_1 + \cdots + X_{400}$ equals the sum of the variances; i.e., $$\operatorname{Var}[S] = \operatorname{Var}[X_1 + \cdots + X_{400}] \overset{\text{ind}}{=} \operatorname{Var}[X_1] + \cdots + \operatorname{Var}[X_{400}] \overset{\text{id}}{=} 400\operatorname{Var}[X_i] = 400(1/4) = 100.$$
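If you want to check this numerically, here is a minimal Python sketch, assuming fair coins and independent trials. It first recomputes $400\operatorname{Var}[X_i]$ exactly from the per-trial distribution, then estimates the variance of the total score by simulation. The names `PMF`, `exact_variance`, `simulate_total`, and `simulated_variance`, as well as the number of simulated games and the seed, are illustrative choices of mine, not part of the problem.

```python
import random
from fractions import Fraction
from statistics import pvariance

# Per-trial score for three fair coins: +1 (HHH), -1 (TTT), 0 otherwise.
PMF = {1: Fraction(1, 8), 0: Fraction(3, 4), -1: Fraction(1, 8)}
N_TRIALS = 400


def exact_variance(pmf, n_trials):
    """Variance of a sum of n_trials i.i.d. scores with the given pmf."""
    first_moment = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    return n_trials * (second_moment - first_moment ** 2)


def simulate_total(rng, n_trials):
    """Play one game of n_trials tosses and return the total score."""
    total = 0
    for _ in range(n_trials):
        u = rng.random()
        if u < 0.125:      # probability 1/8: three heads
            total += 1
        elif u < 0.25:     # probability 1/8: three tails
            total -= 1
    return total


def simulated_variance(n_games=5000, seed=0):
    """Estimate Var(S) from n_games independent simulated games."""
    rng = random.Random(seed)
    totals = [simulate_total(rng, N_TRIALS) for _ in range(n_games)]
    return float(pvariance(totals))


if __name__ == "__main__":
    print("exact variance:    ", exact_variance(PMF, N_TRIALS))
    print("simulated variance:", simulated_variance())
```

The exact computation uses `Fraction` to avoid rounding. The simulated value is only an estimate and will fluctuate around the exact one depending on the seed and the number of games.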


The solution is straightforward once the notation and probability model are chosen well. Your model uses separate binomial random variables for the positive points and the negative points, but these are not independent: a trial that yields three heads cannot also yield three tails. The approach above instead identifies which outcomes are truly independent (the individual trials) and assigns the random variables to those.
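For completeness, the original $X - Y$ decomposition can also be carried through. This is a sketch under the same assumptions (fair coins, independent trials). The indicators $A_i$ and $B_i$ are notation introduced here for illustration: $A_i = 1$ if trial $i$ gives three heads and $B_i = 1$ if it gives three tails, so that $X = \sum_i A_i$ and $Y = \sum_i B_i$. Since $A_i B_i = 0$ always, and indicators from different trials are independent, $$\operatorname{Cov}[X, Y] = \sum_{i=1}^{400} \operatorname{Cov}[A_i, B_i] = 400\left(\operatorname{E}[A_i B_i] - \operatorname{E}[A_i]\operatorname{E}[B_i]\right) = 400\left(0 - \tfrac{1}{8}\cdot\tfrac{1}{8}\right) = -\tfrac{25}{4}.$$ With $\operatorname{Var}[X] = \operatorname{Var}[Y] = 400 \cdot \tfrac{1}{8} \cdot \tfrac{7}{8} = \tfrac{175}{4}$, this gives $$\operatorname{Var}[X - Y] = \operatorname{Var}[X] + \operatorname{Var}[Y] - 2\operatorname{Cov}[X, Y] = \tfrac{175}{4} + \tfrac{175}{4} + \tfrac{50}{4} = 100,$$ in agreement with the per-trial calculation. The quantity you were missing is then $\operatorname{E}[XY] = \operatorname{Cov}[X, Y] + \operatorname{E}[X]\operatorname{E}[Y] = -\tfrac{25}{4} + 50^2 = 2493.75$.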