How to compute a confidence interval

The problem I want to solve

Here is what I have so far. The number of participants is $n=Y_1+Y_2+\cdots+Y_N$, so $E(n) = E(Y_1) + \cdots + E(Y_N)$. Based on $S_0, S_1, S_2$, we know $P(Y_i=0) = 125/200$, $P(Y_i=1) = 30/200$, $P(Y_i=2)=45/200$. Then $E(n) = N \cdot E(Y_i) = N \cdot (0 \cdot P(Y_i=0) + 1 \cdot P(Y_i=1)+2 \cdot P(Y_i=2))=120$. But I do not know how to compute the confidence interval.

Best answer

Note: Technically, I don't think "CI" is the correct term for your textbook to use here. What they're asking for is a prediction interval for the number of candidates you'll get from the 200 letters you send out. I'll stick with "CI" anyway.

They're asking for an approximate $90\%$ CI for a fairly large sum of random variables. In introductory texts, "approximate interval" is usually a hint to use the normal approximation. This means we need the mean $E[T]$ and variance $V[T]$ of $T =\sum_{i=1}^{200} Y_i$, and then we form the $90\%$ CI using the usual two-sided z-score ($z_{0.05} \approx 1.645$, rounded to $1.64$ here):

$$CI_{90}(T) \approx E[T] \pm 1.64\sqrt{V[T]}$$

We are told that the $Y_i$ are independent but not identically distributed, so we need to be careful about how we interpret the $S_j$ and $S_{jk}$: these are sums of probabilities (i.e., numbers), not sample sums (i.e., random variables). Writing $p_i(j) = P(Y_i = j)$, they are $S_j = \sum_{i=1}^{200} p_i(j)$ and $S_{jk} = \sum_{i=1}^{200} p_i(j)p_i(k)$. We can relate them to $E[T]$ and $V[T]$:

$$E[T] = E\left[\sum_{i=1}^{200} Y_i\right] = \sum_{i=1}^{200} E\left[Y_i\right]=\sum_{i=1}^{200}[p_i(1)+2p_i(2)] = \left(\sum_{i=1}^{200}p_i(1)\right)+2\left(\sum_{i=1}^{200}p_i(2)\right) = S_1+2S_2 = 120\;\textrm{(as you calculated)}$$

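While you got the same answer, your reasoning was too general: $P(Y_i=j) \neq \frac{S_j}{200}$ in general, since the researcher's assigned probabilities differ among the potential candidates. The mean still comes out right because it depends only on the sums $S_1$ and $S_2$, but the variance does not.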

Next, we need to get $V[T]$:

$$V[T] = V\left[\sum_{i=1}^{200} Y_i\right] = \sum_{i=1}^{200} V\left[Y_i\right] \;\textrm{by independence of the }Y_i$$

Each $Y_i$ will have its own variance:

$$V[Y_i] = E[Y_i^2]-(E[Y_i])^2 = [p_i(1)+4p_i(2)] - [p_i(1)+2p_i(2)]^2 =$$ $$-p_i(1)^2 - 4p_i(1)p_i(2)+p_i(1)-4p_i(2)^2+4p_i(2)$$
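The expansion above is easy to get wrong by a sign, so here is a minimal sketch that checks it against the variance computed directly from the pmf. The function names `variance_direct` and `variance_expanded` and the probabilities $1/5$ and $3/10$ are made up for illustration; they are not taken from the problem.

```python
from fractions import Fraction

def variance_direct(p1, p2):
    """Variance of Y taking values 0, 1, 2, computed straight from the pmf."""
    p0 = 1 - p1 - p2
    mean = 0 * p0 + 1 * p1 + 2 * p2
    return p0 * (0 - mean) ** 2 + p1 * (1 - mean) ** 2 + p2 * (2 - mean) ** 2

def variance_expanded(p1, p2):
    """The expanded polynomial form of V[Y_i] from the derivation."""
    return -p1**2 - 4 * p1 * p2 + p1 - 4 * p2**2 + 4 * p2

# Hypothetical probabilities for a single candidate (illustration only).
p1, p2 = Fraction(1, 5), Fraction(3, 10)

assert variance_direct(p1, p2) == variance_expanded(p1, p2)
print(variance_expanded(p1, p2))  # 19/25
```

Exact fractions are used so the two forms can be compared with `==` rather than a floating-point tolerance.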

Therefore

$$\sum_{i=1}^{200} V\left[Y_i\right]= \sum_{i=1}^{200}[-p_i(1)^2 - 4p_i(1)p_i(2)+p_i(1)-4p_i(2)^2+4p_i(2)] = $$

$$-\sum_{i=1}^{200}p_i(1)^2 - 4\sum_{i=1}^{200}p_i(1)p_i(2)+\sum_{i=1}^{200}p_i(1)-4\sum_{i=1}^{200}p_i(2)^2+4\sum_{i=1}^{200}p_i(2) = $$

$$-S_{11}-4S_{12}+S_1-4S_{22}+4S_2 = $$ $$-10-4(10)+30-4(15)+4(45) = 100 =V[T] \implies \sqrt{V[T]} = 10$$
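The arithmetic can be reproduced in a few lines. The sketch below plugs in the sums used in the derivation and, purely for comparison, also computes the variance you would get if every $Y_i$ shared the averaged probabilities $S_1/200$ and $S_2/200$. That identical-distribution case is a hypothetical the problem does not assume, and the variable names are illustrative.

```python
from fractions import Fraction

# Sums of probabilities as used in the derivation above.
N = 200
S1, S2 = 30, 45
S11, S12, S22 = 10, 10, 15

mean_T = S1 + 2 * S2
var_T = -S11 - 4 * S12 + S1 - 4 * S22 + 4 * S2

# Comparison only: pretend all Y_i are identically distributed with
# P(Y=1) = S1/N and P(Y=2) = S2/N (NOT what the problem states).
q1, q2 = Fraction(S1, N), Fraction(S2, N)
var_iid = N * ((q1 + 4 * q2) - (q1 + 2 * q2) ** 2)

print(mean_T, var_T)  # 120 100
print(var_iid)        # 138
```

The mean is the same either way, but the identical-distribution shortcut would give a variance of 138 instead of 100, which is why the $S_{jk}$ sums are needed.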

We now have all the ingredients we need to get our approximate $90\%$ CI for $T$:

$$CI_{90}(T) \approx E[T] \pm 1.64\sqrt{V[T]} = 120 \pm 1.64 \times 10 = [103.6,136.4]$$
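As a rough check, the same interval can be computed with the unrounded normal quantile from Python's standard library. This is only a sketch: `normal_interval` is a hypothetical helper name, and it uses $z \approx 1.645$ rather than the rounded $1.64$, so the endpoints differ slightly in the second decimal place.

```python
from statistics import NormalDist

def normal_interval(mean, variance, level=0.90):
    """Two-sided normal-approximation interval: mean +/- z * sd."""
    # For a 90% two-sided interval we need the 95th percentile of N(0, 1).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * variance ** 0.5
    return mean - half_width, mean + half_width, z

low, high, z = normal_interval(120, 100)
print(round(z, 3), round(low, 2), round(high, 2))  # 1.645 103.55 136.45
```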

Since we are trying to predict a number of candidates, we should conservatively round the endpoints outward (lower endpoint down, upper endpoint up) to get integers:

$$\textrm{We expect approx. } [\lfloor 103.6 \rfloor, \lceil 136.4 \rceil] = [103,137] \;\textrm{candidates from our 200 letters}$$
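The outward rounding is a one-liner with `math.floor` and `math.ceil`. The helper name `round_outward` is just for illustration.

```python
import math

def round_outward(low, high):
    """Widen a real interval to the nearest enclosing integer bounds."""
    return math.floor(low), math.ceil(high)

print(round_outward(103.6, 136.4))  # (103, 137)
```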