Ratio estimation to find the total with 95% confidence interval


A survey has been conducted to see how many people in a town of $40,000$ people used Ebay to purchase a product last year. A simple random sample of $230$ is taken and from this sample $52$ people had used Ebay last year.

I'm asked to estimate the total number of people who used Ebay last year and compute the $95$% confidence interval for the total (we are allowed to assume the $97.5$% quantile of the normal distribution is $1.96$).

The first part I believe is straightforward, I just find my ratio $R$ and multiply by my population total $t$ to get $52/230 \cdot 40,000 = 9043.4783 \ (\hat{t})$ but finding the confidence interval is a bit confusing because I'm hardly given any information here. I have formulas I would usually use to find the variance of $R$ and then the variance of $\hat{t}$ and then find the confidence interval but all these formulas either require $\bar{y}$ (and sometimes $\bar{x}$) or they require $\sum{y_i^2},\sum{y_ix_i},\sum{x_i^2}$ and I have none of these. I only have the total population, the sample size and the ratio. Any ideas how I would obtain my $95$% confidence interval for the total?

Also, there is a second part that asks what sample size would be required for the total number to be within $940$ units of the true value (I'm guessing they mean the total number of people in the town who used ebay last year), with confidence $95$%? On this part of the question I'm just not sure what to do.

These are both low mark questions so I'm probably just forgetting a formula or missing something but I just can't see a way to get my answers with so little information given. I've double and triple checked and this is for sure the only information that is given about this survey.

Any help would be greatly appreciated, thanks in advance


There are 2 best solutions below

0
On BEST ANSWER

EDIT: I changed "without" to "with." I mean "with" replacement so each "trial" is identical to any other "trial."

Since $230$ is much smaller than $40,000,$ you can treat the sample as if it were drawn with replacement. In this case, each sampled person is a "success" with probability $p$ (the true proportion of people who used Ebay) and $n = 230$ is your sample size. Thus, you have $X \sim \mathsf{Bin}(230, p)$ and the observed value of $X$ is $X = 52.$ You can then use a normal approximation to the binomial distribution using the $\mathsf{ML}$ estimate $\hat p := \dfrac{X}{n}$ with observed value $\hat p = \dfrac{52}{230} \approx 0.2261.$ We know $X \approx \mathsf{Norm}(np, np(1-p))$ so $\hat p \approx \mathsf{Norm} \left( p, \dfrac{p(1-p)}{n} \right).$ The approximate 95% $\mathsf{CI}$ for $p$ is therefore $$ 0.2261 \pm 1.96 \sqrt{\dfrac{(0.2261)(0.7739)}{230}} \approx 0.2261 \pm 1.96 \times 0.0276 \approx 0.2261 \pm 0.0541 \approx (0.172, 0.280). $$ Multiplying the unrounded endpoints by $40,000$ gives an approximate 95% $\mathsf{CI}$ for the total of roughly $(6881, 11206).$
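As a quick numerical check of the normal-approximation interval, here is a minimal Python sketch. It assumes the $52$ users out of $230$ stated in the question and the $1.96$ quantile; the function name `proportion_ci` is made up for illustration.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated SD of p_hat
    return p_hat - z * se, p_hat + z * se

# Figures from the question: 52 Ebay users in a sample of 230.
low, high = proportion_ci(52, 230)
print(f"CI for the proportion: ({low:.4f}, {high:.4f})")

# Scale the endpoints by the town size to get an interval for the total.
N = 40_000
print(f"CI for the total: ({N * low:.0f}, {N * high:.0f})")
```

Scaling both endpoints by the population size works here because the total is just a constant multiple of the proportion.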

0
On

This is a classic example of sampling distributions. There is in fact a formula for the standard deviation of a sample proportion that needs only $n$ and $p$: $$SD = \sqrt{\frac{p(1-p)}{n}}$$

Here $n = 230$ and $p = 52/230$, and when we plug these in we get $SD = 0.027582$. That is, the sample proportion will typically vary by about 0.027582 around its mean of 0.226087. This is for the proportion of people that used ebay to purchase a product last year. Scaling this from a proportion up to the full 40,000 person population, we get a mean of 9,043.4783 and a standard deviation of 1,103.2652. Assuming that 95% of the distribution is within 1.96 SD of the mean, the margin of error is 1.96 × 1,103.2652 ≈ 2,162.40, which gives a range of 6,881.08 to 11,205.88. That is, we can say with 95% confidence that the true number of people in the population that used ebay to purchase a product last year was between about 6,881 and 11,206 people.
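The scaling step can be sketched in a few lines of Python. This is only an illustration under the same assumptions (simple random sample, normal approximation, no finite population correction), with the 1.96 multiplier applied to the scaled standard deviation; the variable names are chosen here and are not part of the question.

```python
import math

N = 40_000   # population size
n = 230      # sample size
x = 52       # people in the sample who used ebay

p_hat = x / n
sd_p = math.sqrt(p_hat * (1 - p_hat) / n)  # SD of the sample proportion

# A total is N times a proportion, so both the estimate and its SD scale by N.
t_hat = N * p_hat
sd_t = N * sd_p

margin = 1.96 * sd_t
ci_total = (t_hat - margin, t_hat + margin)
print(f"estimate = {t_hat:.4f}, SD = {sd_t:.4f}")
print(f"95% CI for the total: ({ci_total[0]:.2f}, {ci_total[1]:.2f})")
```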

Edit: As for the second part of the question, it can be found by working backwards. In order to be within 940 units of the true value with 95% confidence, the SD of the total must be $\frac{940}{1.96} = 479.5918$. Scaling this down to the SD of the sample proportion (divide by 40,000) we get: $SD = 0.01199$. Now we put this into the formula we used at the start, and solve for $n$:

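$$0.01199 = \sqrt{\frac{\frac{52}{230}\left(1-\frac{52}{230}\right)}{n}}$$ $$1.4376 \times 10^{-4} = \frac{0.174972}{n}$$ $$n = \frac{0.174972}{1.4376 \times 10^{-4}} \approx 1217.15$$

Since $n$ must be a whole number, round up to $n = 1218$.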
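The same back-solving can be written as a small Python sketch. It assumes the sample estimate $p = 52/230$ is reused as the planning value and ignores any finite population correction, as the working above does; `required_sample_size` is a hypothetical helper name.

```python
import math

def required_sample_size(p, N, margin, z=1.96):
    """Solve margin = z * N * sqrt(p * (1 - p) / n) for n (not yet rounded)."""
    sd_target = margin / (z * N)  # required SD of the sample proportion
    return p * (1 - p) / sd_target ** 2

n_exact = required_sample_size(52 / 230, 40_000, 940)
n_needed = math.ceil(n_exact)  # round up so the margin is not exceeded
print(f"n = {n_exact:.2f}, so sample at least {n_needed} people")
```

Rounding up rather than to the nearest integer keeps the margin of error at or below 940.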

Notes:

  • You can use a normal distribution in this case since $n < 0.1N$ (the sample is less than 10% of the population) and $np > 10$ and $n(1-p) > 10$
  • You can find the SD formula used on this textbook website.
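The rule-of-thumb conditions in the notes can be checked mechanically. The sketch below assumes the usual reading of the first condition, namely that the sample is under 10% of the population ($n < 0.1N$); the function name `normal_approx_ok` is made up for illustration.

```python
def normal_approx_ok(n, p, N):
    """Rule-of-thumb checks for a normal approximation to a sample proportion."""
    small_sample = n < 0.1 * N          # sample is under 10% of the population
    enough_successes = n * p > 10       # expected successes
    enough_failures = n * (1 - p) > 10  # expected failures
    return small_sample and enough_successes and enough_failures

# Survey figures: n = 230, p = 52/230, N = 40,000.
print(normal_approx_ok(230, 52 / 230, 40_000))
```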