Confidence interval for population proportion


You have a population of 10M people. Suppose you want to determine the share of people with a certain disease. To do so you randomly choose X people and check if they have the disease. You determine that a share P of those people have the disease.

How do I compute the 75% / 90% / 95% confidence intervals for the disease prevalence for the whole population?

For example, find $y$ such that there is a 95% chance the disease rate in the whole population lies in the range $[P-y,\ P+y]$.

Thank you


Suppose $n$ people are chosen at random from a population with $N$ members, where $n/N < 0.1.$ If $X$ of the $n$ have a certain disease, then $\hat p = X/n$ is an estimate of the proportion $p$ of people in the population who have the disease.
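The condition $n/N < 0.1$ can be motivated by a standard result for sampling without replacement: the standard error of $\hat p$ carries a finite-population correction factor,
$$\operatorname{SE}(\hat p) = \sqrt{\frac{p(1-p)}{n}}\;\sqrt{\frac{N-n}{N-1}}.$$
When $n/N < 0.1$ the second factor is greater than about $\sqrt{0.9} \approx 0.95$, so ignoring it makes the interval at most about 5% too wide. With $N = 10^7$, as in the question, and any practical sample size, the factor is essentially 1.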

Wald. A traditional Wald 95% confidence interval (CI) for $p$ is then $$\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}.$$ For 90% or 75% confidence, replace the factor 1.960 by 1.645 or 1.150, respectively. These factors are the standard normal quantiles $z_{1-\alpha/2}$, which R computes as follows:

```r
# alpha = 1 - confidence level, for 95%, 90% and 75%
alpha <- c(.05, .10, .25)
qnorm(1 - alpha/2)
## 1.959964 1.644854 1.150349
```
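A minimal sketch in R that wraps the Wald formula for an arbitrary confidence level. The function name `wald_ci` and the counts (120 positives among 1000 sampled people) are hypothetical, chosen only for illustration.

```r
# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
# x = number of sampled people with the disease, n = sample size
wald_ci <- function(x, n, conf = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf) / 2)          # 1.96 for 95%, 1.645 for 90%, ...
  half <- z * sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - half, upper = p_hat + half)
}

# hypothetical sample: 120 of 1000 people test positive
for (conf in c(0.75, 0.90, 0.95)) {
  print(round(wald_ci(120, 1000, conf), 4))
}
```

The interval is centered at $\hat p$ and gets wider as the confidence level rises, because only the factor $z$ changes.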

Agresti. When the confidence level is 95% and $n$ is small (some say less than 100), it is better to use the Agresti CI (also called the Agresti-Coull or 'plus four' interval): let $\tilde n = n+4$ and $\tilde p = (X+2)/\tilde n$, and compute $$\tilde p \pm 1.96\sqrt{\tilde p(1-\tilde p)/\tilde n}.$$ This form of the Agresti CI is intended for use only at the 95% confidence level.
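A sketch of the 'plus four' recipe in R: add two successes and two failures, then apply the Wald formula with the fixed 95% factor 1.96. The function name `plus_four_ci` and the small sample (3 positives out of 40) are hypothetical.

```r
# Agresti "plus four" interval, 95% only:
# pretend we saw 2 extra positives and 2 extra negatives
plus_four_ci <- function(x, n) {
  n_tilde <- n + 4
  p_tilde <- (x + 2) / n_tilde
  half <- 1.96 * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
  c(lower = p_tilde - half, upper = p_tilde + half)
}

# hypothetical small sample: 3 of 40 people test positive
print(round(plus_four_ci(3, 40), 4))
```

For comparison, the Wald formula with the same data ($\hat p = 0.075$, $n = 40$) gives a lower bound of about $-0.0066$, which is impossible for a proportion; the plus-four interval stays inside $[0, 1]$ here.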

Wilson. In general, a slightly more accurate CI than either of these is the Wilson score interval for a binomial proportion. Its formula is somewhat more complicated; I refer you to the Wikipedia article for it.
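For reference, the standard Wilson score interval with $z = z_{1-\alpha/2}$ is
$$\frac{\hat p + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{\hat p(1-\hat p)}{n} + \dfrac{z^2}{4n^2}}}{1 + z^2/n}.$$
Below is a sketch of it in R. The function name `wilson_ci` and the counts are hypothetical. As a cross-check, base R's `prop.test()` with `correct = FALSE` (no continuity correction) reports this same interval for a single proportion.

```r
# Wilson score interval for a binomial proportion
wilson_ci <- function(x, n, conf = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf) / 2)
  denom <- 1 + z^2 / n
  center <- (p_hat + z^2 / (2 * n)) / denom
  half <- (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2))
  c(lower = center - half, upper = center + half)
}

# hypothetical sample: 120 of 1000 people test positive, 90% level
print(wilson_ci(120, 1000, conf = 0.90))

# cross-check against base R (one-sample prop.test, no continuity correction)
check <- prop.test(120, 1000, conf.level = 0.90, correct = FALSE)$conf.int
print(as.numeric(check))
```

Unlike the Wald interval, the Wilson interval is not centered at $\hat p$; its center is pulled slightly toward 1/2, which is what keeps it inside $[0, 1]$ even when $X = 0$ or $X = n$.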

Note: For more on binomial CIs, see another item on this site.