How to calculate confidence interval (normal distribution)?


There seem to be two formulas for calculating the confidence interval of a sample with a normal distribution (in both cases, $P$ refers to the sample proportion):


Method A:

$$P \pm \frac{1}{\sqrt n}$$ where $n$ is the population sample size

Method B:

$$P \pm z \cdot \sqrt{\frac{P (1-P)} n}$$

where $n$ is the population sample size and $z$ is the $z$-score for the intended confidence level (for example, $1.96$ for $95\%$).
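To make the comparison concrete, here is a minimal sketch that computes both intervals side by side. The function names `method_a` and `method_b` and the poll figures ($n = 400$, $P = 0.30$) are hypothetical, chosen only for illustration.

```python
import math


def method_a(p_hat: float, n: int) -> tuple[float, float]:
    """Method A: P +/- 1/sqrt(n)."""
    margin = 1 / math.sqrt(n)
    return p_hat - margin, p_hat + margin


def method_b(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Method B: P +/- z * sqrt(P(1 - P) / n)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical poll: n = 400 respondents, sample proportion P = 0.30.
print("Method A:", method_a(0.30, 400))
print("Method B:", method_b(0.30, 400))
```

With $P$ away from $1/2$, Method B gives the narrower interval, since then $P(1-P) < 1/4$.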


Which method should I use or when should I use each method?

What is the difference between the confidence levels of the two formulae?


The first (method A) is OK if you're willing to assume $P \approx 1/2$ and the confidence level is 95% so that $z = 1.96\approx 2.$

If you plug $p = 1/2$ into the formula for method B, then the margin of error is $E = 1.96\sqrt{\frac{0.5(1-0.5)}{n}} = 1.96\sqrt{\frac{1}{4n}} = \frac{0.98}{\sqrt n} \approx \frac{1}{\sqrt n}.$
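A small sketch comparing the two margins numerically, and estimating the confidence level that Method A implicitly carries. It assumes the normal approximation: setting $1/\sqrt n = z\sqrt{p(1-p)/n}$ gives an effective $z = 1/\sqrt{p(1-p)}$, and the two-sided level is $2\Phi(z) - 1 = \operatorname{erf}(z/\sqrt 2)$. The helper name `implied_level_method_a` is hypothetical.

```python
import math


def implied_level_method_a(p: float) -> float:
    """Normal-approximation confidence level implied by Method A's margin 1/sqrt(n).

    Solving 1/sqrt(n) = z * sqrt(p(1-p)/n) gives z = 1/sqrt(p(1-p));
    the two-sided level is then 2*Phi(z) - 1 = erf(z / sqrt(2)).
    """
    z = 1 / math.sqrt(p * (1 - p))
    return math.erf(z / math.sqrt(2))


for n in (100, 400, 2500):
    e_b = 1.96 * math.sqrt(0.5 * 0.5 / n)  # Method B at p = 1/2, i.e. 0.98/sqrt(n)
    e_a = 1 / math.sqrt(n)                 # Method A
    print(n, round(e_b, 4), round(e_a, 4))

for p in (0.5, 0.3, 0.1):
    print(p, round(implied_level_method_a(p), 4))
```

At $p = 1/2$ the effective $z$ is exactly $2$, so Method A corresponds to roughly a $95.4\%$ level; for $p$ farther from $1/2$ it is more conservative still.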

If the true value of $p$ is unknown, then people sometimes use $p = 1/2$ in estimating $E$, because $p=1/2$ gives the largest possible $E.$ This is often done in public opinion polls. If the race between A and B is close, then you need to interview about $n = 2500$ subjects for the margin of error to be $E = 1/\sqrt{2500} = 0.02 = 2\%.$
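The sample-size reasoning can be sketched by solving $z\sqrt{p(1-p)/n} \le E$ for $n$. This assumes the normal approximation and the worst case $p = 1/2$ by default; the function name `sample_size` is hypothetical.

```python
import math


def sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Smallest n with z * sqrt(p(1 - p) / n) <= margin.

    p = 0.5 is the worst case (largest p(1 - p)), the conservative
    choice when the true proportion is unknown.
    """
    return math.ceil((z / margin) ** 2 * p * (1 - p))


print(sample_size(0.02))          # z = 1.96
print(sample_size(0.02, z=2.0))   # z rounded to 2, i.e. n = 1/E^2
```

With $z = 1.96$ the requirement comes out slightly below 2500; rounding $z$ to 2, as method A implicitly does, gives the round figure $n = 1/E^2 = 2500$.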

[The parabola $y = x(1-x)$, for $0 < x < 1$, has its maximum at $x = 1/2$, $y = 1/4$.]
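Completing the square makes that maximum explicit:

$$x(1-x) = \frac14 - \left(x - \frac12\right)^2 \le \frac14,$$

with equality exactly when $x = 1/2$.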


The number $n$ would in no case be the population size; rather it is the sample size.

You say "in both cases, $P$ refers to the sample proportion."

If you have a sample from a normal distribution, what in the world is "the sample proportion"?

The usual technique yields the interval $\overline x \pm z \cdot \dfrac s {\sqrt n}$ where $\overline x$ is the sample mean and $z$ is as you described it, and $s$ is the sample standard deviation, taken to be the square root of the usual "unbiased" estimator of the population variance: $s^2= \sum_{i=1}^n (x_i-\overline x)^2/(n-1).$ The confidence interval is for the population mean, not to be confused with the sample mean.
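A minimal sketch of this interval for the mean, assuming the data come from a normal population. It uses $z$ as in the text; for small samples a Student $t$ quantile with $n-1$ degrees of freedom is the usual replacement. The function name `mean_ci` and the sample values are hypothetical.

```python
import math
import statistics


def mean_ci(xs: list[float], z: float = 1.96) -> tuple[float, float]:
    """x_bar +/- z * s / sqrt(n), with s the divisor-(n-1) sample standard deviation."""
    n = len(xs)
    x_bar = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sqrt(sum((x - x_bar)^2) / (n - 1))
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical measurements, assumed drawn from a normal population.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
print(mean_ci(sample))
```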

The formula $\widehat p\pm z\sqrt{\frac{\widehat p(1-\widehat p)} n}$ is used for sampling from a Bernoulli-distributed population, not from a normally distributed population. In a Bernoulli-distributed population, each observation is either $0$ or $1,$ which is not the case with a normally distributed population. Here I've used $\widehat p$ to denote the sample proportion; in standard usage, $p$ (as opposed to $\widehat p$) denotes the population proportion. The confidence interval is for $p.$

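An exercise in algebra shows that when every observation is either $0$ or $1,$ we have $\sum_{i=1}^n (x_i-\overline x)^2 = n\,\widehat p(1-\widehat p),$ so $\sqrt{\widehat p(1-\widehat p)}$ coincides with the standard deviation computed with divisor $n$. With the divisor $n-1$ used for $s$ above, $s^2 = \frac{n}{n-1}\,\widehat p(1-\widehat p),$ which is nearly the same for large $n.$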
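A quick numerical check of this identity on a small hypothetical 0/1 sample: $\sqrt{\widehat p(1-\widehat p)}$ matches the divisor-$n$ standard deviation (`statistics.pstdev`) exactly, while the divisor-$(n-1)$ version `statistics.stdev` is larger by the factor $\sqrt{n/(n-1)}$.

```python
import math
import statistics

data = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # hypothetical 0/1 (Bernoulli) sample
n = len(data)
p_hat = sum(data) / n

bernoulli_sd = math.sqrt(p_hat * (1 - p_hat))

# Divisor n: agrees with sqrt(p_hat * (1 - p_hat)).
print(bernoulli_sd, statistics.pstdev(data))
# Divisor n - 1: agrees after scaling by sqrt(n / (n - 1)).
print(bernoulli_sd * math.sqrt(n / (n - 1)), statistics.stdev(data))
```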