Conclusion for confidence interval

Suppose I have, say, a 95 % confidence interval for the mean and a 95 % confidence interval for the variance.

Would it then be wrong to conclude:

The 95 % confidence interval for the mean contains with at least 95 % probability the true mean?

and

The 95 % confidence interval for the variance contains with at least 95 % probability the true variance?

What would be a more correct/precise way to express what the confidence intervals stand for? I feel like there are a lot of different conclusions when searching for it.

Best answer

The 'meaning' of interval estimates is a controversial topic in applied statistics. So there is no universally accepted answer to your important question.

Let's just use a proposed sample of size $n = 31$ from a normal population with unknown population mean $\mu$ and unknown variance $\sigma^2.$

Then $T = \frac{\bar X - \mu}{S/\sqrt{n}} \sim \mathsf{T}(n-1),$ so that $P(-2.042 \le T = \frac{\bar X - \mu}{S/\sqrt{n}} \le 2.042 )=0.95.$ Here $\bar X$ and $S$ are the sample mean and sample standard deviation, respectively.

Manipulating inequalities in the event, we get $P(\bar X - 2.042\frac{S}{\sqrt{n}} \le \mu \le \bar X + 2.042\frac{S}{\sqrt{n}}) = 0.95.$ This is purely a probability statement. Specifically, it is a probability statement about the behavior of the random variables $\bar X$ and $S$: the random interval $(\bar X - 2.042\frac{S}{\sqrt{n}}, \bar X + 2.042\frac{S}{\sqrt{n}})$ has a 95% probability of covering (including) the unknown constant $\mu.$

Now suppose we take the sample and obtain $\bar X = 21.3$ and $S^2 = 1.44,$ so that $S = 1.2$ and $S/\sqrt{n} = 1.2/\sqrt{31} \approx 0.2155.$ Then the random interval becomes $\left(21.3 - 2.042(0.2155),\, 21.3 + 2.042(0.2155)\right)$ or $(20.860, 21.740).$
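As a quick check on the arithmetic, here is a minimal Python sketch that recomputes the interval from the summary statistics quoted above. The variable names are my own, and the critical value 2.042 is simply taken from the text rather than computed from the t distribution.

```python
import math

# Summary statistics and critical value as given in the answer.
n = 31          # sample size
xbar = 21.3     # observed sample mean
s2 = 1.44       # observed sample variance
t_crit = 2.042  # 0.975 quantile of T(30), taken from the text

s = math.sqrt(s2)          # sample standard deviation
se = s / math.sqrt(n)      # estimated standard error of the mean
margin = t_crit * se       # half-width of the interval

lower, upper = xbar - margin, xbar + margin
print(f"standard error = {se:.4f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")  # (20.860, 21.740)
```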

But now we are dealing with observed quantities. According to the usual frequentist interpretation of probability, this is no longer a probability statement: Either the interval $(20.860, 21.740)$ includes $\mu$ or it does not. Accordingly, the interval $(20.860, 21.740)$ is called a 95% confidence interval.

The confidence interval is a statement about the data. Over the long run, we will obtain data so that the inequality manipulation described above will produce an interval that includes the true population $\mu$ in 95% of such experiments.
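The long-run claim can be illustrated by simulation. The sketch below is only an illustration under assumptions of my own: it fixes a "true" $\mu = 21.3$ and $\sigma = 1.2$ (in practice these are unknown), repeats the experiment many times with $n = 31$, and counts how often the interval covers $\mu$. The function name `coverage` and the seed are hypothetical choices.

```python
import math
import random

def coverage(mu=21.3, sigma=1.2, n=31, t_crit=2.042, reps=10_000, seed=1):
    """Fraction of simulated experiments whose 95% interval covers mu.

    mu and sigma are assumed 'true' values chosen only for the simulation.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample variance with the usual n - 1 denominator.
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
        margin = t_crit * math.sqrt(s2 / n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

rate = coverage()
print(f"fraction of intervals covering mu: {rate:.3f}")
```

The printed fraction should be close to 0.95. Any single interval either covers $\mu$ or does not; the 95% describes the procedure across repetitions.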

The reason for calling the interval estimate a 'confidence' interval instead of a 'probability' interval has to do with a strict interpretation by frequentist statisticians of the word 'probability'.

Bayesian statisticians treat $\mu$ as a random variable, begin with a 'prior' distribution on $\mu$, combine the data with the prior distribution to get a 'posterior' distribution, and use the posterior distribution to get a probability interval for $\mu$ (some say a credible interval). If the prior distribution is "flat" (containing little information), then the Bayesian and frequentist interval estimates will be numerically very similar. But philosophies as to the "meaning" of the interval estimate differ.
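To see how close the two interval estimates can be, here is a rough Monte Carlo sketch. It assumes one particular "flat" choice, the standard noninformative prior $p(\mu, \sigma^2) \propto 1/\sigma^2$, which is my assumption and not something specified above. Under that prior, $\sigma^2$ given the data is distributed as $(n-1)S^2/\chi^2_{n-1}$, and $\mu$ given $\sigma^2$ and the data is normal with mean $\bar X$ and variance $\sigma^2/n$.

```python
import math
import random

# Summary statistics from the example in the answer.
n, xbar, s2 = 31, 21.3, 1.44
draws = 100_000
rng = random.Random(2)  # arbitrary seed, for reproducibility only

mu_draws = []
for _ in range(draws):
    # chi-square(k) is Gamma(shape=k/2, scale=2); gammavariate takes (shape, scale).
    chi2 = rng.gammavariate((n - 1) / 2, 2.0)
    sigma2 = (n - 1) * s2 / chi2                 # draw of sigma^2 given the data
    mu_draws.append(rng.gauss(xbar, math.sqrt(sigma2 / n)))  # draw of mu

# Equal-tailed 95% posterior interval from the sorted draws.
mu_draws.sort()
lo = mu_draws[int(0.025 * draws)]
hi = mu_draws[int(0.975 * draws)]
print(f"95% posterior interval for mu ~ ({lo:.3f}, {hi:.3f})")
```

Up to simulation noise, the result should land near $(20.860, 21.740)$, because under this prior the marginal posterior of $\mu$ is a shifted and scaled $\mathsf{T}(n-1)$ distribution. A different prior would generally give a different interval.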

Both frequentists and Bayesians have their critics. Strictly speaking, frequentists are not saying anything about the experiment at hand--only about what 'works' over the long run. A Bayesian is addressing the experiment at hand, but needs to explain how the prior distribution was obtained and what effect it has on the interval estimate.