Can we rely on Confidence Intervals?

Suppose the mean is in (7.6,8.4) with 95% confidence. I understand that this means 95% of the confidence intervals computed from different samples will contain the population mean. But what is the significance of this particular interval on its own? Since I am sure that 95% of such intervals will contain the mean, can I be somewhat certain that this interval is one of them? If not, how is this interval useful to me at all?

In other words, how sure can I be that the mean is in (7.6,8.4), and if I can't be sure, then what's the use of this?

There are 4 best solutions below

5
On

Consider it a postulate of statistics that sufficiently unlikely events do not happen. Obviously this is not literally true, but it is a good enough approximation to reality to be useful for practical purposes.

If I flip a coin 10 times, I am essentially certain that heads will not come up 10 times in a row. In fact, I will do this experiment right now. If I get 10 heads in a row, I promise to delete my SE account and throw my laptop in a lake.

Here are my results: TTHHTHHTTH

Whew!
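For a sense of how unlikely that outcome was, here is a minimal Python sketch, assuming a fair coin and independent flips. The function name `prob_all_heads` is just an illustrative choice.

```python
from fractions import Fraction

def prob_all_heads(n: int) -> Fraction:
    """Probability that a fair coin shows heads on each of n independent flips."""
    return Fraction(1, 2) ** n

p = prob_all_heads(10)
print(p, float(p))  # 1/1024 0.0009765625
```

So the laptop was at risk roughly once in a thousand such experiments.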

0
On

A single $95 \%$ confidence interval such as $(7.6, 8.4)$ is usually not sufficient to derive the desired information about the statistical parameter of interest. Nevertheless, it provides more information than simply stating that the mean is approximately $8.0$. One aspect is the precision of the CI.

Precision of CI: A confidence interval accounts for random statistical fluctuations due to sampling variation. The precision of the interval $(7.6, 8.4)$ is given by its length $0.8$, and we can consider whether this is adequate for our needs.

If it is not, this might indicate that we should use a larger sample to obtain a narrower CI. Alternatively, we could lower the confidence level in order to reduce the length of the interval.

So, this specific CI can help to fine-tune the process and improve our model of the population under analysis.
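The effect of sample size and confidence level on the length of the interval can be sketched in Python. This is only an illustration under assumptions that are not in the question: a z-interval with known standard deviation, a hypothetical value `sigma = 2.0`, and a helper name `half_width` chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

def half_width(sigma: float, n: int, level: float = 0.95) -> float:
    """Half-width of a z-interval for the mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    return z * sigma / sqrt(n)

sigma = 2.0  # hypothetical population standard deviation
for n in (100, 400):
    for level in (0.95, 0.90):
        print(f"n={n}, level={level}: half-width = {half_width(sigma, n, level):.3f}")
```

Under these assumptions, quadrupling the sample size halves the half-width, and dropping from $95\%$ to $90\%$ confidence shrinks it as well.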

Notes:

  • In real life, a single CI is rarely enough to draw firm conclusions. A real population is influenced by many different factors, and a statistical model is typically a rough and simplified version of the real situation.

    The selection mechanism of the sample(s) is also crucial for the validity of the confidence interval.

    To overcome these difficulties we need to learn from real life by repeating the tests (if possible), taking new samples, calculating further CIs and analysing the situation each time. We can use them to improve our knowledge of the unknown parameter of interest and to increase our confidence in the region derived for the true mean.

  • I'd like to mention in this context Statistical Intervals - A Guide for Practitioners by G.J. Hahn and W.Q. Meeker, which provides helpful information and examples to calculate confidence intervals, prediction intervals and tolerance intervals following different distributions.

    Regarding precision of CIs the authors state:

    ... We wish to reiterate that the issue of data quantity is often secondary to that of the quality of the data. In particular, in making a statistical estimate or constructing a statistical interval, one assumes that the available data were obtained by using a random sample from a defined population or process of interest. As stated previously, when this is not the case, all bets are off. Just increasing the sample size - without broadening the scope of the investigation - does not compensate for lack of randomness; all it does is allow one to obtain a (possibly) biased estimate with greater precision. Putting it another way, increasing the sample size per se usually improves the precision of an estimate, but not necessarily its accuracy.
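To illustrate the quoted point about precision versus accuracy, here is a minimal Python sketch. Everything in it is a made-up assumption for illustration: the simulated population (normal, mean $8.0$, standard deviation $2.0$), the flawed selection mechanism that can only reach units above $7.0$, and the helper name `biased_interval`.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

rng = random.Random(0)

# Hypothetical population: 100,000 values with true mean close to 8.0.
population = [rng.gauss(8.0, 2.0) for _ in range(100_000)]
true_mean = mean(population)

# Assumed flawed selection mechanism: only units above 7.0 can be sampled.
reachable = [x for x in population if x > 7.0]

def biased_interval(n, level=0.95):
    """Approximate CI for the mean, computed from a non-random (biased) sample."""
    sample = rng.sample(reachable, n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(sample)
    half = z * stdev(sample) / sqrt(n)
    return m - half, m + half

for n in (50, 5000):
    lo, hi = biased_interval(n)
    print(f"n={n}: ({lo:.2f}, {hi:.2f}), true mean = {true_mean:.2f}")
```

In this setup the interval for the larger sample is much narrower, yet it sits well above the true mean: more precision, but no more accuracy.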

3
On

In other words, how sure can I be that the mean is in (7.6,8.4), and if I can't be sure, then what's the use of this?

You can be $95\%$ sure that the mean is in your confidence interval, in a probabilistic sense: as you said, if you repeated the sampling $100$ times, you would expect about $95$ of the resulting confidence intervals to contain the real mean. Looking at a single sample, some will say that the mean either is or is not in the interval, so you cannot talk about probability. But if you had to place a bet on the mean being in your interval, then... you'd have a $95\%$ chance of winning.
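The repeated-sampling claim can be checked with a small simulation. This is a sketch under assumptions that are not part of the question: normally distributed data, a known standard deviation (so a z-interval applies), and hypothetical values for the true mean, `sigma` and `n`. The function name `coverage` is illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(true_mean=8.0, sigma=2.0, n=100, level=0.95, trials=5000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / sqrt(n)  # same half-width for every sample (sigma known)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half <= true_mean <= sample_mean + half:
            hits += 1
    return hits / trials

print(coverage())
```

The printed fraction should come out close to $0.95$, which is the long-run frequency that the bet above relies on.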

0
On

I'd like to question your premise. What does "somewhat certain" mean to you? $95\%$ confidence that your interval contains the mean? What about $99\%?$ $99.99999\%?$

To answer your question, we can find an interval containing the mean to any desired degree of confidence. However, there's a cost you need to pay for the increased confidence. You can either:

$\textbf{1.}$ Increase the sample size $n$, or

$\textbf{2.}$ Increase the radius of your confidence interval.

In particular, we achieve precisely $100\%$ confidence when either:

$\textbf{1)}$ $n$ equals the population size, or

$\textbf{2)}$ Your confidence interval is the interval $(-\infty, \infty).$

You can probably see, though, why these scenarios are not ideal in practice. They defeat the purpose of using statistics. The beauty of statistics is that it can tell us useful information about what we don't know, rather than what we already do.

If you'd like to use statistics to determine the mean with perfect certainty, then you are in fact using the wrong tool, because statistics is the study of uncertainty.
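As a rough illustration of the two costs listed above, the following Python sketch computes the confidence level attained by an interval of a given radius. It assumes a z-interval with known standard deviation; the value `sigma = 2.0` and the function name `confidence_level` are hypothetical choices for this example.

```python
from math import sqrt
from statistics import NormalDist

def confidence_level(radius: float, sigma: float, n: int) -> float:
    """Confidence level of the interval (sample mean +/- radius), sigma known."""
    return 2 * NormalDist().cdf(radius * sqrt(n) / sigma) - 1

sigma = 2.0  # hypothetical population standard deviation
print(confidence_level(0.4, sigma, 100))  # baseline radius and sample size
print(confidence_level(0.4, sigma, 400))  # larger sample, same radius
print(confidence_level(0.8, sigma, 100))  # larger radius, same sample
```

Either change pushes the confidence level toward $1$, but under this model it only reaches $1$ in the limit.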