Possible misunderstandings of the confidence interval definition keep on bugging me from time to time, and I want to get this sorted.
More specifically, I refer to a misunderstanding that I believe is well explained on the mobile Wikipedia page, section "Misunderstandings" (it also appears in the regular version).
To clarify the correct meaning of the concept, the page gives the following paragraph:
A $95\%$ confidence level does not mean that for a given realized interval there is a $95\%$ probability that the population parameter lies within the interval (i.e., a 95% probability that the interval covers the population parameter).[10] According to the strict frequentist interpretation, once an interval is calculated, this interval either covers the parameter value or it does not.
So, if I repeated the experiment and calculated appropriate confidence intervals $n$ times, the fraction of intervals capturing the parameter of interest would tend to 95% as $n$ grows. But picking one realization, I cannot make a probability statement.
To me, this seems analogous to saying:
We have a machine that, when producing $n$ balls, tends to produce 95% black balls as $n$ grows. Yet, if somebody asks you, as the machine is about to produce a ball, or after a ball is produced and kept hidden from you, what colour will it be, you cannot make a probability statement.
I find this very puzzling.
I also find an additional source of confusion. On the same page I read the following statement:
The confidence interval can be expressed in terms of a single sample: "There is a $90\%$ probability that the calculated confidence interval from some future experiment encompasses the true value of the population parameter." Note this is a probability statement about the confidence interval, not the population parameter
I fail to appreciate the subtlety: if a statement is made about the confidence interval, saying it will encompass the true value of the population parameter, is it not necessarily also a statement about the parameter? I mean, if an interval contains a real number $a$, is this not equivalent to making a statement about the number, saying the number $a$ lies in the interval?
Thanks a lot. I checked other answers on the point but failed to understand.
Your confusion is not uncommon and indeed is one of the many difficulties that scientists have with statistics. One way to think of it is the process of flipping a coin. Say you have a fair coin; then you know the probability of heads on the first toss is $0.50$, before you've made any such toss. Now toss the coin, and say it came up heads. What is the probability of the coin coming up heads on the first toss? Well, the first toss already happened, so heads either appeared or it did not. You can no longer make a probability statement about the first toss, because we have a realization for that trial. Because you know the coin is fair, you can say things like "If I repeated the experiment many times, the proportion of heads I would see approaches $0.50$." We are making a probability statement about the act of flipping coins (constructing confidence intervals) in the long run. Your example of the ball-producing machine is essentially right, and yes, it is confusing.
Under the frequentist interpretation, populations have true, fixed parameters, and confidence intervals are random: across repeated samples, the procedure produces ranges of values that cover the true parameter $a$ 95% of the time. The event of interest is "contains $a$", and if the assumptions of the process are correct, then this will happen 95% of the time in the long run. Nothing more, nothing less. $a$ is not a random variable, so it doesn't make sense to put any probability statement on $a$; probability statements can only be attached to the confidence intervals, which are what vary from sample to sample.
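To make the long-run claim concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: normally distributed data with a known standard deviation, a textbook z-interval for the mean, and the hypothetical function name `simulate_coverage` with made-up parameter values. It only shows that the fraction of intervals covering the fixed true value settles near 95%, while each individual interval either covers it or does not.

```python
import random
from statistics import NormalDist


def simulate_coverage(true_mean=10.0, sigma=2.0, n=30,
                      trials=10_000, level=0.95, seed=0):
    """Fraction of z-intervals (known sigma) that cover the fixed true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5

    hits = 0
    for _ in range(trials):
        # One "experiment": draw a sample and build its interval.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # true_mean never changes; only the interval endpoints are random.
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"empirical coverage: {simulate_coverage():.3f}")
```

The `if` test inside the loop is the "contains $a$" event: for any single realized interval it is simply true or false, and the 95% only shows up as a frequency over many repetitions.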
It may help to read about credible intervals as well, to see the differences in interpretation. What most people want is a credible interval, so they can make easily digestible statements like "with 0.95 probability, the credible interval contains the parameter of interest".
https://en.m.wikipedia.org/wiki/Credible_interval
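For contrast, here is a rough sketch of a credible interval in the simplest setting I can think of. The assumptions are mine, not part of the question: normal data with known standard deviation `sigma`, a normal prior on the mean, and the hypothetical function name `normal_credible_interval`. Under those assumptions the posterior for the mean is again normal, so the interval is read directly off the posterior quantiles, and the statement "the parameter lies in this interval with probability 0.95" is a statement about the parameter, conditional on the prior and the data.

```python
from statistics import NormalDist


def normal_credible_interval(xbar, n, sigma, prior_mean, prior_sd, level=0.95):
    """Equal-tailed credible interval for a normal mean with known sigma.

    Assumes a Normal(prior_mean, prior_sd**2) prior (conjugate case).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_precision = prior_precision + data_precision

    # Posterior mean is a precision-weighted average of prior mean and xbar.
    post_mean = (prior_precision * prior_mean + data_precision * xbar) / post_precision
    post_sd = post_precision ** -0.5

    posterior = NormalDist(post_mean, post_sd)
    tail = (1.0 - level) / 2
    return posterior.inv_cdf(tail), posterior.inv_cdf(1.0 - tail)


if __name__ == "__main__":
    # Illustrative numbers only: sample mean 10.4 from 30 observations.
    low, high = normal_credible_interval(xbar=10.4, n=30, sigma=2.0,
                                         prior_mean=0.0, prior_sd=100.0)
    print(f"95% credible interval: ({low:.3f}, {high:.3f})")
```

With a very diffuse prior like the one above, the endpoints come out almost identical to the z-interval for the same data; what differs is the interpretation, not necessarily the numbers.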