My notes on confidence give this question:
An investigator is interested in the amount of time internet users spend watching TV per week. He assumes $\sigma = 3.5$ hours, samples $n=50$ users, and uses the sample mean to estimate the population mean $\mu$.
Since $n=50$ is large, we know that $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is approximately standard normal. So, with probability $1 - \alpha = 0.99$, the maximum error of estimate is $E = z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \approx 1.27$ hours.
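As a sanity check, the figures in the question can be reproduced with a short computation using Python's standard-library `statistics.NormalDist` (the values of $\sigma$, $n$, and the confidence level are taken from the question):

```python
from math import sqrt
from statistics import NormalDist

sigma = 3.5   # assumed population standard deviation (hours)
n = 50        # sample size
conf = 0.99   # confidence level, so alpha = 0.01

# two-sided critical value z_{alpha/2} = z_{0.005}
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)

E = z * sigma / sqrt(n)  # maximum error of estimate
print(round(z, 3), round(E, 2))  # -> 2.576 1.27
```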
The investigator collects the data and obtains $\bar{X}=11.5$ hours. Can he still assert with 99% probability that the error is at most 1.27 hours?
The given answer is:
No, he cannot, because the probability describes the method (the estimator), not the particular result. Instead we say that "we conclude with 99% confidence that the error does not exceed 1.27 hours."
I am confused. What is this difference between probability and confidence? Is it related to confidence intervals? Is there an intuitive explanation for the difference?
Suppose you have a random sample $X_1, X_2, \dots, X_n$ from $Norm(\mu, \sigma)$ with $\sigma$ known and $\mu$ to be estimated. Then $\bar X \sim Norm(\mu, \sigma/\sqrt{n})$ and we have $$P\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$$ After some elementary manipulation, this becomes $$P(\bar X - 1.96\sigma/\sqrt{n} \le \mu \le \bar X + 1.96\sigma/\sqrt{n}) = 0.95.$$ According to the frequentist interpretation of probability, the two displayed equations mean the same thing: over the long run, the event inside the parentheses will be true 95% of the time. This interpretation holds as long as $\bar X$ is viewed as a random variable based on a random sample of size $n$ from the normal population specified at the start. Notice that the second equation must be read as a statement about the random interval $\bar X \pm 1.96\sigma/\sqrt{n}$: with probability 0.95, that interval happens to include the fixed unknown mean $\mu.$
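This "long run" reading is easy to demonstrate by simulation. The sketch below uses illustrative values (the choice of $\mu$, $\sigma$, and $n$ is arbitrary and not from the question): repeatedly draw samples, form the interval $\bar X \pm 1.96\sigma/\sqrt{n}$, and count how often it covers $\mu$.

```python
import random
from math import sqrt

random.seed(1)
mu, sigma, n = 10.0, 3.5, 50   # illustrative values; mu is "unknown" to the interval
half = 1.96 * sigma / sqrt(n)  # half-width of the 95% interval

covered = 0
trials = 10_000
for _ in range(trials):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if xbar - half <= mu <= xbar + half:
        covered += 1

print(covered / trials)  # close to 0.95 over the long run
```

Each individual interval either covers $\mu$ or it does not; only the proportion over many repetitions is 0.95, which is exactly the point of the answer.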
However, when we have a particular sample and the numerical value of an observed mean $\bar X,$ the frequentist "long run" approach to probability is in potential conflict with a naive interpretation of the interval. In this particular case $\bar X$ is a fixed observed number and $\mu$ is a fixed unknown number. Either $\mu$ lies in the interval or it doesn't; there is no "probability" about it. The process by which the interval is derived leads to coverage in 95% of cases over the long run. As shorthand for this state of affairs, it is customary to use the word confidence instead of probability.
There is really no difference between the two words. It is just that the proper frequentist use of the word probability becomes awkward, and people have decided to use confidence instead.
In a Bayesian approach to estimation, one establishes a probability framework for the experiment at hand from the start by choosing a "prior distribution." Then a Bayesian probability interval (sometimes called a credible interval) is based on a melding of the prior distribution and the data. A difficulty Bayesian statisticians may have in helping nonstatisticians understand their interval estimates is to explain the origin and influence of the prior distribution.
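To make the contrast concrete, here is a sketch of the conjugate normal-normal case; the data summary ($\sigma$, $n$, $\bar X$) is from the question, but the $Norm(m_0, s_0)$ prior on $\mu$ is invented for illustration. With known $\sigma$, the posterior for $\mu$ is again normal, so a 95% credible interval is a genuine probability statement about $\mu$ given that prior:

```python
from math import sqrt
from statistics import NormalDist

# data summary from the question; prior is a hypothetical illustration
sigma, n, xbar = 3.5, 50, 11.5
m0, s0 = 10.0, 2.0            # hypothetical Norm(m0, s0) prior on mu

# standard normal-normal conjugate update (precisions add)
prec = 1 / s0**2 + n / sigma**2           # posterior precision
post_mean = (m0 / s0**2 + n * xbar / sigma**2) / prec
post_sd = sqrt(1 / prec)

z = NormalDist().inv_cdf(0.975)           # 95% two-sided
lo, hi = post_mean - z * post_sd, post_mean + z * post_sd
print(f"95% credible interval: ({lo:.2f}, {hi:.2f})")
```

Notice how the prior pulls the posterior mean slightly away from $\bar X = 11.5$ toward $m_0$; explaining that influence to a nonstatistician is exactly the difficulty mentioned above.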