The thing I am talking about is the interval estimate of the population mean. Let $X\sim N(\mu,\sigma^2)$ where $\mu$ is unknown but $\sigma$ is known. To estimate $\mu$, we take $n$ observations and compute the sample mean $\overline X$. In my statistics book, there is something like this:
Let $Z=\frac{\overline X-\mu}{\sigma/\sqrt n}$ so $$ P(-r<Z<r)=P(-r\sigma/\sqrt n<\overline X-\mu<r\sigma/\sqrt n)\\ =P(\overline X-r\sigma/\sqrt n<\mu<\overline X+r\sigma/\sqrt n)\\ =\alpha $$ where $\alpha$ is the given confidence level. Of course, we should choose $r=\Phi^{-1}(\frac{1+\alpha}{2})$.
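For a concrete check of that last step, the critical value for $\alpha = 0.95$ can be computed numerically; here is a quick sketch using Python's standard library (my choice of tool, not from the book):

```python
from statistics import NormalDist

alpha = 0.95  # given confidence level
# r = Phi^{-1}((1 + alpha) / 2): the two-sided standard-normal critical value
r = NormalDist().inv_cdf((1 + alpha) / 2)
print(round(r, 3))  # 1.96
```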
The next step is what makes me worried.
Therefore, if in one experiment we get $\overline X= \bar x$, then $$P(\bar x-r\sigma/\sqrt n<\mu<\bar x+r\sigma/\sqrt n) =\alpha \text{ }(*)$$
To me, this is very strange: it's just like having a random variable $Y$ with $P(Y<3)=0.95$, and in an experiment, we observe that $Y=3.5$, so we replace $Y$ with the observed value, and write $P(3.5<3)=0.95$, which is ridiculous.
What $(*)$ does is assign a probability to $\mu$ lying in a certain interval; but to make probability statements about $\mu$, $\mu$ must have a distribution first. So, what is the distribution of $\mu$?
If $\mu$ is not known and $\sigma$ is known, then a 95% confidence interval for $\mu$ is of the form $\bar X \pm 1.96\sigma/\sqrt{n}.$
For a normal sample, $\bar X$ is the best estimate of $\mu,$ but it is not perfect. It is subject to variability.
First, because there is variability in the population, expressed by $\sigma.$ Second, because each sample is different.
However, large samples can overcome population variability. For example, a population standard deviation of $\sigma = 25$ might represent substantial variability.
Specifically, if you have $n = 100$ observations in your sample, it is possible to show that the standard deviation of $\bar X$ is decreased to $25/\sqrt{100} = 25/10 = 2.5,$ which might not be so bad.
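That $1/\sqrt{n}$ shrinkage is easy to verify by simulation; the following sketch (using numpy; the seed and number of repetitions are my own choices) draws many samples of size $n = 100$ and checks the empirical standard deviation of $\bar X$:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n, reps = 25, 100, 50_000

# Draw `reps` samples of size n and record each sample mean.
means = rng.normal(loc=200, scale=sigma, size=(reps, n)).mean(axis=1)

# The empirical SD of the sample means should be close to
# sigma / sqrt(n) = 25 / 10 = 2.5.
print(round(means.std(ddof=1), 2))
```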
The confidence interval extends about two standard deviations (of $\bar X$) on either side of $\bar X.$ Thus, the confidence interval has a 'margin of error' to account for this remaining variability. (Here 'about two' turns out to be $1.96$.)
There can be no guarantee that any one 95% confidence interval (CI) truly includes the value of the population mean $\mu.$ But if you repeatedly use the formula in my first paragraph, then over the long run 95% of your CIs will include $\mu.$
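That long-run 95% coverage claim can itself be checked by simulation; here is a sketch (with numpy; the parameters and repetition count are illustrative, my own choices) that builds many CIs with the formula above and counts how often they contain $\mu$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 200, 25, 100, 50_000
half_width = 1.96 * sigma / np.sqrt(n)  # 1.96 * 2.5 = 4.9

# Sample mean for each of `reps` simulated experiments.
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Fraction of intervals  xbar +/- half_width  that contain mu.
covered = (xbar - half_width < mu) & (mu < xbar + half_width)
print(covered.mean())  # close to 0.95
```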
Example: Suppose that, unknown to you, $\mu = 200$ and that, known to you, $\sigma = 25.$ Let's make 20 CIs by taking $n = 100$ observations from this population---one sample for each CI. (That's 2000 observations altogether, but I'm sampling by computer so it's easy to do.)
In the figure below, each of the 20 vertical bars represents a CI made with the formula above. Dots are sample means. The horizontal line at $\mu=200$ makes it easy to see which CIs contain $\mu.$ In particular, we can see that Sample 17 yielded a CI that does not include $\mu.$
Notes: (1) This is a simulation so (in the background) we know $\mu=200,$ but in real life experiments, $\mu$ wouldn't be exactly known.
(2) The figure was made with R code simulating the 20 samples as described above.
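The R code itself is not shown here; as a substitute, the following Python sketch (numpy for sampling; the seed and variable names are my own choices, not the author's) runs the same simulation and reports which of the 20 CIs miss $\mu$:

```python
import numpy as np

rng = np.random.default_rng(17)          # seed is my own choice, not the author's
mu, sigma, n = 200, 25, 100              # parameters from the example above
half_width = 1.96 * sigma / np.sqrt(n)   # CI half-width: 1.96 * 2.5 = 4.9

cis = []
for i in range(1, 21):
    xbar = rng.normal(mu, sigma, n).mean()          # one simulated sample mean
    lo, hi = xbar - half_width, xbar + half_width
    cis.append((lo, hi))
    print(f"Sample {i:2d}: ({lo:6.1f}, {hi:6.1f})"
          + ("" if lo < mu < hi else "  <-- misses mu"))
```

Plotting the 20 intervals as vertical bars (e.g. with matplotlib) reproduces the figure described above. On average one interval in twenty will miss $\mu$, though any single run may show none or several.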