As far as I understand it, when estimating the population mean from a sample without knowing the population standard deviation $\sigma$, we cant use the $Z$-test. According to the Central Limit Theorem, the sampling distribution of sample means is a normal distribution with mean${}=\mu$ and variance${} = \sigma / \sqrt n$. But the sample standard deviation $s = \frac {\sum(x_i - \mu)^2}{n} $ underestimates the true population parameter $\sigma$ (i.e. $s < \sigma$). For that reason, we cannot apply Central Limit Theorem using the sample standard deviation directly.
But we use Bessel's correction precisely for this reason! Ww write $s = \frac {\sum(x_i - \mu)^2}{n-1} $ so that now $s ≈ \sigma$. My question is, after applying Bessel Correction, why cant we use Z test directly to estimate population mean $\mu$ ?
The $T$-distribution is a bit flatter than the $Z$-distribution. This essentially says the fact that the population variance is a bit more than the variance we estimated from the sample. But did we not already take that into consideration by applying Bessel correction?
Now another question arises in this context. From the Central Limit Theorem, the sampling distribution is essentially a Normal distribution and nothing else. Never any other distribution, and certainly NOT a $T$-distribution. Just because we may fail to estimate the variance does not mean you should change the sampling distribution of sample means from a $Z$-distribution to a $T$-distribution. If instead of the $Z$-distribution, you would have taken some other Normal distribution with variance "a bit more than 1", at least that would have made sense. A $T$-Distribution looks like a normal distribution, but is NOT a $Z$-distribution with variance increased by some amount. Just because there may be some uncertainty in determination of the $\sigma$, why are you assuming that, some entirely other distribution should better approximate the distribution of sample means?
Note: I have already looked into the following answers and they do not satisfactorily answer my question
Estimating population SD when calculating t-statistic
When the population variance is unknown, we should use t-distribution.
Attention to some details is needed here.
We have $s^2= \sum(x_i-\overline x)^2/(n-1).$
We know that
$$ \frac{\overline x - \mu}{\sigma/\sqrt n} \sim \operatorname N(0,1). \tag 1 $$
But $$ \frac{\overline x - \mu}{s/\sqrt n} \sim t_{n-1}. \tag 2 $$
The latter is used for deriving confidence intervals and hypothesis tests on $\mu$ because we cannot observe $\sigma,$ whereas we can observe $s.$
We need a pivotal random variable in which the only unobservable quantity is $\mu.$ $\text{“}$Pivotal$\text{''}$ means its probability distribution does not depend on unobservables, and that the only unobservable taken into account in finding the value of the pivotal quantity is the one for which we want a confidence interval or a hypothesis test.