I'm trying to understand why a T distribution with a small sample size has fatter tails and what this means. My textbook says "...t distributions have more probability in the tails and less in the center. This greater spread is due to the extra variability caused by substituting the random variable s for the fixed parameter sigma." (sample SD for population SD)
Is it always the case that a small sample will have extra variability compared to the population distribution?
A way you can think about it is that you are trying to estimate the distribution of the sample mean when the variance itself is uncertain. You do not know the true mean or the true variance, but under the random-sample assumption you can estimate the true mean $\mu$ by $\overline{X}$ and the standard error of $\overline{X}$ by $\sqrt{S^2/n}$. Thus you will be inclined to believe that $\overline{X}$ comes from a normal distribution of the form $N(\overline{X},\sqrt{S^2/n})$.
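As a concrete sketch of that estimation step, here is how $\overline{X}$ and $\sqrt{S^2/n}$ would be computed from one small sample (the population parameters $\mu = 10$, $\sigma = 2$ and the sample size $n = 5$ are illustrative choices, not from the question):

```python
import random
import statistics

random.seed(0)

# Draw a small sample (n = 5) from a normal population with
# mu = 10, sigma = 2 (illustrative values only).
n = 5
sample = [random.gauss(10, 2) for _ in range(n)]

xbar = statistics.mean(sample)      # estimates the true mean mu
s2 = statistics.variance(sample)    # sample variance S^2 (divides by n - 1)
se = (s2 / n) ** 0.5                # sqrt(S^2/n): estimated standard error of xbar

print(f"sample mean    = {xbar:.3f}")
print(f"standard error = {se:.3f}")
```

With $n$ this small, both `xbar` and `se` jump around noticeably from sample to sample, which is exactly the extra variability the textbook quote refers to.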
The issue is how good an approximation this is to the true sampling distribution. If you have only a small sample, there is inherent variability in the estimate $S$, and you cannot have much confidence that $N(\overline{X},\sqrt{S^2/n})$ is the real distribution. But with a large sample, you "intuitively" expect the measurement errors to cancel each other out, so that $\overline{X}\rightarrow \mu$ and $S\rightarrow \sigma$; this can be justified in various ways (e.g. the law of large numbers). So the case you are really concerned with is the small-sample case, since the large-sample case is handled effectively by the normal approximation. And that is why the $t$-distribution deviates from the normal distribution when $n$ is small: the extra uncertainty in $S$ pushes probability out into the tails.
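You can see this directly with a small Monte Carlo experiment (a sketch, with $\mu$, $\sigma$, $n$, and the number of trials chosen only for illustration): draw many samples of size $n = 3$, form the $t$ statistic $t = (\overline{X} - \mu)/\sqrt{S^2/n}$, and count how often $|t| > 2$. If $S$ behaved like the fixed $\sigma$, this fraction would match the standard normal's roughly $4.6\%$; instead it comes out several times larger, because small-sample values of $S$ often underestimate $\sigma$.

```python
import math
import random

random.seed(1)

# mu, sigma, n, and trials are illustrative choices.
mu, sigma, n, trials = 0.0, 1.0, 3, 100_000

exceed = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance S^2
    t = (xbar - mu) / math.sqrt(s2 / n)                  # t statistic
    if abs(t) > 2:
        exceed += 1

print(f"P(|t| > 2) with n = {n}: {exceed / trials:.3f}")
# A standard normal gives P(|Z| > 2) of about 0.046; the t statistic
# with n - 1 = 2 degrees of freedom gives roughly 0.18.
```

The simulated tail probability sits far above the normal value, which is precisely the "more probability in the tails" that the $t$ distribution with $n-1$ degrees of freedom is built to capture.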