I understand the assumptions underpinning and the motivation behind error distributions which led to the development of Laplace's and eventually Gauss's error distributions as explained in Saul Stahl's essay. It makes sense that larger random errors are less likely than smaller ones, and that the magnitudes of errors are the same regardless of direction, i.e. $\phi(x) = \phi(-x)$.
What I'm having a hard time reconciling is the widespread use of the normal distribution to describe generic data, not just errors. Why can we model human heights or test grades with a normal distribution?
I understand that the Central Limit Theorem allows us to use the normal distribution to approximate certain sampling distributions, e.g. that of $\overline{x}$ (usually when $n \geq 30$) - but why? What does an error curve have to do with the CLT? Do the random factors that affect natural phenomena tend to cancel each other out, leading to (approximately) normal distributions in large samples?
Are you clear on what the Central Limit Theorem actually says? There don't have to be any "inherent" normally distributed errors. The Central Limit Theorem says that given any probability distribution with a finite mean $\mu$ and standard deviation $\sigma$, the sum of $n$ independent samples is approximately normally distributed with mean $n\mu$ and standard deviation $\sqrt{n}\,\sigma$, while the average of $n$ samples is approximately normally distributed with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.
The larger $n$ is, the better the approximation. Generally $n \geq 30$ is considered enough to make the approximation "good".
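You can see this numerically without any theory. Here is a minimal sketch (assuming NumPy) that draws averages of $n = 30$ samples from an exponential distribution, which is strongly skewed and looks nothing like a bell curve, and checks that the averages behave as the CLT predicts: mean close to $\mu$, standard deviation close to $\sigma/\sqrt{n}$, and roughly 68% of averages within one such standard deviation of $\mu$, as a normal distribution would give:

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential distribution with scale 1: skewed, clearly non-normal,
# but it has finite mean mu = 1 and standard deviation sigma = 1.
mu, sigma = 1.0, 1.0
n = 30            # samples per average
trials = 100_000  # number of averages to simulate

# Each row is one sample of size n; take the average of each row.
means = rng.exponential(scale=mu, size=(trials, n)).mean(axis=1)

print(means.mean())  # close to mu = 1
print(means.std())   # close to sigma / sqrt(n) ~= 0.183

# If the averages are roughly normal, about 68% should fall within
# one CLT standard deviation of mu.
within = np.abs(means - mu) < sigma / np.sqrt(n)
print(within.mean())
```

Swapping in any other distribution with finite mean and variance (uniform, Poisson, a loaded die) gives the same qualitative picture, which is exactly the point: the normality comes from the averaging, not from the underlying distribution.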