I am an aspiring probabilist, and I definitely know the central limit theorem. However, I am trying to understand what idea it really embodies.
I am aware that the normal distribution arises as the limit even when the averages are taken over non-independent random variables. But in some sense, it seems the correlations among the $X_i$'s must be weak. Is this intuition correct? In other words, when can we expect a central limit theorem to hold?
Is there some deeper notion embodied in this limit law? It is so universal that it seems quite mysterious.
Here is an answer to "what's so special about the normal distribution?"
The limiting distribution is normal because the family of normal distributions is closed under addition (and averaging): if $X$ and $Y$ are normal, so are $X+Y$ and $\frac{X+Y}{2}$, with the appropriately modified means and variances.
So if we started with a bunch of i.i.d. normal random variables, their average would stay normal. Since the limiting distribution must be a fixed point of this averaging operation, this tells us that if any sort of central limit theorem holds, it ought to give the normal distribution in the limit.
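As a quick sanity check of this closure property, here is a small simulation sketch (parameter choices are mine, for illustration): if $X, Y \sim N(0,1)$ independently, then $\frac{X+Y}{2} \sim N(0, \frac12)$, so a large sample of such averages should have mean near $0$ and variance near $\frac12$.

```python
import random
import statistics

# Sketch: empirically check that averaging two standard normals
# yields another normal, with mean 0 and variance 1/2.
random.seed(0)
n = 100_000
averages = [(random.gauss(0, 1) + random.gauss(0, 1)) / 2 for _ in range(n)]

print(f"sample mean     ≈ {statistics.fmean(averages):.4f}")   # near 0
print(f"sample variance ≈ {statistics.variance(averages):.4f}")  # near 0.5
```

Of course this only checks the first two moments; the full claim (that the average is again exactly normal) follows from the stability of the normal family under convolution.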
In general, whenever a limit theorem holds, it gives a distribution in the limit which is closed under the operation we care about.
There are other limit theorems for other cases. For example, the Poisson distribution is a distribution on $\mathbb N$ with a similar property: if $X$ and $Y$ are Poisson, so is $X+Y$. We have Le Cam's theorem and Janson's inequality as two instances of the claim that if we sum many approximately independent Bernoulli random variables, then in the regime where the mean of the sum stays bounded, the sum has an approximately Poisson distribution.
(This doesn't contradict the central limit theorem, because the Poisson distribution with mean $\lambda$ is well-approximated by the normal distribution with mean and variance $\lambda$ when $\lambda$ is large.)
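To see Le Cam's theorem concretely, we can compute (exactly, no simulation needed) the total variation distance between $\mathrm{Binomial}(n, \lambda/n)$, i.e. a sum of $n$ i.i.d. Bernoulli($\lambda/n$) variables, and $\mathrm{Poisson}(\lambda)$. Le Cam's bound gives total variation at most $\sum_i p_i^2 = n p^2 = \lambda^2/n$, which vanishes as $n$ grows. The choice $\lambda = 2$ and the truncation point below are mine, for illustration.

```python
import math

def binom_pmf(k, n, p):
    # math.comb(n, k) is zero for k > n, so this also handles the tail
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # computed in log space to avoid overflow in lam**k / k!
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

lam = 2.0
for n in (10, 100, 1000):
    p = lam / n
    # total variation distance, truncated at k = 60 (remaining mass is negligible)
    tv = 0.5 * sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(61))
    print(f"n = {n:4d}: TV distance ≈ {tv:.5f}, Le Cam bound n*p^2 = {n * p * p:.5f}")
```

The printed distances shrink roughly like $1/n$, comfortably inside the Le Cam bound, while the mean of the sum stays fixed at $\lambda = 2$.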
For another example, the exponential distribution is closed under minima and scaling: if $X$ and $Y$ are exponential, so are $k \cdot X$ and $\min\{X,Y\}$. We can prove various forms of a "minimum limit theorem": if $X_1, X_2, \dots, X_n$ are sufficiently independent nonnegative random variables whose probability density functions take nonzero and reasonably similar values at $0$, then $n \cdot \min\{X_1, X_2, \dots, X_n\}$ should be approximately exponential.
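A concrete instance of this "minimum limit theorem" can be checked in closed form. Take $X_i \sim \mathrm{Uniform}(0,1)$, whose density equals $1$ at $0$; then $P(n \cdot \min_i X_i > t) = (1 - t/n)^n \to e^{-t}$, the survival function of the $\mathrm{Exponential}(1)$ distribution. The specific values of $n$ and $t$ below are my illustrative choices:

```python
import math

# Survival function of n * min(X_1, ..., X_n) for X_i ~ Uniform(0, 1):
# P(n * min > t) = P(every X_i > t/n) = (1 - t/n)^n, which tends to e^{-t}.
n = 10_000
for t in (0.5, 1.0, 2.0):
    exact = (1 - t / n) ** n   # exact survival probability for finite n
    limit = math.exp(-t)       # Exponential(1) survival function
    print(f"t = {t}: (1 - t/n)^n = {exact:.6f}, e^-t = {limit:.6f}")
```

Already at $n = 10{,}000$ the two columns agree to several decimal places, matching the claim that the rescaled minimum is approximately exponential.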