How does the CLT justify statistical models which are not modeling our data as a sum of random variables?


Wikipedia says (emphasis mine):

The Central Limit Theorem states that "in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions."

Now the CLT says something about the sum of independent random variables:

$$ X = X_1 + X_2 + \dots + X_n $$

where the $X_i$ are independent random variables (say, the $i$th draw from some distribution). But the statistical models I am familiar with do not model a sum of random variables; they model the random variable directly.
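To make sure I am reading the theorem correctly, here is a quick simulation of what I understand it to claim (the sample sizes and the uniform distribution are arbitrary choices on my part):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # number of summands per sum

# Draw 10,000 sums, each of n i.i.d. Uniform(0, 1) variables,
# which are decidedly non-Gaussian on their own.
sums = rng.uniform(0, 1, size=(10000, n)).sum(axis=1)

# "Properly normalized": subtract the sum's mean (n/2) and divide
# by its standard deviation (sqrt(n/12) for Uniform(0, 1)).
z = (sums - n * 0.5) / np.sqrt(n / 12.0)

# The normalized sums should look approximately standard normal.
print(z.mean(), z.std())  # both should be near 0 and 1
```

So the theorem, as I understand it, is a statement about the normalized sum $X$, not about any individual $X_i$.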

For example, factor analysis models a random variable $\textbf{x}$ as

$$ \textbf{x} \sim \mathcal{N}(\textbf{0}, \Lambda \Lambda^{\top} + \Psi) $$

(It does not matter here what $\Lambda$ and $\Psi$ are; only that $\textbf{x}$ is modeled as Gaussian.) This modeling assumption is justified by appeal to the CLT (see A Unifying Review of Linear Gaussian Models, footnote on page 2). But aren't we modeling $\textbf{x}$ itself, not a sum of random variables?
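For concreteness, here is how I understand the generative process behind that covariance (the dimensions, loadings, and noise scales below are arbitrary values I picked for illustration): sampling $\textbf{x} = \Lambda \textbf{z} + \boldsymbol{\epsilon}$ with $\textbf{z} \sim \mathcal{N}(\textbf{0}, I)$ and $\boldsymbol{\epsilon} \sim \mathcal{N}(\textbf{0}, \Psi)$ should reproduce the covariance $\Lambda \Lambda^{\top} + \Psi$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 5, 2  # observed dimension, number of latent factors (illustrative)

Lambda = rng.normal(size=(p, k))               # factor loadings
Psi = np.diag(rng.uniform(0.5, 1.0, size=p))   # diagonal noise covariance

# x = Lambda z + eps, with z ~ N(0, I_k) and eps ~ N(0, Psi).
z = rng.normal(size=(100000, k))
eps = rng.multivariate_normal(np.zeros(p), Psi, size=100000)
x = z @ Lambda.T + eps

# The empirical covariance of x should approach Lambda Lambda^T + Psi.
print(np.abs(np.cov(x.T) - (Lambda @ Lambda.T + Psi)).max())
```

Even in this form, though, the Gaussianity of $\textbf{x}$ comes from assuming $\textbf{z}$ and $\boldsymbol{\epsilon}$ are Gaussian, which is the assumption I am asking about.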

In summary: How does the CLT justify statistical models which are not modeling our data as a sum of random variables?