When looking into Gaussian Mixture Models (GMMs), I have repeatedly encountered the statement that "GMMs are a universal approximator of densities" (e.g., [0]).
I'm not sure whether I understand this correctly, and if so, I would need a citeable source for it. The way I understand it is: given any probability density, there exists a GMM (with possibly many components) such that the GMM's density approximates the given density to within arbitrary error.
My questions:
- Did I understand the above correctly?
- How is the "arbitrary error" specified?
- What source can I cite if I want to use this fact? I could use [0], but the authors also just state this claim without proving it or providing a citation. Is this folklore?
[0] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press, p. 65.
It's a vague/fancy way to say the following simple result.
But first, some definitions and facts:
Fact 1: Convex combinations of point masses (i.e. empirical measures) are dense in $\mathcal{P}(\mathbb{R}^d)$ for the weak$^{\star}$ topology. Note: this is basically why we can do empirical estimation of probability measures.
Fact 2: Gaussian measures $\mathcal{N}(x, \sigma^2 I_d)$ converge weakly to the point mass $\delta_x$ as $\sigma \downarrow 0$. Note: here, just look at the characteristic functions/Fourier transforms.
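The characteristic-function computation behind this is a standard one-liner:

```latex
\[
\widehat{\mathcal{N}(x,\sigma^2 I_d)}(t)
  = \exp\!\Big( i\langle t, x\rangle - \tfrac{\sigma^2}{2}\lVert t\rVert^2 \Big)
  \;\xrightarrow[\sigma \downarrow 0]{}\;
  \exp\!\big( i\langle t, x\rangle \big)
  = \widehat{\delta_x}(t)
  \qquad \text{for every } t \in \mathbb{R}^d .
\]
```

Since the pointwise limit is continuous at $0$, Lévy's continuity theorem upgrades this to weak convergence $\mathcal{N}(x,\sigma^2 I_d) \Rightarrow \delta_x$.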
Conclusion:
"Gaussian mixtures" (a.k.a. convex combinations of Gaussian measures) are dense in $\mathcal{P}(\mathbb{R}^d)$ for the weak$^{\star}$ topology!
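As a numerical sanity check of the weak$^{\star}$ claim, here is a small sketch (the target $\mathrm{Exp}(1)$, the test function $\cos$, and all numerical choices are my own illustration, not from the answer above): place one equally weighted small-variance Gaussian component at each sample point, and compare the mixture's expectation of a bounded continuous test function with the target's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target distribution: Exp(1). Exact value of the test integral:
# E[cos(X)] = Re(1 / (1 - i)) = 1/2.
n = 200_000
samples = rng.exponential(1.0, size=n)

# GMM approximation: one component N(x_i, sigma^2) per sample point,
# each with weight 1/n, and a small common bandwidth sigma.
sigma = 0.05

# For Y ~ N(mu, sigma^2): E[cos(Y)] = cos(mu) * exp(-sigma^2 / 2),
# so the mixture expectation of cos has a closed form.
gmm_expectation = np.mean(np.cos(samples)) * np.exp(-sigma**2 / 2)

# The gap shrinks as n grows and sigma shrinks (weak-* closeness,
# tested on this one bounded continuous function).
print(abs(gmm_expectation - 0.5))
```

Note that this checks closeness only against one test function; weak$^{\star}$ convergence means the same holds for every bounded continuous test function.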
Bonuses, fun, and excitement for all: try breaking the argument by showing that convex combinations of Gaussians cannot approximate (non-trivial) point masses for stronger topologies on $\mathcal{M}_f(\mathbb{R}^d)$. In other words, they are not capable of approximating all probability measures in the total variation topology (another rather standard notion of convergence of measures).
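A sketch of why the bonus works: any finite mixture $\mu$ of nondegenerate Gaussians is absolutely continuous with respect to Lebesgue measure, so $\mu(\{x\}) = 0$, and hence

```latex
\[
\lVert \mu - \delta_x \rVert_{TV}
  \;\ge\; \big| \mu(\{x\}) - \delta_x(\{x\}) \big|
  \;=\; |0 - 1|
  \;=\; 1 .
\]
```

So no sequence of Gaussian mixtures can converge to a point mass in total variation, even though (by the argument above) they do converge weakly.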