Explaining the form of the Gaussian measure


The Gaussian density $\mu(dx)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\ dx$ is fundamental in probability theory. Does anyone have a (non-computational) heuristic for why this function should be special? (By non-computational, I mean without using combinatorial approximations and Stirling's asymptotics.)


There are 2 best solutions below


The Gaussian can be viewed as the "best guess" of a distribution, given that we only know that it is a distribution, and we know its mean and its variance.

For instance, suppose I have a deck of 52 cards, and I tell you to pick a card "at random". If you had no prior knowledge as to how I would choose my card, what probability of selection would you assign to any given card? I'd say $\mathbb{P}(\text{any card}) = \frac{1}{52}$ is a reasonable guess. This is an example of a "maximum entropy" distribution on the discrete set $\{1,...,52\}$. Mathematically, the solution to the optimisation problem $$\begin{cases} \text{maximise} & \left\{-\sum_{i=1}^{52} p_i \log p_i\right\} \\ \text{subject to}& \sum_{i=1}^{52} p_i = 1\end{cases}$$ is $p_i = 1/52$.

Next, suppose I tell you to pick a number "at random" from the interval $[0,1]$. Having no prior knowledge of my predispositions, you might assign equal likelihood to each number, giving a uniform distribution. Here you are solving the optimisation problem $$\begin{cases} \text{maximise} & \left\{ -\int_0^1 f(x) \ \log f(x)\ dx\right\} \\ \text{subject to} & \int_0^1 f(x)\ dx = 1 \\ & f \text{ continuous and } f \geq 0.\end{cases}$$
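The continuous case can be checked numerically too. This sketch (my own illustration, using a simple midpoint Riemann sum) compares the differential entropy of the uniform density on $[0,1]$ with that of the tilted density $f(x)=2x$ on the same interval:

```python
import math

def diff_entropy(f, a, b, n=100_000):
    """Differential entropy -integral of f log f over [a,b], midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        fx = f(x)
        if fx > 0:
            total -= fx * math.log(fx) * h
    return total

h_uniform = diff_entropy(lambda x: 1.0, 0.0, 1.0)      # exactly 0
h_tilted = diff_entropy(lambda x: 2.0 * x, 0.0, 1.0)   # 1/2 - log 2 < 0
```

The uniform density achieves differential entropy $0$ on $[0,1]$, while the tilted density comes out strictly negative, consistent with uniform being the maximiser.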

Now suppose I tell you to pick a number "at random" from $\mathbb{R}$. I want your selection to have a mean of $0$ and a variance of $1$. What is the distribution of the number selected? The analogous "maximum entropy" distribution is the Gaussian with density $\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$. Here, you are solving the optimisation problem $$\begin{cases} \text{maximise} & \left\{ -\int_\mathbb{R} f(x) \ \log f(x)\ dx\right\} \\ \text{subject to} & \int_\mathbb{R} f(x)\ dx = 1 \\ & \int_\mathbb{R} x \;f(x)\ dx = 0 \\ & \int_\mathbb{R} x^2 \;f(x)\ dx = 1 \\ & f \text{ continuous and } f \geq 0.\end{cases}$$
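One way to make this concrete (again a sketch of my own, using standard closed-form entropy expressions rather than solving the optimisation problem) is to compare the differential entropies of three zero-mean, unit-variance distributions; the Gaussian comes out on top:

```python
import math

# Differential entropies of three zero-mean, unit-variance distributions,
# from their standard closed forms:
#   N(0,1):                        (1/2) log(2*pi*e)
#   Laplace(b) with var 2b^2 = 1:  1 + log(2b), where b = 1/sqrt(2)
#   Uniform[-a,a] with var a^2/3 = 1:  log(2a), where a = sqrt(3)
h_gaussian = 0.5 * math.log(2 * math.pi * math.e)   # ~ 1.419
h_laplace = 1 + math.log(2 / math.sqrt(2))          # ~ 1.347
h_uniform = math.log(2 * math.sqrt(3))              # ~ 1.242
```

All three have mean $0$ and variance $1$, yet the Gaussian strictly dominates, as the maximum-entropy characterisation predicts.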


There are certain properties of Gaussian/normal distributions that make them appealing, beyond the simple stuff like the central limit theorem. For example, the Gaussian/normal has the maximum entropy for a given mean and variance. This says that the Gaussian/normal distribution provides the maximum overall "variation" in the entropy sense, given standard measures of mean and variance, which is appealing if you don't know exactly what distribution your samples follow and don't want to restrict the distribution too much.

Also, when you assume independent Gaussian/normal noise terms in something like a linear regression problem, you can show that the maximum likelihood solution has a simple closed matrix form. The likelihood is a product of Gaussians, so its log is (up to constants) a negative sum of squared errors; maximizing the likelihood is therefore equivalent to minimizing the sum-of-squared error, and minimizing a sum of squares for some form of estimator is essentially always given by the "mean" in some shape or form. Perhaps this is a better answer to your question, I'm not sure.
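A minimal sketch of that equivalence (the data and coefficient values here are made up for illustration): under i.i.d. Gaussian noise, the maximum-likelihood coefficients of a linear regression are exactly the least-squares solution of the normal equations $\hat\beta = (X^\top X)^{-1} X^\top y$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ beta_true + Gaussian noise.
n, d = 200, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Maximum-likelihood estimate under i.i.d. Gaussian noise = least squares,
# via the closed-form normal equations (X^T X) beta = X^T y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With low noise and many samples, `beta_hat` recovers `beta_true` closely, and it agrees with any generic least-squares solver applied to the same data.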