Binomial Approximation of Gaussian Distribution

2.9k Views Asked by At

It is said that we may use the binomial coefficients ( a layer from Pascal's triangle) to approximate the 1-D Gaussian kernel with standard deviation $\sigma$, where $\frac{n}{4} = \sigma^2$ and $n$ is the index of the layer.

This works really nice when we want to generate a gaussian quickly. But is there a way to prove this? Or just coincidence?

And how do we decide the variance of certain layer of Pascal's triangle? Say the third layer is [1 3 3 1] (consider [1 1] as the first layer and [1 2 1] to be the second), and it is used to approximate gaussian kernel of $\sigma = \sqrt{\frac{3}{4}} = \frac{\sqrt{3}}{2}$. But how do we prove this?

1

There are 1 best solutions below

1
On BEST ANSWER

Observe that the $n$-th layer of Pascal triangle sums to $2^n$. This is not a coincidence but simply due to how the layers are recursively defined. $$\begin{array}{rrrrrrr|l} & & & & & & 1 & \sum\implies 1 \\ & & & & & 1 & 1 & \sum\implies 2 \\ & & & & 1 & 2 & 1 & \sum\implies 4 \\ & & & 1 & 3 & 3 & 1 & \sum\implies 8 \\ & & 1 & 4 & 6 & 4 & 1 & \sum\implies 16 \\ & 1 & 5 & 10 & 10 & 5 & 1 & \sum\implies 32 \\ 1 & 6 & 15 & 20 & 15 & 6 & 1 & \sum\implies 64~,~\text{and so on} \end{array}$$ When each layer is divided by its own sum, it becomes a proper distribution (probability mass function) that sum to 1. $$\begin{array}{rrrrrrr|l} & & & & & & 1 & \sum\implies 1 \\ & & & & & \frac12 & \frac12 & \sum\implies 1 \\ & & & & \frac14 & \frac12 & \frac14 & \sum\implies 1 \\ & & & \frac18 & \frac38 & \frac38 & \frac18 & \sum\implies 1 \\ & & \frac1{16} & \frac14 & \frac38 & \frac14 & \frac1{16} & \sum\implies 1 \\ & \frac1{32} & \frac5{32} & \frac{10}{32} & \frac{10}{32} & \frac5{32} & \frac1{32} & \sum\implies 1 \\ \frac1{64} & \frac6{64} & \frac{15}{64} & \frac{20}{64} & \frac{15}{64} & \frac{6}{64} & \frac1{64} & \sum\implies 1~,~\text{and so on} \end{array}$$ The histograms for $n = 2$ to $n = 7$ is shown below. For the $n$-th layer, there are $n+1$ "slots" from $x = 0$ to $x = n$. The plots are symmetric, and they have the so called bell-shape.

enter image description here

This is the special case of the discrete distribution called Binomial distribution when the parameter $p$ (which itself can mean the chance of "success") takes on the value $p = 1/2$.

This is a mathematically well-defined and well-studied function, and everything you want to know can be calculated rigorously.

Before doing some math, please imagine that when $n$ gets bigger, the discrete histogram gradually resembles a smooth curve. You can play with the applet by setting $p = 0.5$ in the top right field and adjust the field $n$ on the left. (or you can just google with keywords like "Binomial distribution interactive")

When $n$ is large "enough", one can use it to approximate a Gaussian distribution (or kernel, as called in some application). Sometimes just $n = 30$ or $n=40$ is "enough". Depending on the particular application, maybe just $n = 15$ is enough.

Mathematically, the Gaussian kernel (aka Normal distribution) can be approximated by a variety of other distributions, not just Binomial, when the "conditions are met". This is the celebrated Central Limit Theorem.

If you have tried the applet I linked above (or something similar), then you should notice that when $p \neq 1/2$ the shape of the Binomial distribution is not symmetric and not quite bell-shaped. It is true, and there's nothing wrong about it. That's just a whole other story.


Lastly, about the "proof" you want about the variance equivalence $\displaystyle\sigma^2 = \frac{n}4$.

It is the special case when $p = 1/2$ of the variance of the Binomial being $\mathbb{V}[X] = np(1-p)$.

The calculation of the variance is nothing more than some basic algebra, and it can be seen here, here, or here or as always just anything by searching the keywords "binomial distribution variance" plus "proof" or "derivation".

Admittedly, if this is the first time you're encountering the Binomial distribution (or any notion of probability distribution at all), then it takes some effort to understand the calculation.

Honestly, I don't know your background so maybe I have been emphasizing on the wrong things. Please let me know if there's anything you need me to clarify.