normal approximation to a uniform distribution


Earlier today I was talking to a machine learning researcher about how well a normal distribution can approximate a uniform distribution over an interval $[a,b] \subset \mathbb{R}$. Although the following analysis involves nothing fancy, I think it's useful, as it generalises easily to higher dimensions (i.e. multivariate uniform distributions).

I managed to make some progress and then got stuck, so I look forward to some constructive criticism.

  1. I define the problem in terms of the KL-Divergence:

\begin{equation} \mathcal{D}_{KL}(P \| Q) = \int_{-\infty}^\infty p(x) \ln \frac{p(x)}{q(x)}\,dx \end{equation}

where $P$ is the target uniform distribution and $Q$ is the approximating Gaussian:

\begin{equation} p(x)= \frac{1}{b-a} \mathbb{1}_{[a,b]}(x), \quad \text{so } p(x) = 0 \text{ for } x \notin [a,b] \end{equation}

and

\begin{equation} q(x)= \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x-\mu)^2}{2 \sigma^2}} \end{equation}

Now, given that $ \lim_{x \to 0^+} x\ln x = 0$, and assuming $[a,b]$ is fixed, the loss may be expressed in terms of $\mu$ and $\sigma$ alone:

\begin{equation} \begin{split} \mathcal{L}(\mu,\sigma) & = -\int_{a}^b p(x) \ln \frac{p(x)}{q(x)}dx \\ & = \ln(b-a) - \frac{1}{2}\ln(2\pi\sigma^2)-\frac{\frac{1}{3}(b^3-a^3)-\mu(b^2-a^2)+\mu^2(b-a)}{2\sigma^2(b-a)} \end{split} \end{equation}
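This closed form can be sanity-checked numerically (a Python sketch; the interval and Gaussian parameters below are arbitrary test values):

```python
import math

def loss_closed_form(a, b, mu, sigma):
    """Closed-form L(mu, sigma) for P = U(a, b), Q = N(mu, sigma^2)."""
    m2 = ((b**3 - a**3) / 3 - mu * (b**2 - a**2) + mu**2 * (b - a)) / (b - a)
    return math.log(b - a) - 0.5 * math.log(2 * math.pi * sigma**2) - m2 / (2 * sigma**2)

def loss_numeric(a, b, mu, sigma, n=200_000):
    """Midpoint-rule approximation of -integral of p ln(p/q) over [a, b]."""
    h = (b - a) / n
    p = 1.0 / (b - a)
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        q = math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
        total += p * math.log(p / q) * h
    return -total

a, b, mu, sigma = -1.0, 2.0, 0.3, 1.1   # arbitrary test values
print(loss_closed_form(a, b, mu, sigma))
print(loss_numeric(a, b, mu, sigma))
```

The two values agree to high precision, which is reassuring that no term was dropped in the integration.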

  2. Maximising $\mathcal{L}$ with respect to $\mu$ and $\sigma$ (i.e. minimising the divergence):

We can easily show that the mean and variance of the Gaussian which maximises $\mathcal{L}(\mu,\sigma)$ (equivalently, minimises the KL divergence) coincide with the mean and variance of the uniform distribution over $[a,b]$:

\begin{equation} \frac{\partial}{\partial \mu} \mathcal{L}(\mu,\sigma) = \frac{(b+a)}{2\sigma^2} - \frac{2\mu}{2\sigma^2}= 0 \implies \mu = \frac{a+b}{2} \end{equation}

\begin{equation} \frac{\partial}{\partial \sigma} \mathcal{L}(\mu,\sigma) = -\frac{1}{\sigma}+\frac{\frac{1}{3}(b^2+a^2+ab)-\frac{1}{4}(b+a)^2}{\sigma^3} =0 \implies \sigma^2 = \frac{(b-a)^2}{12} \end{equation}
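As a check that this stationary point is indeed the optimum (a Python sketch; the interval $[0,5]$ is an arbitrary choice), we can confirm it beats nearby perturbations:

```python
import math

def loss(a, b, mu, sigma):
    # Closed-form L(mu, sigma) = -KL(U(a,b) || N(mu, sigma^2)) from the derivation above.
    m2 = ((b**3 - a**3) / 3 - mu * (b**2 - a**2) + mu**2 * (b - a)) / (b - a)
    return math.log(b - a) - 0.5 * math.log(2 * math.pi * sigma**2) - m2 / (2 * sigma**2)

a, b = 0.0, 5.0
mu_opt = (a + b) / 2                  # stationary point in mu
sigma_opt = (b - a) / math.sqrt(12)   # stationary point in sigma

best = loss(a, b, mu_opt, sigma_opt)
# The optimum should beat small perturbations in every direction.
for dmu in (-0.1, 0.0, 0.1):
    for dsig in (-0.1, 0.0, 0.1):
        assert loss(a, b, mu_opt + dmu, sigma_opt + dsig) <= best + 1e-12
print(best)
```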

  3. What I find strange:

When I plug the optimal parameters $\mu$ and $\sigma$ back into $\mathcal{L}$, I obtain a constant loss, independent of $a$ and $b$. To be precise:

\begin{equation} \mathcal{L} = -\frac{1}{2}\Big(\ln \big(\frac{\pi}{6}\big)+1\Big) \approx -0.177 \end{equation}
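The constancy can be checked numerically across intervals of very different scales (a Python sketch):

```python
import math

def optimal_loss(a, b):
    """L evaluated at mu = (a+b)/2 and sigma^2 = (b-a)^2/12, via the closed form."""
    mu = (a + b) / 2
    sigma2 = (b - a)**2 / 12
    m2 = ((b**3 - a**3) / 3 - mu * (b**2 - a**2) + mu**2 * (b - a)) / (b - a)
    return math.log(b - a) - 0.5 * math.log(2 * math.pi * sigma2) - m2 / (2 * sigma2)

expected = -0.5 * (math.log(math.pi / 6) + 1)   # approx -0.177
# Tiny, moderate, shifted, and huge intervals all give the same optimal loss.
for a, b in [(0.0, 1e-3), (0.0, 1.0), (-2.0, 7.0), (100.0, 100.5), (-1e6, 1e6)]:
    assert abs(optimal_loss(a, b) - expected) < 1e-9
print(expected)
```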

I find this really counter-intuitive yet I've gone through my calculations several times and haven't found any error. How can this constant loss be justified? Might there be a geometric way of explaining this result?

Intuitively, I expected that for small intervals (i.e. $b-a\approx 0$) the optimal loss would be much larger in magnitude, and that for larger intervals it would be close to zero.


BEST ANSWER

As the commenter stated, your divergence results make sense: if $X\sim\mathcal{N}(\mu,\sigma^2)$ and $Y\sim\mathcal{U}(a,b)$, then the optimally 'fit' Gaussian satisfies $\mathbb{E}[X]=\mathbb{E}[Y]$ and $\mathbb{V}[X]=\mathbb{V}[Y]$. In other words, given a uniform distribution, the 'best' Gaussian is chosen by centring its mean between $a$ and $b$ and stretching $\sigma$ to match the variance of the uniform.

The fact that the loss is constant is interesting though. Note that each distribution has only two degrees of freedom (i.e. two parameters). My intuition is that it makes more sense to view it from the other perspective: suppose we are given $\mathcal{N}(\mu,\sigma^2)$ and fit $\mathcal{U}(a,b)$ to it. The fit parameters are then $ a = \mu - \sqrt{3}\sigma $ and $ b = \mu +\sqrt{3}\sigma $. Note too that the KL divergence is invariant under invertible affine changes of variable, and both families are closed under such maps, so the optimal divergence cannot depend on $\mu$ or $\sigma$ at all.
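A quick check of these fit parameters (a Python sketch; the Gaussian parameters are arbitrary):

```python
import math

mu, sigma = 2.0, 1.5                # arbitrary Gaussian parameters
a = mu - math.sqrt(3) * sigma       # fitted uniform endpoints
b = mu + math.sqrt(3) * sigma

# The fitted uniform matches the Gaussian's mean and variance exactly.
assert math.isclose((a + b) / 2, mu)
assert math.isclose((b - a)**2 / 12, sigma**2)
print(a, b)
```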

Clearly, changing $\mu$ merely translates both distributions, so it cannot affect the goodness of the optimal fit. That aspect makes sense.

Intuitively, as $\sigma$ increases, so too should $b-a$ increase to match it. However, in my mind, if $\sigma\rightarrow 0$, then $a\rightarrow b$, and both density functions should converge to a Dirac delta distribution, rather than maintaining a constant divergence.

Note: computing the reverse divergence $\mathcal{D}_\text{KL}(Q \| P)$ fails outright, though it seemed promising: it is infinite, because $Q$ places positive probability mass where $p(x)=0$. (Perhaps if one could add small decaying tails to the uniform with some parameter or the like (i.e. to "soften" it), some insight could be gleaned.)
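The parenthetical suggestion can be sketched numerically. One simple softening (an assumption of this sketch, not the only choice) is to mix a small amount of a wide Gaussian into the uniform, $P_\epsilon = (1-\epsilon)\,\mathcal{U}(a,b) + \epsilon\,\mathcal{N}(\mu, (5\sigma)^2)$; the resulting $\mathcal{D}_\text{KL}(Q \| P_\epsilon)$ then grows without bound as $\epsilon \to 0$, consistent with the reverse divergence being infinite:

```python
import math

def kl_q_p(mu, sigma, a, b, eps, lo=-20.0, hi=20.0, n=200_000):
    """Midpoint-rule D_KL(Q || P_eps), with Q = N(mu, sigma^2) and the uniform
    softened into the mixture P_eps = (1-eps) U(a,b) + eps N(mu, (5*sigma)^2)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        q = math.exp(-(x - mu)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
        wide = (math.exp(-(x - mu)**2 / (2 * (5 * sigma)**2))
                / math.sqrt(2 * math.pi * (5 * sigma)**2))
        p = (1 - eps) * (1.0 / (b - a) if a <= x <= b else 0.0) + eps * wide
        total += q * math.log(q / p) * h
    return total

a, b = -math.sqrt(3), math.sqrt(3)   # optimal uniform endpoints for N(0, 1)
vals = [kl_q_p(0.0, 1.0, a, b, eps) for eps in (1e-1, 1e-2, 1e-3)]
print(vals)   # the divergence grows as the softening tails decay
```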