What's the maximum-entropy discrete distribution with nonnegative support and a given mean and variance?


I've seen this question, but its accepted answer doesn't actually give a solution, merely a citation. I also know how to set up this problem; the barrier is getting something usable as a solution.

If I want to find the maximum entropy distribution given the constraints above, I would set up the following function, given $\mu$ is the mean and $\sigma^2$ is the variance:

$$G = -\sum_{i=0}^{\infty}p_i\ln(p_i) + \lambda\left(\sum_{i=0}^{\infty}p_i-1\right)+\kappa\left(\sum_{i=0}^{\infty}i\cdot p_i-\mu\right)+\zeta\left(\sum_{i=0}^{\infty}i^2\cdot p_i-(\sigma^2+\mu^2)\right) $$

When I take the partial derivative with respect to $p_i$, I obtain:

$$p_i = e^{\zeta i^2+\kappa i +\lambda -1}$$

But when I start trying to do math to find the Lagrange multipliers, I end up getting formulae that simply do not produce anything workable. It's not merely that it doesn't seem to give me an analytical solution: it's that it doesn't even want to give me a transcendental one I can approximate numerically with any ease. Does there exist an analytical solution for the Lagrange multipliers? If not, what formulae can I use to give a precise numerical approximation for arbitrary $\mu$ and $\sigma$?


Accepted answer:

You are trying to find the constants $a, b, c$ with the constraints $$\sum_n p_n = 1$$ $$\sum_n n\cdot p_n = \mu$$ $$\sum_n n^2 p_n = \sigma^2 + \mu^2$$

where the sums run over the nonnegative integers and $p_n = e^{a n^2+bn +c}$. For the sums to converge, it must be true that $a< 0$. Relative to your notation, I've made the replacements $a = \zeta, b = \kappa, c = \lambda-1$.

From the first constraint, $c$ can be expressed in terms of $a$ and $b$ as $$c = -\ln\left(\sum_n e^{a n^2 + b n}\right)$$

The remaining two constraints can be expressed as $$e^c\sum_n n\cdot e^{a n^2+bn} = \mu$$ $$e^c\sum_n n^2 e^{an^2 + bn} = \sigma^2 + \mu^2$$

From here, I would suggest using gradient descent for a numerical solution. Specifically, let $G(a, b)$ be the vector $$\begin{pmatrix} e^c\sum_n n\, e^{an^2 + bn} - \mu \\ e^c\sum_n n^2 e^{an^2 + bn} - \left(\sigma^2 + \mu^2\right) \end{pmatrix}$$

The function to minimize is then $\frac{1}{2}G^T(a, b)G(a, b)$, which is equal to $$F(a, b) = \frac{1}{2}\left( \left( e^c\sum_n n e^{an^2 + bn} - \mu \right)^2 + \left( e^c\sum_n n^2 e^{an^2 + bn} - \left(\sigma^2 + \mu^2\right) \right)^2 \right)$$

Then, given $(a_n, b_n)$, generate $(a_{n+1}, b_{n+1})$ using $$(a_{n+1}, b_{n+1}) = (a_n, b_n) - \gamma_n \nabla F(a_n, b_n)$$

In this case, $\nabla F(a, b)$ equals $J_G^T(a, b)\, G(a, b)$, where $J_G^T(a, b)$ is $$\begin{pmatrix} e^c\sum_n n^3 e^{an^2 + bn} & e^c\sum_n n^4 e^{an^2 + bn} \\ e^c\sum_n n^2 e^{an^2 + bn} & e^c\sum_n n^3 e^{an^2 + bn} \end{pmatrix}$$ (This treats $c$ as fixed; since $c$ depends on $a$ and $b$ through the normalization, the exact Jacobian picks up additional quotient-rule terms, but the fixed point where $G = 0$ is unchanged.)

For the initial guess $(a_0, b_0)$, I would recommend the values from the corresponding normal distribution, that is, $a_0 = -\frac{1}{2\sigma^2}$ and $b_0 = \frac{\mu}{\sigma^2}$. Repeating the gradient-descent update then refines the numerical solution.
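The procedure above can be sketched in code. This is a minimal sketch, not a definitive implementation: it uses the same residuals and the normal-distribution initial guess, but takes full Newton steps (solving the $2\times 2$ system $J\,\delta = -G$) instead of plain gradient descent, since that converges much faster here; the Jacobian also includes the quotient-rule terms coming from $e^c = 1/\sum_n e^{an^2+bn}$.

```python
import math

def moments(a, b, nmax=400):
    # Truncated sums S_k = sum_{n>=0} n^k exp(a n^2 + b n) for k = 0..4.
    # Convergence requires a < 0; terms underflow to 0.0 long before nmax.
    S = [0.0] * 5
    for n in range(nmax + 1):
        w = math.exp(a * n * n + b * n)
        for k in range(5):
            S[k] += n ** k * w
    return S

def fit(mu, sigma2, iters=50):
    """Find a, b so that p_n proportional to exp(a n^2 + b n) has the given mean/variance."""
    m2_target = sigma2 + mu * mu
    # Initial guess from the continuous normal: a = -1/(2 sigma^2), b = mu/sigma^2
    a, b = -1.0 / (2.0 * sigma2), mu / sigma2
    for _ in range(iters):
        S = moments(a, b)
        m1, m2 = S[1] / S[0], S[2] / S[0]   # current mean and second moment
        m3, m4 = S[3] / S[0], S[4] / S[0]
        g1, g2 = m1 - mu, m2 - m2_target    # constraint residuals
        # Exact Jacobian of (m1, m2) w.r.t. (a, b), quotient rule included:
        J00, J01 = m3 - m1 * m2, m2 - m1 * m1
        J10, J11 = m4 - m2 * m2, m3 - m2 * m1
        det = J00 * J11 - J01 * J10
        # Newton step: solve J * (da, db) = -(g1, g2) by Cramer's rule
        da = (-g1 * J11 + g2 * J01) / det
        db = ( g1 * J10 - g2 * J00) / det
        a, b = a + da, b + db
    return a, b
```

For example, `fit(2.0, 1.0)` returns multipliers whose discrete distribution reproduces mean $2$ and variance $1$ to high precision.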

To approximate $\sum_n n^k e^{an^2 + bn}$, which is needed in many places, you could simply sum up to some maximal $n$. A better way, however, is to use the Euler-Maclaurin summation formula: the sum is approximately $$\int_0^{\infty}x^k e^{ax^2 + bx}\,dx + \frac{f(0)}{2} - \sum_{m=1}^p \frac{B_{2m}}{(2m)!} f^{(2m-1)}(0) $$

where $f(x) = x^k e^{ax^2 + bx}$ and $p$ is a nonnegative integer. (The boundary term $\frac{f(0)}{2}$ matters only for $k=0$; for $k \ge 1$, $f(0)=0$.) The integrals can all be evaluated in closed form in terms of the error function, e.g. with Mathematica, but the results are long for $k \ge 2$.
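As a sketch, the following checks the Euler-Maclaurin approximation with $p=2$ for the $k=0$ case (where the boundary term $f(0)/2$ is nonzero) against a direct truncated sum; the integral is computed from its erfc closed form. The derivative values used are $f'(0) = b$ and $f'''(0) = b(6a + b^2)$.

```python
import math

def em_sum0(a, b):
    # Euler-Maclaurin approximation (p = 2) of sum_{n>=0} exp(a n^2 + b n), a < 0.
    # f(x) = exp(a x^2 + b x), so f(0) = 1, f'(0) = b, f'''(0) = b (6a + b^2).
    alpha = -a
    # Closed form of the integral int_0^inf exp(-alpha x^2 + b x) dx via erfc:
    integral = (math.exp(b * b / (4 * alpha)) * math.sqrt(math.pi)
                / (2 * math.sqrt(alpha)) * math.erfc(-b / (2 * math.sqrt(alpha))))
    return (integral
            + 0.5                           # boundary term f(0)/2
            - b / 12.0                      # -B_2/2!  * f'(0),   B_2 = 1/6
            + b * (6 * a + b * b) / 720.0)  # -B_4/4!  * f'''(0), B_4 = -1/30

def direct_sum0(a, b, nmax=100):
    # Brute-force truncated sum for comparison
    return sum(math.exp(a * n * n + b * n) for n in range(nmax + 1))
```

With $a=-\tfrac12$ the two agree to several digits already at $p=2$; the residual error comes from the Euler-Maclaurin remainder term.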

Another answer:

Too long for a comment. I'm not sure exactly where you need this, but let me give you a physics analogue: the canonical ensemble. The entropy $S=-{\rm Tr}(\rho\log \rho)$ is maximized under the constraints of normalization, ${\rm Tr}(\rho)=1$, and a given mean value $E$ of the (internal) energy operator $H$. Analogously to your case, this gives rise to the Lagrange function $$L=-{\rm Tr}(\rho \log \rho) + \lambda_1 ({\rm Tr}(\rho)-1) + \lambda_2({\rm Tr}(\rho H)-E)$$ which is to be maximized with respect to $\rho$. Here ${\rm Tr}$ stands for the trace and $\rho$ is the density operator corresponding to your probabilities $p_i$.

Varying this equation and setting the variation to zero, one obtains $$0=-\log \rho - 1 + \lambda_1+\lambda_2 H \tag{1}$$ and thus $\rho=e^{\lambda_1-1+\lambda_2H}$. Since ${\rm Tr}(\rho)=1=e^{\lambda_1-1}{\rm Tr}(e^{\lambda_2 H})$, it follows that $e^{1-\lambda_1}={\rm Tr}(e^{\lambda_2 H}) \equiv Z$, where $Z$ is called the partition function. Next, multiply (1) by $\rho$ and take the trace, which gives $$0=S-1+\lambda_1+\lambda_2 E=S-\log Z + \lambda_2 E \, . \tag{2}$$

Now in thermodynamics there are various potentials for different applications, all connected to each other by the so-called Legendre transformation. For example, the free energy $F$ is connected to the internal energy $E$ via $F=E-TS$, which can be rearranged to $$0=S+\frac{F}{T} - \frac{E}{T} \,. $$ Termwise comparison with (2) gives the interpretation of the free energy $F$ in terms of the partition function, $$F=-T\log Z \, ,$$ and it also fixes the constant $$\lambda_2=-\frac{1}{T}$$ where $T$ is the thermodynamic temperature.

Translating to your problem, the corresponding equation (2) becomes $$0=S+\lambda-1 + \kappa\mu + \zeta(\sigma^2+\mu^2) \\ = S - \log Z + \kappa\mu + \zeta(\sigma^2+\mu^2)$$ where $Z=\sum_{i=0}^\infty e^{i(\zeta i + \kappa)}$ and $S=-\sum_{i=0}^\infty p_i \log p_i$.
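This form of equation (2) is an algebraic identity that holds for any admissible multipliers, which makes it a useful numerical sanity check. A quick sketch with arbitrarily chosen $\zeta < 0$ and $\kappa$ (the specific values here are illustrative, not derived from given $\mu$, $\sigma$):

```python
import math

zeta, kappa = -0.5, 2.0   # arbitrary multipliers; zeta < 0 for convergence
N = 200                   # truncation; the tail underflows to zero anyway

w = [math.exp(zeta * i * i + kappa * i) for i in range(N)]
Z = sum(w)                # partition function
p = [x / Z for x in w]    # normalized probabilities p_i
mu = sum(i * q for i, q in enumerate(p))
second = sum(i * i * q for i, q in enumerate(p))   # sigma^2 + mu^2
S = -sum(q * math.log(q) for q in p if q > 0.0)    # Shannon entropy
# Equation (2): S - log Z + kappa*mu + zeta*(sigma^2 + mu^2) should vanish
residual = S - math.log(Z) + kappa * mu + zeta * second
```

The residual is zero up to floating-point rounding, confirming the relation between entropy, partition function, and the Lagrange multipliers.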

Are there analogues to the thermodynamics potentials in your field of study which are connected via Legendre transformations?