Maximum entropy distribution via Lagrange multiplier


I am following this tutorial, where we derive common probability distributions based on a constraint and the entropy equation for a discrete random variable.

I am stuck on a step: it's labeled step 8 to 9 in the post.

We proceed as follows.

We first declare our entropy equation:

$ H(x) = - \sum_{i=a}^{b} p_i\log(p_i) \tag{5} $

We then set our constraint:

$\sum_{i=a}^{b} p_i = 1 \tag{6}$

We then form the Lagrangian:

$\mathcal{L}(p_a, \ldots, p_b, \lambda) = -\sum_{i=a}^{b} p_i\log(p_i) - \lambda(\sum_{i=a}^{b} p_i - 1) \tag{7}$

Taking the partial derivative with respect to each $p_i$ and setting it to zero gives

$\frac{\partial \mathcal{L}(p_a, \ldots, p_b, \lambda)}{\partial p_i} = -\log(p_i) - 1 - \lambda = 0 \tag{8}$

and

$\frac{\partial \mathcal{L}(p_a, \ldots, p_b, \lambda)}{\partial \lambda} = -\sum_{i=a}^{b} p_i + 1 = 0 \tag{9}$

It then just skips ahead and says:

$$p_i = \frac{1}{b-a+1}, \qquad \lambda = \log(b-a+1) - 1 \tag{10}$$

But where does the first equation come from? How do we go from the sum in equation (9) to that value for $p_i$?

The sum is over the support $\{a, a+1, \ldots, b-1, b\}$. I'm just not seeing how summing the probabilities over that support ultimately solves for $p_i$.

There are 3 answers below.

Answer 1

Setting $\frac{\partial L}{\partial p}=-\log_2 p-1-\lambda=0$ you get $p=2^{-1-\lambda}$. But the sum of the probabilities is 1 from the constraint, so $2^{-1-\lambda}n=1\implies pn=1\implies p=\frac{1}{n}$, where $n$ is the cardinality of the support. In your case $n=b-a+1$ (even though I don’t get the +1), but that’s just a specific case for your support.

You don’t care about the value of $\lambda$ as only the distribution of $p$ matters.
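A quick numeric sanity check of this argument (my own sketch; the support size $n = 8$ is arbitrary): with $\lambda = \log_2 n - 1$, the stationary point $p = 2^{-1-\lambda}$ comes out to exactly $1/n$, and the probabilities sum to 1.

```python
import math

n = 8  # arbitrary support size, i.e. b - a + 1

# Stationarity gives p = 2^(-1 - lambda); the constraint sum(p) = n*p = 1
# then forces p = 1/n, which corresponds to lambda = log2(n) - 1.
lam = math.log2(n) - 1
p = 2 ** (-1 - lam)
```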

Answer 2

This is my attempt:

We know $-\log(p_i) -1 -\lambda = 0$

Therefore, $p_i = e^{-1 - \lambda}$

We also know

$-\sum_{i=a}^{b} p_i + 1 = 0$

and thus: $\sum_{i=a}^{b} p_i = 1$

We can plug in the value for $p_i$

$\sum_{i=a}^{b} e^{-1 - \lambda} = 1$

The summand does not depend on $i$, so the sum is just the number of terms in the support multiplied by this constant substitution for $p_i$:

$(b - a + 1)\, e^{-1 - \lambda} = 1$

Since $p_i = e^{-1 - \lambda}$, this gives $p_i = {{1}\over{b - a + 1}}$.

The +1 is because a discrete range between two points $a$ and $b$, inclusive, contains $b - a + 1$ points, so the sum has $b - a + 1$ terms.
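The same two-step substitution can be sketched symbolically; here is a version using SymPy (my own addition, where the symbol `n` stands for the support size $b - a + 1$):

```python
import sympy as sp

p, lam = sp.symbols('p lambda')
n = sp.symbols('n', positive=True)  # stands for the support size b - a + 1

# Constraint: the n equal probabilities sum to 1, so n*p - 1 = 0
p_sol = sp.solve(n * p - 1, p)[0]  # p = 1/n

# Stationarity: -log(p) - 1 - lambda = 0, with p = 1/n substituted in;
# the solution is equivalent to lambda = log(n) - 1
lam_sol = sp.solve(-sp.log(p_sol) - 1 - lam, lam)[0]
```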

Answer 3

As $$L(p,\lambda) = -\sum_{i=a}^{i=b}p_i\ln p_i -\lambda \left(\sum_{i=a}^{i=b}p_i-1\right)$$ we have

$$ \nabla L = \cases{-\ln p_i-1-\lambda = 0\ \ \ \ i=a,\cdots,b\\ \sum_{i=a}^{i=b}p_i-1 = 0} $$ now taking $p_i = e^{-(\lambda+1)}$ and substituting into the restriction we have

$$ \sum_{i=a}^{i=b}e^{-(\lambda+1)} = 1\Rightarrow (b-a+1)e^{-(\lambda+1)}=1\Rightarrow e^{-(\lambda+1)} = \frac{1}{b-a+1} $$

and finally

$$ p_i = \frac{1}{b-a+1} $$
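As a final numerical sanity check (my own addition, with an arbitrary support $a = 1$, $b = 6$): the uniform distribution attains entropy $\log(b - a + 1)$, and randomly drawn distributions on the same support stay below it.

```python
import numpy as np

rng = np.random.default_rng(42)
a, b = 1, 6
n = b - a + 1  # size of the support {a, ..., b}

def entropy(p):
    # Shannon entropy with the natural log, as in equation (5)
    return float(-np.sum(p * np.log(p)))

uniform = np.full(n, 1.0 / n)
h_max = entropy(uniform)  # should equal log(n) = log(b - a + 1)

# Entropies of 1000 random distributions on the same support
h_random = [entropy(q) for q in rng.dirichlet(np.ones(n), size=1000)]
```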