Maximum entropy for a power constrained distribution.

I am looking at a proof of entropy maximization and trying to understand the step where the derivative is taken. The problem is to find the maximum-entropy probability distribution $p(x)$ for a given mean $\mu$ and variance $\sigma^2$. The Lagrangian is given as

$$ J(p)=\int p(x) \ln(p(x))\mathrm{d} x-\lambda_0\left(\int p(x) \mathrm{d}x-1\right)-\lambda_1\left(\int p(x)(x-\mu)^2 \mathrm{d}x-\sigma^2\right) $$

The proof continues by taking the derivative of the expression above w.r.t $p(x)$ and obtains the following:

$$ \begin{gathered} \frac{\delta J}{\delta p(x)}=1+\ln(p(x))-\lambda_0-\lambda_1(x-\mu)^2=0 \\ \ln(p(x))=\lambda_0-1+\lambda_1(x-\mu)^2 \\ p(x)=\exp\left(\lambda_0-1+\lambda_1(x-\mu)^2\right) \end{gathered} $$

I am trying to understand how this derivative is taken explicitly and how the integral signs are removed since $p(x)$ is a function of $x$.

There are 1 best solutions below

To see why this works, consider the cost functional

$$J(p)=\int_\Omega F(x,p(x))dx$$ and assume that $p^*$ is a global minimizer. Now consider $J(p+h\nu)$, where $h\in\mathbb{R}$ and $\nu$ is a function (the direction of the perturbation). By definition,

$$ J(p+h\nu)=\int_\Omega F(x,p(x)+h\nu(x))dx. $$ Assuming that $F$ is differentiable with respect to its second argument, a first-order Taylor expansion in $h$ gives

$$ J(p+h\nu)=J(p)+h\int_\Omega \dfrac{\partial F}{\partial p}(x,p(x))\nu(x)dx+o(h) $$

Therefore, we get that

$$ \lim_{h\to0}\dfrac{J(p+h\nu)-J(p)}{h}=\int_\Omega \dfrac{\partial F}{\partial p}(x,p(x))\nu(x)dx $$

and this value is called the directional derivative of the functional $J$ at $p$ in the direction $\nu$; we denote it by $DJ[\nu](p)$.
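As a numerical sanity check of the limit above, here is a minimal sketch. It assumes $F(x,p)=p\ln p$ (the entropy term only), a standard normal density as the base point $p$, a Gaussian bump as the direction $\nu$, and a finite grid standing in for $\Omega$. The names `integrate`, `J`, and `difference_quotient` are hypothetical helpers chosen for this illustration.

```python
import numpy as np

# Uniform grid on [-8, 8], standing in for Omega (an assumption of this sketch).
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]


def integrate(f):
    """Riemann sum of the sampled values f on the uniform grid."""
    return float(np.sum(f) * dx)


def J(p):
    """J(p) = integral of F(x, p(x)) dx with F(x, p) = p * ln(p)."""
    return integrate(p * np.log(p))


# Base point p: standard normal density (strictly positive on the grid).
p = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# Direction nu: a smooth bump centred at x = 1 (arbitrary illustrative choice).
nu = np.exp(-((x - 1.0) ** 2))

# Right-hand side of the limit: integral of dF/dp * nu, where dF/dp = ln(p) + 1.
analytic = integrate((np.log(p) + 1.0) * nu)


def difference_quotient(h):
    """Left-hand side of the limit before h -> 0: (J(p + h*nu) - J(p)) / h."""
    return (J(p + h * nu) - J(p)) / h


for h in (1e-2, 1e-3, 1e-4):
    print(f"h={h:g}  quotient={difference_quotient(h):.6f}  integral={analytic:.6f}")
```

As $h$ shrinks, the printed quotient should approach the integral, with the gap shrinking roughly in proportion to $h$, which is what the $o(h)$ remainder predicts.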

Now, if $p^*$ is a minimizer, then $DJ[\nu](p^*)$ must be zero for all directions $\nu$. Since $\nu$ is arbitrary, this is equivalent (by the fundamental lemma of the calculus of variations) to $\dfrac{\partial F}{\partial p}(x,p^*(x))=0$ for all $x\in\Omega$, which is why the integral signs disappear. This is, of course, only a necessary condition.

Now if we apply this idea to the current scenario we have that

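$$F(x,p)=p\ln p-\lambda_0p-\lambda_1p(x-\mu)^2$$

where the constant $\lambda_0+\lambda_1\sigma^2$ has been dropped because it does not depend on $p$, and the rest follows along the same lines.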
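For completeness, here is a sketch of those remaining steps, keeping the sign convention of the Lagrangian as written in the question (other presentations flip the signs of the multipliers). The variance constraint contributes $-\lambda_1 p(x)(x-\mu)^2$ to the integrand, so differentiating $F$ with respect to its second argument and setting the result to zero gives

$$ \frac{\partial F}{\partial p}(x,p)=\ln p+1-\lambda_0-\lambda_1(x-\mu)^2=0 \quad\Longrightarrow\quad p^*(x)=\exp\left(\lambda_0-1+\lambda_1(x-\mu)^2\right). $$

For $p^*$ to be integrable we need $\lambda_1<0$. The two constraints then fix the multipliers:

$$ \int p^*(x)\,\mathrm{d}x=e^{\lambda_0-1}\sqrt{\frac{\pi}{-\lambda_1}}=1,\qquad \int p^*(x)(x-\mu)^2\,\mathrm{d}x=\frac{1}{-2\lambda_1}=\sigma^2, $$

so $\lambda_1=-\dfrac{1}{2\sigma^2}$ and $e^{\lambda_0-1}=\dfrac{1}{\sqrt{2\pi\sigma^2}}$, which yields the Gaussian density

$$ p^*(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). $$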