I am following a seminar on computing MaxEnt distributions, and I am a bit confused about the differences between the general (analytical) template and the actual computational procedures followed by optimization packages in R or Python.
The general template is to maximize the entropy functional:
$H(x) = - \int p(x)\ln p(x)dx$
subject to a set of constraints on the moments or other functions of the distribution.
In an analytical setting, this requires no information beyond the values assigned to the constraints, and the problem is solved via Lagrange multipliers.
In the setting with only the normalization constraint, the Lagrangian is:
$J(p)=\int_{a}^{b} p(x)\ln p(x)dx-\lambda_{0}\left(\int_{a}^{b} p(x)dx-1\right)$
... which gives us the general solution in terms of $\lambda_{0}$ (setting the functional derivative $\ln p(x) + 1 - \lambda_{0}$ to zero):
$p(x)=e^{\lambda_{0}-1}$
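To complete the example: since this $p(x)$ is a constant in $x$, the normalization constraint itself fixes the multiplier, and the familiar result drops out:
$\int_{a}^{b} p(x)dx = (b-a)\,p(x) = 1 \quad\Rightarrow\quad p(x)=\frac{1}{b-a},$
i.e. the uniform density on $[a,b]$.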
Why do computational procedures such as the augmented Lagrangian require a "prior distribution" or initial set of values in order to compute the optimum?
See :
https://www.rdocumentation.org/packages/nloptr/versions/1.2.1/topics/auglag
Thank you!
I would think that this is because the algorithms need to start "searching" from somewhere: they hypothesise a solution and then try to improve on it. That would explain the initial set of values. I am not sure about the prior distribution (unless you mean an initial $p(\cdot)$ from which the algorithm begins its search).
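To make the point concrete, here is a minimal sketch in Python of the normalization-only MaxEnt problem solved numerically on a grid. It uses scipy's SLSQP rather than nloptr's `auglag`, but the relevant feature is the same: the solver must be handed a starting vector `p0`, from which it iterates toward the optimum (the grid size, support, and starting guess below are arbitrary choices for illustration):

```python
# Discretized MaxEnt with only the normalization constraint.
# Illustrative sketch: SLSQP instead of an augmented Lagrangian,
# but both are iterative and need an initial point to start from.
import numpy as np
from scipy.optimize import minimize

a, b, n = 0.0, 1.0, 50                 # support [a, b], n grid points
dx = (b - a) / n

# Arbitrary initial guess -- the "starting point" the question asks about.
p0 = np.random.default_rng(0).uniform(0.1, 1.0, n)

def neg_entropy(p):
    # Minimizing the negative entropy = maximizing H(p).
    return np.sum(p * np.log(p)) * dx

# Equality constraint: the density must integrate to 1.
cons = {"type": "eq", "fun": lambda p: np.sum(p) * dx - 1.0}

res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=[(1e-9, None)] * n, constraints=cons)

# The iterates walk from p0 toward the uniform density 1/(b-a),
# matching the analytical solution.
print(np.max(np.abs(res.x - 1.0 / (b - a))))
```

Different starting vectors `p0` should converge to the same (unique) optimum here, since the problem is convex; the starting point only affects how the search proceeds, not where it ends up.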