I have the following problem: I want to find the probability density $p$ which maximizes the Shannon entropy \begin{equation} S := - \int_{x_b}^{x_c} dx ~ p(x) \log (p(x)) \end{equation} under the following constraints:
- normalization
- $p(x_b) = v$ for some fixed value $v$
- $p$ is continuous.
Usually, such problems can be solved using a Lagrange multiplier. My problem is: how can I impose the continuity condition in terms of Lagrange multipliers?
The optimization problem, as stated, is ill-posed: the supremum of the entropy is not attained by any continuous function satisfying the constraints. Specifically, one can construct a sequence of continuous functions that satisfy your pointwise constraints and whose entropy increases toward the supremum, while the sequence converges to a discontinuous function. To see this, take functions that are constant everywhere except in an epsilon-sized interval around $x_b$, where they ramp continuously from the constant to the prescribed value $v$ and back. As epsilon shrinks, these continuous functions converge to the constant function, except for a point discontinuity at $x_b$.
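This can be checked numerically. The sketch below builds the ramp-plus-constant densities on a hypothetical instance with $[x_b, x_c] = [0, 1]$ and $v = 3$ (both values are illustrative assumptions) and shows their entropies increasing toward $0$, the entropy of the uniform density on $[0, 1]$, even though the pointwise limit is discontinuous at $x_b$:

```python
import numpy as np

# Illustrative sequence on [x_b, x_c] = [0, 1] with boundary value v = 3:
# p_eps ramps linearly from v down to a constant c over [0, eps], then stays
# constant on [eps, 1]; c is chosen so that p_eps integrates to 1.
v = 3.0

def entropy(eps, m=20000):
    c = (1.0 - eps * v / 2.0) / (1.0 - eps / 2.0)   # normalization constant
    x = np.linspace(0.0, 1.0, m)
    p = np.where(x < eps, v + (c - v) * x / eps, c)
    y = p * np.log(p)
    return -0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule

for eps in [0.2, 0.1, 0.05, 0.01, 0.001]:
    print(eps, entropy(eps))  # entropies increase toward 0 as eps shrinks
```

The entropies approach that of the uniform density from below, but no continuous function with $p(x_b) = v$ attains that limit (unless $v$ happens to equal the uniform value).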
This reflects a larger issue with entropy on functions: entropy depends only on the distribution of a function's values, not on where those values are located in the domain. If you chop up a function and rearrange the pieces, the entropy remains the same.
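A quick numerical check of this rearrangement invariance (the particular density and block count are arbitrary choices for illustration):

```python
import numpy as np

# Permuting equal-width pieces of a density leaves -∫ p log p unchanged,
# because entropy depends only on the multiset of values.
n = 1000
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = 1.0 / n
p = 1.0 + 0.5 * np.sin(2 * np.pi * x)   # a valid density on [0, 1]

def entropy(q):
    return -np.sum(q * np.log(q)) * dx  # Riemann-sum approximation of S

# Chop the density into 10 equal blocks and rearrange them.
blocks = p.reshape(10, n // 10)
rng = np.random.default_rng(0)
p_shuffled = blocks[rng.permutation(10)].reshape(-1)

print(entropy(p), entropy(p_shuffled))  # identical values
```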
A good idea in this case is to regularize the problem with a penalty on the variation or non-smoothness of the function. For example, Laplacian regularization penalizes how much the value of the function at a point deviates from the local average of the function, favoring smoother functions over less smooth ones.
One then minimizes the modified objective function $$ -S(p) + \frac{\alpha}{2}\|Rp\|^2, $$ where $R$ is a regularization operator (for example, a power of the Laplacian) and $\alpha > 0$ sets the strength of the penalty, subject to the same constraints.
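As a concrete sketch, here is a discretized version of this regularized problem solved with SciPy's SLSQP. The grid size, boundary value $v$, and $\alpha$ are all illustrative assumptions, not values from the question, and the discrete Laplacian here is only one possible choice of $R$:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical setup: domain [x_b, x_c] = [0, 1]; v and alpha are
# illustrative choices.
n = 30
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
v, alpha = 2.0, 1e-5

# R = discrete Laplacian (second differences) acting on interior points.
R = np.zeros((n - 2, n))
for i in range(n - 2):
    R[i, i], R[i, i + 1], R[i, i + 2] = 1.0, -2.0, 1.0
R /= dx**2

def objective(p):
    # -S(p) + (alpha/2) ||R p||^2; a small floor keeps the log finite.
    q = np.maximum(p, 1e-12)
    return np.sum(q * np.log(q)) * dx + 0.5 * alpha * np.sum((R @ p) ** 2)

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) * dx - 1.0},  # normalization
    {"type": "eq", "fun": lambda p: p[0] - v},              # p(x_b) = v
]
res = minimize(objective, np.ones(n), method="SLSQP",
               constraints=constraints, bounds=[(0.0, None)] * n,
               options={"maxiter": 1000})
p_opt = res.x  # smooth ramp from v down toward the uniform level
```

With the penalty in place, the limit pathology above disappears: arbitrarily sharp ramps incur an arbitrarily large $\|Rp\|^2$, so the minimizer is a genuinely smooth density rather than an approximation to a discontinuous one.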