I have no background in statistics and am trying to learn the basics. In particular, I’m trying to prove the following:
Among all continuous probability densities $f$ on $(0,\infty)$ with mean $1$, prove that the entropy maximizer has entropy $H(f) := \int_0^\infty -f(x) \log(f(x))\,dx$ equal to $1$.
Questions
What are some techniques worth paying attention to when proving entropy-maximization results? Even in the easiest example, over a finite set with total mass $1$, I had to use the inequality $\log(x) \leq x-1$, which seems like a clever trick to me; it would have taken me more than half an hour to figure out if I had never seen it before.
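For reference, that inequality is exactly how one standard route through the finite case goes (this is sometimes packaged as Gibbs' inequality): for a probability vector $p=(p_1,\dots,p_n)$ with all $p_i>0$, applying $\log(x)\le x-1$ with $x=\frac{1}{n p_i}$ gives
$$
H(p) - \log n
= \sum_{i=1}^{n} p_i \log \frac{1}{n p_i}
\le \sum_{i=1}^{n} p_i \left( \frac{1}{n p_i} - 1 \right)
= \sum_{i=1}^{n} \left( \frac{1}{n} - p_i \right)
= 0,
$$
with equality iff $p_i = 1/n$ for all $i$, so the uniform distribution is the maximizer. The continuous analogue of the same bound (comparing $f$ against a reference density) handles problems like the one above.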
Note that the problem asks for the maximizer among all continuous distributions. Can one still use variational methods for this type of problem?
If I change $\log$ to other functions in the definition of entropy and ask for maximizers, are there still statistical interpretations?
Let $P(f):=\int_0^{\infty}f(x)\,dx$ and $E(f):=\int_0^{\infty}xf(x)\,dx$. Then the problem can be stated formally as
\begin{align}
\min_f\quad & -H(f) \\
\text{s.t.}\quad & f\ge 0, \\
& P(f)=1, \\
& E(f)=1.
\end{align}
The corresponding Lagrangian (dropping the non-negativity constraint for a moment) is
$$
L(f,\lambda)=-H(f)+\lambda_1(P(f)-1)+\lambda_2(E(f)-1).
$$
Setting the functional derivative of $L$ with respect to $f$ to zero, one gets (for $x\ge 0$)
$$
1+\ln f(x) +\lambda_1+\lambda_2x =0,
$$
which implies that
$$
f^{*}(x)=e^{-1-\lambda_1-\lambda_2x}\,1_{[0,\infty)}(x).
$$
Note that $f^*\ge 0$ automatically, so the dropped constraint is satisfied. The Lagrange multipliers $\lambda_1$ and $\lambda_2$ are found from the constraints: $P(f^*)=e^{-1-\lambda_1}/\lambda_2=1$ and $E(f^*)=e^{-1-\lambda_1}/\lambda_2^{2}=1$ together force $\lambda_2=1$ and $\lambda_1=-1$. Consequently, the required density is
$$
f^{*}(x)=e^{-x}\,1_{[0,\infty)}(x),
$$
and since $\ln f^{*}(x)=-x$ on $[0,\infty)$,
$$
H(f^{*})=\int_0^{\infty}xe^{-x}\,dx=1.
$$
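Not part of the proof, but as a quick sanity check the constraints and the entropy value can be verified numerically. A minimal sketch using NumPy/SciPy:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Candidate maximizer: f*(x) = exp(-x) on [0, inf).
f = lambda x: np.exp(-x)

# Constraint checks: total mass P(f*) and mean E(f*) should both be 1.
mass, _ = quad(f, 0, np.inf)
mean, _ = quad(lambda x: x * f(x), 0, np.inf)

# Entropy: since log f*(x) = -x, the integrand -f*(x) log f*(x)
# simplifies to x * exp(-x); this avoids log(0) when exp(-x) underflows.
entropy, _ = quad(lambda x: x * np.exp(-x), 0, np.inf)

print(mass, mean, entropy)  # each should be ≈ 1.0

# Cross-check against SciPy's built-in differential entropy of Exp(1).
print(stats.expon.entropy())
```

The simplification of the entropy integrand to $xe^{-x}$ also makes visible why $H(f^*)$ coincides with the mean constraint here.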