I am interested in the following constrained optimization problem:
$$\text{Minimize }J_p(F, G) = \int_0^1 \left(\frac{F}{f} + \frac{1-G}{g}\right)^{-1} dx.$$
over a pair of cumulative distributions functions $F$ and $G$, with $f := F'$ and $g := G'$, subject to the following constraints:
- Their Jensen Shannon divergence is given by a fixed constant, that is, $$\text{JSD}(F, G) := -\frac{1}{2}\int_0^1 f \log \frac{2f}{f+g} + g \log \frac{2g}{f+g} dx = C.$$
- The Radon-Nikodym derivative is non-decreasing $$\left(\frac{g}{f}\right)' \ge 0.$$
- $F, G$ are indeed cumulative distribution functions $$f, g \geq 0; \qquad 0 \le F, G \le 1.$$
This seems like a pretty standard calculus of variation problem, and I indeed made some progress deriving its associated Euler Lagrange equation, which is second order due to the inequality constraint 2.
First I let $H = 1-G$, then the objective function becomes $J_p(F, H) = \int_0^1 \left(\frac{F}{f} - \frac{H}{h}\right)^{-1} dx$. The constraints also change accordingly:
- $\frac{1}{2} \int_0^1 \left(f \log \frac{2f}{f-h} - h \log\frac{2h}{h-f} \right) dx =C$.
- $\left(\frac{h}{f}\right)' \le 0$.
- $f, -h \ge 0; \qquad 0 \le F, -H \le 1$.
One can then take the functional derivative (with respect to $F$ and $H$ respectively) of the Lagrangian $$ \mathcal{L} := \int_0^1 [\left(\frac{F(x)}{f(x)} - \frac{H(x)}{h(x)}\right)^{-1} + \frac{\lambda_1}{2} \left(f(x) \log \frac{2f(x)}{f(x)-h(x)} - h(x) \log\frac{2h(x)}{h(x)-f(x)} \right) - C] dx \\ - \frac{\lambda_2(x) (h'(x) f(x) - h(x)f'(x))}{f(x)^2} + \lambda_3(x)f(x) - \lambda_4(x) h(x) + \lambda_5(x) F(x) + \lambda_6(x) (1 - F(x)) + \lambda_7(x) H(x) + \lambda_8(x) (1 - H(x)).$$
Let $\mathcal{D}_F := \frac{\partial}{\partial F} - \frac{d}{dx} \frac{\partial}{\partial f} + \frac{d^2}{dx^2} \frac{\partial}{\partial f'}$ be the Euler Lagrange operator for $F$ and similarly define $\mathcal{D}_H$.
[Update: below computation is flawed; see my self-answer below, which relies on mathematica more exclusively, without any hand derivation. The result makes a lot of sense when plotted]
I was able to simplify the resulting system of 2 ODEs significantly, thanks to mathematica, as follows:
$$ \mathcal{D}_F \mathcal{L} = [(f+h) B - A] U + [(h' \lambda_2)' + (h \lambda_2)''] + \lambda_3' + \lambda_5' - \lambda_6' = 0 \\ \mathcal{D}_H \mathcal{L} = [(f-h) B + A] V - [(f' \lambda_2)' + (f \lambda_2)''] - \lambda_4' + \lambda_7' - \lambda_8' = 0,$$
where $A := \lambda_1 (fh' - hf')(Hf - Fh)^3$, $B := 4fh[(Hf - Fh)fh + FH(fh' - hf')]$, $U, V$ are some rational functions of $F, H, f, h$.
Here I took the liberty of rewriting constraint 2 as $h' f - h f' \le 0$, since the denominator is assumed to be nonnegative; I am not completely sure if this is correct when $f(x) = 0$, but including the denominator doesn't seem to make much difference.
Now with the complementary slackness condition on constraint 2, we know that whenever $h'f - hf' \neq 0$, $\lambda_2$ has to be $0$, and vice versa. Similarly, $f \lambda_3 \equiv 0 \equiv h \lambda_4$.
One trivial critical solution of the original problem is $F = G$, which corresponds to $h'f - hf' \equiv 0$. This gives the global maximum, and satisfies only the JSD constraint with $C = 0$. I am interested in the nontrivial minimum solutions.
Now here is my trouble. As soon as I look at a point $x \in [0, 1]$ where $f, h, F, H, (1-F), (1-H) \neq 0$ (which implies $\lambda_i' = 0$, for $3 \le i \le 8$, if we assume they are $C^1$ at $x$), and $h'f - h f' \neq 0$, I am forced to have $Hf - Fh = 0$, which in turn forces me to have $h'f - hf' = 0$! So somehow I cannot escape from the condition that leads to the trivial solution. Where did I mess up? Does this problem require digging deeper into solutions with isolated points of non-differentiability?


Some notes
The problem requires the correct using of constrains. If there were no integrals both the goal function and conditions in Lagrangian, this would be an ordinary problem for the conditional extremum and the task function obtained would correspond to the requirement of simultaneous observance of contradictory conditions of the form $$F(x) = 0,\quad F(x) = 1$$ and similar with $G(x).$
It is known that the greatest value of a function is achieved either at one of its stationary points or at the boundary of the region. The same approach should be used in this case.
Thus, the constraints of the problem should not be taken into account in the search for a global extremum, but each of them must have a separate "boundary" problem for the extremum. And, probably, we should start with the task of a global extremum.
Simplified statement of the problem
Simplified statement of the problem is nesessary for mathematical tools checking.
Let us maximize $J_p(F, G)$ on condition $JSD(F, G) = C,\,$ or $$\int\limits_0^1\left((f+g)\log (f+g) - f\log 2f - g\log 2g\right)\,dx = 2C.$$
That gives the Lagrangian in the form of
$\mathcal L(F, G) = \int_0^1 {fg\over Fg + (1-G)f}\,dx + \lambda\left(\int\limits_0^1\left((f+g)\log (f+g) - f\log 2f - g\log 2g\right)\,dx - 2C\right).$
What does mathematica give in this case?
And is it possible to set the constraints of the task as limitations, and not as mandatory conditions?
Getting of equations
To get the equations, let us equal the variations ${\delta \mathcal L(F, G) \over \delta F}, {\delta \mathcal L(F, G) \over \delta G}$ to zero: $$\begin{cases} {-fg^2\over \left(Fg + (1-G)f\right)^2} + {d\over dx}{g\left(Fg + (1-G)f\right) - fg(1-G)\over\left(Fg + (1-G)f\right)^2} + \lambda{d\over dx}\left(\log(f+g)+1-\log2f - 1\right) = 0\\ {f^2g\over \left(Fg + (1-G)f\right)^2} + {d\over dx}{f\left(Fg + (1-G)f\right) - fgF\over\left(Fg + (1-G)f\right)^2} + \lambda{d\over dx}\left(\log(f+g)+1-\log2g - 1\right) = 0, \end{cases}$$ $$\begin{cases} {-fg^2\over \left(Fg + (1-G)f\right)^2} + {d\over dx}{Fg^2\over\left(Fg + (1-G)f\right)^2} + \lambda{d\over dx}\left(\log(f+g)-\log2f\right) = 0\\ {f^2g\over \left(Fg + (1-G)f\right)^2} + {d\over dx}{(1-G)f^2\over\left(Fg + (1-G)f\right)^2} + \lambda{d\over dx}\left(\log(f+g)-\log2g\right) = 0, \end{cases}$$ $$\begin{cases} {-fg^2\over \left(Fg + (1-G)f\right)^2} + {fg^2+2Fgg'\over\left(Fg + (1-G)f\right)^2} - {2Fg^2\left(fg+Fg'-gf+(1-G)f'\right)\over\left(Fg + (1-G)f\right)^3} + \lambda\left({f'+ g'\over f+g}-{f'\over f}\right) = 0\\ {f^2g\over \left(Fg + (1-G)f\right)^2} + {-gf^2+2(1-G)ff'\over\left(Fg + (1-G)f\right)^2} - {2(1-G)f^2\left(fg + Fg' - gf + (1-G)f'\right)\over\left(Fg + (1-G)f\right)^3} + \lambda\left({f'+ g'\over f+g}-{g'\over g}\right) = 0, \end{cases}$$ $$\begin{cases} {2Fgg'\left(Fg + (1-G)f\right)\over\left(Fg + (1-G)f\right)^3} - {2Fg^2\left(Fg'+(1-G)f'\right)\over\left(Fg + (1-G)f\right)^3} + \lambda\left({f'+ g'\over f+g}-{f'\over f}\right) = 0\\ {2(1-G)ff'\left(Fg + (1-G)f\right)\over\left(Fg + (1-G)f\right)^3} - {2(1-G)f^2\left(Fg' + (1-G)f'\right)\over\left(Fg + (1-G)f\right)^3} + \lambda\left({f'+ g'\over f+g}-{g'\over g}\right) = 0, \end{cases}$$ $$\begin{cases} {2Fgg'(1-G)f-2Fg^2(1-G)f'\over\left(Fg + (1-G)f\right)^3} + \lambda{(f'+g')f-f'(f+g)\over (f+g)f} = 0\\ {2(1-G)ff'Fg - 2(1-G)f^2Fg'\over\left(Fg + (1-G)f\right)^3} + \lambda{(f'+g')g - g'(f+g)\over(f+g)g} = 0, \end{cases}$$ $$\begin{cases} {2F(1-G)g(fg'-f'g)\over\left(Fg + (1-G)f\right)^3} + \lambda{fg'-f'g\over (f+g)f} = 0\\ {2F(1-G)f(f'g-fg')\over\left(Fg + (1-G)f\right)^3} + \lambda{f'g-fg'\over (f+g)g} = 0. \end{cases}$$ Easy to see that the equations are proportional. This means that the system is overdefined and that the parameter $\lambda$ is arbitrary for the simplified task.
On the other hand, the task has, at least, the next solutions $$ F(x) = 0,\quad G(x) - \text{arbitrary};\\ G(x) = 1,\quad F(x) - \text{arbitrary};\\ F(x) = C_1,\quad G(x) = C_2;\\ F(x) = kG(x)+b. $$