I am looking for literature on how to solve the following variational problem: minimize
$$ \mathcal{F}\left[p(y|x)\right] = I(X;Y) + \beta \ \mathbb{E}_{p(x,y)}\left[ d(x,y) \right] $$
over all normalized distributions $ p(y|x) $. Here $ I(X;Y) $ is the mutual information, $ d(x,y) $ is the distortion function, and $ \beta $ is a Lagrange multiplier. (The fact that this is an information-theoretic problem is not important, however.)
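For concreteness, here is how I would write the problem out. This is only a sketch, and it assumes discrete $ X $ and $ Y $ and a fixed source distribution $ p(x) $, neither of which is stated above:
$$ \mathcal{F}\left[p(y|x)\right] = \sum_{x,y} p(x)\, p(y|x) \log \frac{p(y|x)}{p(y)} + \beta \sum_{x,y} p(x)\, p(y|x)\, d(x,y), \qquad p(y) = \sum_{x} p(x)\, p(y|x), $$
subject to $ \sum_{y} p(y|x) = 1 $ for every $ x $ and $ p(y|x) \geq 0 $. In the continuous case the sums would become integrals.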
I am familiar with Sobolev spaces, functional analysis, and variational theory. However, I do not understand how to take the functional derivative when the minimization is restricted to normalized distributions.
Happy for any help.
Br, Carl