Can someone help me find the error in the functional derivative below?
Minimize the functional: $$ F[p(\hat{x}|x)] = I(X; \hat{X}) + \beta \sum_{x \in X}\sum_{\hat{x} \in \hat{X}} p(x,\hat{x})d(x,\hat{x}) $$
Subject to the constraint: $$ \sum_{\hat{x} \in \hat{X}} p(\hat{x}|x) = 1 \quad \text{for all } x \in X $$
where $I(X; \hat{X})$ is given by: $$ I(X; \hat{X}) = \sum_{x \in X}\sum_{\hat{x} \in \hat{X}} p(x,\hat{x}) \log\frac{p(\hat{x}|x)}{p(\hat{x})} $$
The derivative of $I(X; \hat{X})$ with respect to $p(\hat{x}|x)$ (holding $x'$ and $x$ constant) splits into two parts. The first part of the derivative of $I$: $$ \frac{\partial}{\partial p(\hat{x}|x)} \left( \sum_{x \in X} \sum_{\hat{x} \in \hat{X}} p(x) p(\hat{x}|x) \log p(\hat{x}|x) \right) = p(x) \left( \log p(\hat{x}|x) + 1 \right) $$
The second part of the derivative of $I$: \begin{align*} \frac{\partial}{\partial p(\hat{x}|x)} \left( -\sum_{x \in X} \sum_{\hat{x} \in \hat{X}} p(\hat{x}|x) p(x) \log p(\hat{x}) \right) &= -p(x) \log p(\hat{x}) \\ &\quad - p(\hat{x}|x) p(x) \frac{1}{p(\hat{x})} \frac{\delta p(\hat{x})}{\delta p(\hat{x}|x)} \\ &= -p(x) \log p(\hat{x}) - p(\hat{x}|x) p(x) \frac{1}{p(\hat{x})} p(x) \end{align*}
The derivative of the distortion term with respect to $p(\hat{x}|x)$: $$ \frac{\partial}{\partial p(\hat{x}|x)} \left( \sum_{x \in X} p(x) \sum_{\hat{x} \in \hat{X}} p(\hat{x}|x) \beta d(x,\hat{x}) \right) = \beta d(x,\hat{x})p(x) $$
The derivative of the Lagrange multiplier term for the normalization constraint with respect to $p(\hat{x}|x)$: $$ \frac{\partial}{\partial p(\hat{x}|x)} \left( \sum_{x \in X} -\lambda(x) p(x) \left( \sum_{\hat{x} \in \hat{X}} p(\hat{x}|x) - 1 \right) \right) = -\lambda(x) p(x) $$
Combining these derivatives into one expression for the derivative of the entire functional $F[p(\hat{x}|x)]$ with respect to $p(\hat{x}|x)$: $$ \frac{\delta F}{\delta p(\hat{x}|x)} = p(x) \left( \log \frac{p(\hat{x}|x)}{p(\hat{x})} + 1 - \frac{p(x) p(\hat{x}|x)}{p(\hat{x})} + \beta d(x,\hat{x}) - \lambda(x) \right) $$
I am not getting to the correct answer (which is outlined in the article below):
$$\frac{\partial F}{\partial p(\hat{x}|x)} = p(x) \left[ \log \frac{p(\hat{x}|x)}{p(\hat{x})} + 1 - \frac{1}{p(\hat{x})} \sum_{x'} p(x')p(\hat{x}|x') + \beta d(x, \hat{x}) + \frac{\lambda(x)}{p(x)} \right] $$
I'll put primes over the dummy variables in the sum because I'm liable to get confused otherwise.
The error is in the derivative of $\sum_{x', \hat{x}'} p(x') p(\hat{x}'|x') \log p(\hat{x}')$. Because $p(\hat{x}') = \sum_{x''} p(x'') p(\hat{x}'|x'')$ itself depends on $p(\hat{x}|x)$, the product rule gives \begin{align} \partial_{p(\hat{x}|x)} \sum_{x', \hat{x}'} p(x') p(\hat{x}'|x') \log p(\hat{x}') &= p(x) \log p(\hat{x}) \\ &\qquad+ \sum_{x', \hat{x}'} p(x') \frac{p(\hat{x}'|x')}{p(\hat{x}')} \partial_{p(\hat{x}|x)} \left(\sum_{x''} p(x'') p(\hat{x}'|x'') \right), \end{align} where I've expanded $p(\hat{x}')$ inside the last derivative.
Now note that the final derivative is nonzero only when $\hat{x}' = \hat{x}$, since only then does the term $p(\hat{x}|x)$ show up in the inner sum (at $x'' = x$), and in that case it equals $p(x)$. This collapses the sum over $\hat{x}'$, but the sum over $x'$ remains. So the derivative is $$ \partial_{p(\hat{x}|x)} \sum_{x', \hat{x}'} p(x') p(\hat{x}'|x') \log p(\hat{x}') = p(x) \log p(\hat{x}) + \sum_{x'} p(x') \frac{p(\hat{x}|x')}{p(\hat{x})} \cdot p(x) .$$
If we now track things through, this changes the term $-\frac{p(x)p(\hat{x}|x)}{p(\hat{x})}$ in your expression to $-\sum_{x'} \frac{p(x')p(\hat{x}|x')}{p(\hat{x})}$, which matches the one from the paper.
Note that you can also simplify this: $\sum_{x'} p(x') p(\hat{x}|x') = p(\hat{x})$, so $-\sum_{x'} \frac{p(x')p(\hat{x}|x')}{p(\hat{x})} = -1$, which cancels the $+1$ in the expression.
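
If you want to sanity-check the derivative of $I(X;\hat{X})$ numerically, here is a minimal sketch using finite differences in NumPy. It treats every entry of $p(\hat{x}|x)$ as an independent variable, which is what the Lagrangian approach does. The function names (`mutual_information`, `corrected_gradient`, `asker_gradient`, `numeric_gradient`) and the toy distributions are my own choices for illustration, not anything taken from the paper.

```python
import numpy as np

def mutual_information(p_x, q):
    """I(X; X_hat), where q[i, j] = p(x_hat = j | x = i)."""
    p_xhat = p_x @ q  # p(x_hat) = sum_x p(x) p(x_hat | x)
    return float(np.sum(p_x[:, None] * q * np.log(q / p_xhat[None, :])))

def corrected_gradient(p_x, q):
    """p(x) * log(p(x_hat|x) / p(x_hat)); the +1 and -1 have cancelled."""
    p_xhat = p_x @ q
    return p_x[:, None] * np.log(q / p_xhat[None, :])

def asker_gradient(p_x, q):
    """The expression from the question, without the sum over x'."""
    p_xhat = p_x @ q
    ratio = p_x[:, None] * q / p_xhat[None, :]
    return p_x[:, None] * (np.log(q / p_xhat[None, :]) + 1.0 - ratio)

def numeric_gradient(p_x, q, eps=1e-6):
    """Central finite differences, perturbing one entry of q at a time."""
    grad = np.zeros_like(q)
    for i in range(q.shape[0]):
        for j in range(q.shape[1]):
            dq = np.zeros_like(q)
            dq[i, j] = eps
            f_plus = mutual_information(p_x, q + dq)
            f_minus = mutual_information(p_x, q - dq)
            grad[i, j] = (f_plus - f_minus) / (2.0 * eps)
    return grad

# Toy example (assumed values): 4 source symbols, 3 reproduction symbols.
p_x = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.2, 0.6],
              [0.1, 0.7, 0.2],
              [0.3, 0.3, 0.4]])

numeric = numeric_gradient(p_x, q)
max_error = np.max(np.abs(numeric - corrected_gradient(p_x, q)))
asker_error = np.max(np.abs(numeric - asker_gradient(p_x, q)))
print("corrected expression, max abs error:", max_error)
print("expression from the question, max abs error:", asker_error)
```

The corrected expression should agree with the finite-difference gradient up to numerical noise, while the expression from the question should be visibly off, because it keeps only the $x' = x$ term of the sum.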