In Elements of Information Theory, I can't figure out how the functional derivative $ \frac{\delta J}{\delta q(\hat{x}|x)} $ for $ J(q) = \sum_x \sum_{\hat{x}} p(x)q(\hat{x}|x)\log{\frac{q(\hat{x}|x)}{q(\hat{x})}} $ (from equation 10.119, ignoring the other terms of the expression, which I have no problem with) leads to this result (equation 10.120, also ignoring the non-problematic terms): $$ \frac{\delta J}{\delta q(\hat{x}|x)} = p(x)\log{\frac{q(\hat{x}|x)}{q(\hat{x})}} + p(x) - \sum_{x'}p(x')q(\hat{x}|x')\frac{1}{q(\hat{x})} p(x) $$
In particular, I can't see where the term with the sum over $ x' $ comes from.
This is a straightforward computation, confusing only because of the notation with the variables $ x, \hat{x}, x' $. Spelling out just the last term (the contribution where the derivative hits the $ q(\hat{x}) $ inside the logarithm, using $ q(\hat{x}) = \sum_{x'} p(x')q(\hat{x}|x') $): $$ \frac{\delta}{\delta q(\hat{x}_0|x_0)} \left( \sum_{x,\hat{x}} p(x)q(\hat{x}|x)\log{q(\hat{x})} \right) = \sum_{x,\hat{x}} p(x)q(\hat{x}|x)\frac{1}{q(\hat{x})} \frac{\delta}{\delta q(\hat{x}_0|x_0) } q(\hat{x}) \\ = \sum_{x,\hat{x},x'} p(x)q(\hat{x}|x)\frac{1}{q(\hat{x})} \frac{\delta}{\delta q(\hat{x}_0|x_0) } \left( q(\hat{x}|x') p(x')\right) \\ = \sum_{x,\hat{x},x'} p(x)q(\hat{x}|x)\frac{1}{q(\hat{x})} p(x') \; \delta_{x',x_0} \delta_{\hat{x}_0,\hat{x}}\\ = \sum_{x} p(x)q(\hat{x}_0|x)\frac{1}{q(\hat{x}_0)} p(x_0) $$ Relabeling $ x \rightarrow x', x_0 \rightarrow x, \hat{x}_0 \rightarrow \hat{x} $ yields exactly the term in question. (Incidentally, since $ \sum_{x} p(x)q(\hat{x}_0|x) = q(\hat{x}_0) $, this whole term simplifies to $ p(x_0) $; equation 10.120 just leaves it unsimplified.)
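If it helps to double-check, here is a quick numerical sketch (not from the book; the alphabet sizes, the random seed, and the helper names `J`, `grad_eq_10_120`, `grad_fd` are all mine) that compares equation 10.120, written out term by term, against central finite differences of $ J $, treating every entry $ q(\hat{x}|x) $ as a free variable — the normalization constraints are handled by the Lagrange-multiplier terms that the question sets aside:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, nxh = 3, 4                               # |X| and |X-hat|, chosen arbitrarily

p = rng.random(nx)
p /= p.sum()                                 # p(x)
q = rng.random((nx, nxh))
q /= q.sum(axis=1, keepdims=True)            # q(xhat|x), rows sum to 1

def J(q):
    """J(q) = sum_{x,xhat} p(x) q(xhat|x) log( q(xhat|x) / q(xhat) )."""
    q_marg = p @ q                           # q(xhat) = sum_x' p(x') q(xhat|x')
    return float(np.sum(p[:, None] * q * np.log(q / q_marg[None, :])))

def grad_eq_10_120(q):
    """The three terms of eq. 10.120, written out literally."""
    q_marg = p @ q
    term1 = p[:, None] * np.log(q / q_marg[None, :])         # p(x) log(q(xhat|x)/q(xhat))
    term2 = p[:, None] * np.ones_like(q)                     # p(x)
    term3 = p[:, None] * (p @ q)[None, :] / q_marg[None, :]  # sum_x' p(x')q(xhat|x') p(x)/q(xhat)
    return term1 + term2 - term3

def grad_fd(q, eps=1e-6):
    """Central finite differences of J in each coordinate q(xhat0|x0)."""
    g = np.zeros_like(q)
    for i in range(q.shape[0]):
        for j in range(q.shape[1]):
            qp, qm = q.copy(), q.copy()
            qp[i, j] += eps
            qm[i, j] -= eps
            g[i, j] = (J(qp) - J(qm)) / (2 * eps)
    return g

print(np.abs(grad_fd(q) - grad_eq_10_120(q)).max())
```

The printed maximum discrepancy is at finite-difference noise level, confirming that the sum over $ x' $ is exactly the chain-rule contribution of $ q(\hat{x}) $.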