gradient of KL-Divergence


Let $X$ be a finite set and $A$ be a set of probability distributions.

Then the KL divergence between two probability distributions $P(x)$ and $Q(x)$ $\in A$ is $$D(P(x)\vert\vert Q(x))=\sum_{x\in X} P(x)\operatorname{log}\left(\frac{P(x)}{Q(x)}\right).$$
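As a concrete illustration of the formula above, here is a minimal sketch in Python/NumPy (the function name `kl_divergence` and the example distributions are my own choices, not from the question):

```python
import numpy as np

def kl_divergence(p, q):
    """D(P || Q) = sum_x P(x) * log(P(x) / Q(x)), using the natural log."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms with P(x) = 0 contribute 0, by the convention 0 * log 0 = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q))  # a small positive number; 0 exactly when p == q
```

Note that the divergence is asymmetric: in general `kl_divergence(p, q) != kl_divergence(q, p)`.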

Although the KL divergence is not a true distance (it is neither symmetric nor does it satisfy the triangle inequality), it quantifies how much one distribution differs from another, so for fixed $Q(x)$ we can ask about its derivative with respect to $P(x)$.

Can anyone tell me what its gradient with respect to $P(x)$ will be?


$\newcommand{\dd}{\mathrm{d}}\newcommand{\R}{{\rm I\!R}}$Based on the formula you are using for the KL divergence, I'm assuming $X$ is a discrete space - say $X = \{1, 2, \ldots, n\}$. I will also assume that $\log$ denotes the natural logarithm ($\ln$).

For fixed $q$, the KL divergence (as a function of $p$) is a function $D_{\rm KL}(\,\cdot \parallel q): \R^n \to \R$. Since only the $j=i$ term of the sum depends on $p_i$, the product rule gives $$ \frac{\partial}{\partial p_i}D_{\rm KL}(p \parallel q) {}={} \frac{\partial}{\partial p_i}\sum_{j=1}^{n}p_j\ln\frac{p_j}{q_j} {}={} \ln\frac{p_i}{q_i} + p_i \cdot \frac{1}{p_i} {}={} \ln\frac{p_i}{q_i} + 1, $$ therefore $\nabla_{p}D_{\rm KL}(p \parallel q) \in \R^n$ and its $i$-th element is $$ (\nabla_{p}D_{\rm KL}(p \parallel q))_i = \ln\frac{p_i}{q_i} + 1. $$ (This is the unconstrained gradient in $\R^n$; it ignores the constraint that $p$ lies on the probability simplex.)