According to [1], $D[P\colon Q]$ is called a divergence if it satisfies the following criteria:
- $D[P\colon Q] \geqslant 0$
- $D[P\colon Q] = 0 \iff P = Q$
- When $P$ and $Q$ are sufficiently close, denoting their coordinates by $\xi_p$ and $\xi_q = \xi_p + d\xi$, the Taylor expansion of $D$ is $$ D[\xi_p\colon\xi_p+d\xi] = \frac{1}{2}\sum_{i,j} g_{ij}(\xi_p)\,d\xi_i\,d\xi_j + \mathcal{O}\left(\lvert d\xi\rvert^3\right), $$ where $g_{ij}$ denotes the $(i,j)$-th element of a positive-definite matrix $G$ that depends on $\xi_p$.
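To check that I parse the third criterion correctly, consider a toy example of my own (not from [1]): the half squared Euclidean distance $D[\xi_p\colon\xi_q] = \frac{1}{2}\lVert \xi_p - \xi_q \rVert^2$. Here $D[\xi_p\colon\xi_p+d\xi] = \frac{1}{2}\sum_{i,j}\delta_{ij}\,d\xi_i\,d\xi_j$ exactly, so $G$ is the identity matrix and all three criteria hold.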
I am trying to prove that the Kullback-Leibler divergence, given by $$ D_{KL}[p(x)\colon q(x)] = \int p(x)\log\left(\frac{p(x)}{q(x)}\right)\mathrm{d}x, $$ is indeed a divergence in the above sense, but I am stuck on the third criterion.
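Here is how far I have gotten, assuming a parametric family $p(x;\xi)$ (so that $P = p(x;\xi_p)$ and $Q = p(x;\xi_p+d\xi)$) and assuming I may differentiate under the integral sign. Writing $\ell(x,\xi) = \log p(x;\xi)$ and Taylor-expanding $\ell(x,\xi_p+d\xi)$ around $\xi_p$ inside the integral gives $$ D_{KL}[\xi_p\colon\xi_p+d\xi] = -\int p(x;\xi_p)\left(\sum_i \partial_i\ell\,d\xi_i + \frac{1}{2}\sum_{i,j}\partial_i\partial_j\ell\,d\xi_i\,d\xi_j\right)\mathrm{d}x + \mathcal{O}\left(\lvert d\xi\rvert^3\right). $$ The first-order term vanishes, since $\int p\,\partial_i\ell\,\mathrm{d}x = \int \partial_i p\,\mathrm{d}x = \partial_i \int p\,\mathrm{d}x = 0$. That leaves me with $g_{ij}(\xi_p) = -\int p(x;\xi_p)\,\partial_i\partial_j\ell\,\mathrm{d}x$, which I suspect is the Fisher information matrix, but I do not see how to rewrite it in that form and conclude that $G$ is positive definite.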
Could you please give me some insight on how I should proceed?
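For what it is worth, a quick numerical sanity check (my own sketch, using the Gaussian family $N(\mu,\sigma)$, whose Fisher information matrix is $\operatorname{diag}(1/\sigma^2,\,2/\sigma^2)$) does suggest the claimed quadratic behaviour:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Base point xi_p = (mu, sigma) and a small displacement d_xi.
mu, sigma = 0.0, 1.0
d_mu, d_sigma = 1e-2, 1e-2

p = norm(mu, sigma)
q = norm(mu + d_mu, sigma + d_sigma)

# D_KL[p : q] by numerical integration of p(x) * log(p(x)/q(x)).
kl, _ = quad(lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x)),
             -np.inf, np.inf)

# Quadratic form (1/2) d_xi^T G d_xi, with G the Fisher matrix at xi_p.
G = np.array([[1 / sigma**2, 0.0],
              [0.0, 2 / sigma**2]])
d_xi = np.array([d_mu, d_sigma])
quadratic = 0.5 * d_xi @ G @ d_xi

# The two values agree up to O(|d_xi|^3):
print(kl, quadratic)  # ~1.47e-04 vs 1.50e-04
```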
[1] Amari, S. (2016). *Information Geometry and Its Applications*. Applied Mathematical Sciences, Vol. 194. Springer Japan.