Conditions for existence of KL divergence and unique minimum.


Consider two probability density functions $g(y)$ and $f(y:\theta)$, $\theta \in \Theta$.

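The KL divergence of $g$ from $f$ is defined by $$ D_{KL}(g|f) := \int g(y) \log \frac{g(y)}{f(y: \theta)} \, dy = E\left[\log g(Y)\right] - \int \log f(y: \theta)\, g(y)\, dy, $$ where $Y \sim g$, so only the second term depends on $\theta$.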
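As a rough numerical illustration of this definition, the sketch below splits the divergence into the term $E[\log g(Y)]$ and the $\theta$-dependent term $-\int \log f(y:\theta)\,g(y)\,dy$. The choices $g = N(0,1)$ and $f(\cdot:\theta) = N(\theta,1)$, and the helper names `integrate`, `g_weighted`, `kl_divergence` and `cross_entropy`, are assumptions made only for this example.

```python
import math

def normal_logpdf(y, mu=0.0, sigma=1.0):
    """Log-density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))

def integrate(h, lo=-12.0, hi=12.0, n=24000):
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(h(lo + (i + 0.5) * step) for i in range(n))

def g_weighted(h):
    """E[h(Y)] for Y ~ g = N(0, 1) (assumed g), by numerical integration."""
    return integrate(lambda y: h(y) * math.exp(normal_logpdf(y)))

def kl_divergence(theta):
    """D_KL(g | f(.: theta)) with the assumed f(.: theta) = N(theta, 1)."""
    return g_weighted(lambda y: normal_logpdf(y) - normal_logpdf(y, mu=theta))

def cross_entropy(theta):
    """-E[log f(Y: theta)], the only part of the divergence that depends on theta."""
    return g_weighted(lambda y: -normal_logpdf(y, mu=theta))

# E[log g(Y)], which does not depend on theta.
entropy_term = g_weighted(normal_logpdf)

for theta in (0.0, 0.5, 1.0):
    print(theta, kl_divergence(theta), entropy_term + cross_entropy(theta))
```

For this particular pair the divergence has the closed form $\theta^2/2$, so the two printed columns should agree with each other and with that value up to integration error.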

I want to check whether the following three conditions hold for some specific choices of $f$ and $g$:

(i) $E\left[\log g(Y) \right]$ exists.

(ii) $\left|\log f(y:\theta) \right| \le m(y)$ for all $\theta \in \Theta$, where $m$ is some integrable function.

(iii) $D_{KL}(g|f)$ has a unique minimum at some $\theta^* \in \Theta$.

Here $f(y:\theta)$ and $g(y)$ are continuous on $R$, with $\{y: f(y:\theta) > 0 \} = R$ and $\{y: g(y) > 0 \} = R$.


I think that condition (i) holds if $g(y)$ has finitely many peaks.

Here is my proof.

Case (1): $g(y) \lt 1$ for all $y \in R$.

$\implies -\infty \lt \log g(y) \lt 0$ for all $y \in R$.

$\implies \left( \log g(y) \right)^{+} := \max \{ \log g(y), 0 \} = 0$ and $\left( \log g(y) \right)^{-} := \max \{ -\log g(y), 0 \} = -\log g(y)$.

$\implies E\left[ \left(\log g(Y) \right)^{+}\right] = 0$.

$\implies E\left[\log g(Y) \right]$ exists.

Case (2): There exist $n$ intervals $I_1, \dots, I_n \subset R$ such that $g(y) \ge 1$ for all $y \in I := I_1 \cup \dots \cup I_n$ and $g(y) \lt 1$ for all $y \in R - I$.

$\implies 0 \le \log g(y) \lt \infty$ for all $y \in I$, and $-\infty \lt \log g(y) \lt 0$ for all $y \in R - I$.

$\implies E\left[ \left(\log g(Y) \right)^{+}\right] = \int_{I} \log g(y) \cdot g(y)\,dy \le \sup_{y \in I}\log g(y)\cdot\int_{R} g(y)\,dy = \sup_{y \in I}\log g(y) \lt \infty.$

$\implies E\left[\log g(Y) \right]$ exists.
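To make the bound in case (2) concrete, here is a small numerical sketch. It assumes, purely for illustration, that $g$ is the $N(0, 0.1^2)$ density, which exceeds 1 on a single interval around 0; the constant `SIGMA` and the function names are hypothetical and not part of the question.

```python
import math

SIGMA = 0.1  # hypothetical scale, small enough that g(y) >= 1 near y = 0

def log_g(y):
    """Log-density of the assumed g = N(0, SIGMA^2)."""
    return -0.5 * (y / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2.0 * math.pi))

def expected_positive_part(lo=-2.0, hi=2.0, n=40000):
    """Midpoint-rule estimate of E[(log g(Y))^+], the integral of max(log g, 0) * g."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * step
        lg = log_g(y)
        total += max(lg, 0.0) * math.exp(lg)
    return step * total

sup_log_g = log_g(0.0)  # this density attains its maximum at y = 0
positive_part = expected_positive_part()

print(positive_part, sup_log_g)
```

In this example the positive part is strictly positive but stays below $\sup_{y \in I} \log g(y)$, as the inequality in case (2) predicts. Note that the bound relies on that supremum being finite.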

Is my proof correct?


For condition (iii), my reasoning is as follows: if $f(y:\theta)$ is identifiable, then $\log f(y:\theta)$ is concave in $\theta$.

Thus $D_{KL}(g|f)$, which depends on $\theta$ only through $\int -\log f(y: \theta)\,g(y)\,dy$, is convex in $\theta$, and condition (iii) holds.

That is, condition (iii) holds if $f(y:\theta)$ is identifiable.

Is it correct?
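One way to probe the concavity step is to test it on a family that is identifiable but has heavier tails. The sketch below uses the Cauchy location family $f(y:\theta) = \frac{1}{\pi(1 + (y-\theta)^2)}$, which is a test case assumed here for illustration and not one of the distributions in the question. Its log-density has second derivative $\frac{2(u^2 - 1)}{(1+u^2)^2}$ in $\theta$, with $u = y - \theta$, which is positive whenever $|y - \theta| > 1$.

```python
import math

def cauchy_logpdf(y, theta):
    """Log-density of the Cauchy location family (assumed test case)."""
    return -math.log(math.pi) - math.log(1.0 + (y - theta) ** 2)

def second_derivative_in_theta(y, theta):
    """Second derivative of log f(y: theta) with respect to theta."""
    u = y - theta
    return 2.0 * (u * u - 1.0) / (1.0 + u * u) ** 2

def violates_midpoint_concavity(y, theta_a, theta_b):
    """True if log f(y: .) lies strictly below its chord at the midpoint."""
    mid = 0.5 * (theta_a + theta_b)
    chord = 0.5 * (cauchy_logpdf(y, theta_a) + cauchy_logpdf(y, theta_b))
    return cauchy_logpdf(y, mid) < chord

print(second_derivative_in_theta(0.0, 0.0))   # negative: locally concave near y
print(second_derivative_in_theta(0.0, 3.0))   # positive: not concave far from y
print(violates_midpoint_concavity(0.0, 2.0, 4.0))
```

So for this family $\log f(y:\theta)$ is not concave in $\theta$ even though distinct values of $\theta$ give distinct densities, which suggests that identifiability alone does not give concavity and that the step needs a further assumption on $f$.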


Also, how can I show that condition (ii) holds when, for example, $f$ is a normal density?
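For reference, a possible starting point under extra assumptions: if $f(y:\mu,\sigma)$ is the $N(\mu,\sigma^2)$ density, then $\log f(y:\mu,\sigma) = -\tfrac12 \log(2\pi\sigma^2) - \frac{(y-\mu)^2}{2\sigma^2}$. Assuming the parameter set is the compact box $|\mu| \le M$, $\sigma_{lo} \le \sigma \le \sigma_{hi}$ with $\sigma_{lo} > 0$ (an assumption not stated in the question), one candidate envelope is $m(y) = \max_{\sigma \in \{\sigma_{lo}, \sigma_{hi}\}} \left|\tfrac12\log(2\pi\sigma^2)\right| + \frac{(|y| + M)^2}{2\sigma_{lo}^2}$. The sketch below checks this bound on a grid; the constants `MU_MAX`, `SIGMA_LO`, `SIGMA_HI` are hypothetical values.

```python
import math

# Hypothetical compact parameter box: |mu| <= MU_MAX, SIGMA_LO <= sigma <= SIGMA_HI.
MU_MAX, SIGMA_LO, SIGMA_HI = 2.0, 0.5, 3.0

def normal_logpdf(y, mu, sigma):
    """Log-density of N(mu, sigma^2) at y."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2.0 * sigma ** 2)

def envelope(y):
    """Candidate m(y) dominating |log f(y: mu, sigma)| over the parameter box."""
    # |0.5 * log(2*pi*sigma^2)| is largest at one of the two endpoints of the sigma range.
    const = max(abs(0.5 * math.log(2.0 * math.pi * s ** 2)) for s in (SIGMA_LO, SIGMA_HI))
    return const + (abs(y) + MU_MAX) ** 2 / (2.0 * SIGMA_LO ** 2)

def envelope_holds(ys, mus, sigmas, tol=1e-12):
    """Check |log f| <= m(y) at every grid point."""
    return all(
        abs(normal_logpdf(y, mu, s)) <= envelope(y) + tol
        for y in ys for mu in mus for s in sigmas
    )

ys = [0.5 * k for k in range(-20, 21)]
mus = [MU_MAX * k / 4.0 for k in range(-4, 5)]
sigmas = [SIGMA_LO + (SIGMA_HI - SIGMA_LO) * k / 5.0 for k in range(6)]

print(envelope_holds(ys, mus, sigmas))
```

Since $m(y)$ grows like $y^2$, it is integrable with respect to $g$ exactly when $g$ has a finite second moment, if "integrable" in (ii) is read as $E[m(Y)] \lt \infty$ for $Y \sim g$. Without a bound on $\mu$ or a positive lower bound on $\sigma$, this particular envelope is not finite.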