Is alpha divergence a convex divergence measure?


Alpha divergence is defined as follows: $$ D_\alpha(p||q) = \frac{1}{\alpha (1-\alpha)} \left( 1- \int _x p(x)^{\alpha} q(x)^{(1-\alpha)} dx \right) $$

If the distributions are restricted to the exponential family, is the divergence $D_\alpha(p||q)$ convex in the natural parameters of $p$ and $q$? Let's assume that the distributions are defined as follows: $$ p(x;\eta_1) \propto \exp(\eta_1^\top g(x)) $$ $$ q(x;\eta_2) \propto \exp(\eta_2^\top g(x)) $$ where $g(x)$ are the sufficient statistics of the exponential family.
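For concreteness, here is a small numerical sketch of the definition on discrete distributions (the helper `alpha_divergence` is mine, not from the question):

```python
import numpy as np

def alpha_divergence(p, q, alpha):
    """D_alpha(p||q) = (1 - sum_x p(x)^alpha * q(x)^(1-alpha)) / (alpha*(1-alpha)).

    p, q are discrete probability vectors; alpha must not be 0 or 1, where
    the definition has removable singularities (the limits are KL(q||p)
    and KL(p||q), respectively).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    return (1.0 - np.sum(p**alpha * q**(1.0 - alpha))) / (alpha * (1.0 - alpha))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

d_pp = alpha_divergence(p, p, 0.5)   # ~0: the divergence vanishes at p == q
d_pq = alpha_divergence(p, q, 0.5)   # strictly positive for p != q
```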

PS. I am not sure whether the alpha-divergence is convex with respect to an arbitrary family of distributions; that's why I restricted myself to the exponential family. So it might be convex in general, irrespective of the exponential family.


There are 2 answers below.


If we denote the Renyi divergence of order $\alpha$ by $$C_\alpha=\frac{1}{\alpha-1}\log \int p^{\alpha} q^{1-\alpha}dx$$ then

i) $C_{\alpha}$ is convex in $q$ for all $\alpha >0$,

ii) $C_{\alpha}$ is convex in $(p,q)$ for $0<\alpha<1$, and

iii) $C_{\alpha}$ is neither convex nor concave in $p$ for $\alpha >1$. (See this reference.)

Since $$D_{\alpha}=\frac{1}{\alpha(1-\alpha)}\left(1-e^{(\alpha-1)C_{\alpha}}\right),$$ and for $\alpha>1$ the map $x\mapsto e^{(\alpha-1)x}$ is convex and increasing while the prefactor $\frac{1}{\alpha(1-\alpha)}$ is negative, all that is immediately clear is that $D_{\alpha}$ is convex in $q$ for $\alpha>1$.
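The algebraic relation between $D_{\alpha}$ and $C_{\alpha}$ is easy to confirm numerically on discrete distributions, since both quantities are functions of the same integral $\int p^{\alpha}q^{1-\alpha}dx$ (a sketch; the example distributions are arbitrary):

```python
import numpy as np

p = np.array([0.1, 0.6, 0.3])
q = np.array([0.3, 0.3, 0.4])

for alpha in [-0.5, 0.3, 0.7, 2.0]:
    integral = np.sum(p**alpha * q**(1.0 - alpha))
    C = np.log(integral) / (alpha - 1.0)          # Renyi divergence of order alpha
    # D_alpha recovered from C_alpha via D = (1 - e^{(alpha-1) C}) / (alpha (1-alpha))
    D_via_C = (1.0 - np.exp((alpha - 1.0) * C)) / (alpha * (1.0 - alpha))
    # D_alpha computed directly from its definition
    D_direct = (1.0 - integral) / (alpha * (1.0 - alpha))
    assert np.isclose(D_via_C, D_direct)
```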

But in this article, which appeared in the journal Entropy, it is claimed that the $D_{\alpha}$ you are talking about is convex in both $p$ and $q$ for all $\alpha$. For the proof they point to Amari and Nagaoka's book Methods of Information Geometry, where this $D_{\alpha}$ is given not in this form but in a form symmetrized in $\alpha$, viz. $$D^{(\alpha)}(p\|q)=\frac{4}{1-\alpha^2}\left(1-\int p^{\frac{1-\alpha}{2}}q^{\frac{1+\alpha}{2}}dx\right)$$
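For what it's worth, the two forms coincide under a reparameterization (writing $a$ for the parameter of the symmetrized form to avoid a symbol clash): setting $a=1-2\alpha$ gives $\frac{1-a}{2}=\alpha$, $\frac{1+a}{2}=1-\alpha$, and $$\frac{4}{1-a^2}=\frac{4}{(1-a)(1+a)}=\frac{4}{2\alpha\cdot 2(1-\alpha)}=\frac{1}{\alpha(1-\alpha)},$$ so $D^{(1-2\alpha)}(p\|q)=D_{\alpha}(p\|q)$, and any convexity statement for one form transfers directly to the other.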

So it is not very clear to me whether $D_{\alpha}$ is convex in both $p$ and $q$ for all $\alpha$.


We can rewrite the divergence as follows:

$$ D_\alpha(p||q) = \frac{1}{\alpha(1-\alpha)} \left( 1- \exp\left\lbrace \log Z(\alpha \eta_1 +(1-\alpha) \eta_2 ) - \alpha \log Z(\eta_1 ) - (1-\alpha)\log Z( \eta_2 ) \right\rbrace \right) $$ where $ \log Z(\eta) = \log \int_x \exp(\eta^\top g(x)) dx $ is the convex log-partition function.
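The identity behind this rewriting, $\int p^\alpha q^{1-\alpha} dx = \exp\lbrace \log Z(\alpha\eta_1+(1-\alpha)\eta_2) - \alpha\log Z(\eta_1) - (1-\alpha)\log Z(\eta_2)\rbrace$, can be checked numerically on a finite-support exponential family (a sketch; the randomly chosen sufficient statistics and parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=(10, 3))   # g(x): sufficient statistics on a 10-point support

def log_Z(eta):
    """Log-partition function: log sum_x exp(eta^T g(x))."""
    return np.log(np.sum(np.exp(g @ eta)))

def density(eta):
    """Normalized exponential-family density p(x; eta) on the finite support."""
    return np.exp(g @ eta - log_Z(eta))

eta1, eta2 = rng.normal(size=3), rng.normal(size=3)
alpha = 0.4

# Direct integral vs. the log-partition expression
lhs = np.sum(density(eta1)**alpha * density(eta2)**(1.0 - alpha))
rhs = np.exp(log_Z(alpha * eta1 + (1 - alpha) * eta2)
             - alpha * log_Z(eta1) - (1 - alpha) * log_Z(eta2))
```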

  • When $\alpha \geq 1 $ : $ \log Z(\alpha \eta_1 +(1-\alpha) \eta_2 ) - (1-\alpha)\log Z( \eta_2 )$ is a convex function of $\eta_2$ (the first term is $\log Z$ composed with an affine map of $\eta_2$, and the coefficient $-(1-\alpha)=\alpha-1$ of the second term is nonnegative). Since $e^{(\cdot)}$ is convex and increasing and the prefactor $\frac{1}{\alpha(1-\alpha)}$ is negative, the whole divergence is convex with respect to $\eta_2$.

  • When $\alpha \leq 0 $ : $ \log Z(\alpha \eta_1 +(1-\alpha) \eta_2 ) - \alpha\log Z( \eta_1 )$ is a convex function of $\eta_1$ (the coefficient $-\alpha$ is nonnegative), and since the prefactor $\frac{1}{\alpha(1-\alpha)}$ is again negative, the whole divergence is convex with respect to $\eta_1$.

  • In the remaining case $0 < \alpha < 1$, it is not clear whether the divergence is convex or not.
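The first convex case can be probed numerically with a midpoint-convexity check in $\eta_2$ on a finite-support exponential family (a sketch; the random sufficient statistics and parameters are illustrative assumptions, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.normal(size=(10, 3))   # sufficient statistics on a 10-point support

def log_Z(eta):
    return np.log(np.sum(np.exp(g @ eta)))

def D(eta1, eta2, alpha):
    """Alpha divergence via the log-partition rewriting above."""
    f = (log_Z(alpha * eta1 + (1 - alpha) * eta2)
         - alpha * log_Z(eta1) - (1 - alpha) * log_Z(eta2))
    return (1.0 - np.exp(f)) / (alpha * (1.0 - alpha))

# alpha > 1: D should be convex in eta_2, so midpoint convexity must hold
alpha, eta1 = 2.0, rng.normal(size=3)
ok = True
for _ in range(100):
    a, b = rng.normal(size=3), rng.normal(size=3)
    mid = D(eta1, (a + b) / 2, alpha)
    ok &= mid <= (D(eta1, a, alpha) + D(eta1, b, alpha)) / 2 + 1e-9
```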

Note that this is consistent with the property of the alpha divergence that $D_\alpha(p||q) = D_{1-\alpha}(q||p) $.
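That symmetry follows from swapping the exponents in the defining integral, and is easy to confirm numerically (a sketch on arbitrary discrete distributions):

```python
import numpy as np

def D(p, q, alpha):
    """Alpha divergence between discrete distributions (alpha not in {0, 1})."""
    return (1.0 - np.sum(p**alpha * q**(1.0 - alpha))) / (alpha * (1.0 - alpha))

p = np.array([0.1, 0.6, 0.3])
q = np.array([0.25, 0.35, 0.4])

# D_alpha(p||q) == D_{1-alpha}(q||p) across several alphas
for alpha in [-0.3, 0.2, 0.7, 1.5]:
    assert np.isclose(D(p, q, alpha), D(q, p, 1.0 - alpha))
```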

In general the alpha divergence is jointly convex on the set of all proper distributions, but since the set of exponential-family distributions is not a convex set (the average of two exponential-family distributions does not, in general, lie in the exponential family), this fact is not very useful for establishing convexity with respect to the natural parameters of general exponential families.

Answer credit : Chris Maddison and Tom Minka (and partly me!).