Why is the $\alpha$-divergence defined in the following way? (Information Geometry)


Let me define everything first.
Let $S$ be a manifold and suppose that we are given a smooth function $D=D(\cdot||\cdot):S\times S\to\mathbb{R}$ satisfying, for any $p,q\in S$, $$D(p||q)\geq 0\quad\text{and}\quad D(p||q)=0 \iff p=q.$$

Now, for a convex function $f(u)$ on $u>0$ and probability distributions $p,q$, let's define $$D_{f}(p||q)=\int p(x)f\left(\frac{q(x)}{p(x)}\right)dx,$$ which we call the $f$-divergence.

Let us now define the $\alpha$-divergence as $D^{(\alpha)}=D_{f^{(\alpha)}}$, where for a real number $\alpha$ we set \begin{equation} f^{(\alpha)}(u) = \begin{cases} \frac{4}{1-\alpha^2}\{1-u^{(1+\alpha)/2}\} & (\alpha\neq \pm1)\\ u\log u & (\alpha=1)\\ -\log u & (\alpha=-1) \end{cases} \end{equation}
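To make the definition concrete, here is a minimal Python sketch for discrete distributions with strictly positive entries, where the integral is replaced by a sum. The function names `f_alpha` and `alpha_divergence` and the example vectors are assumptions chosen for illustration, not part of the definition.

```python
import math

def f_alpha(u, alpha):
    """Generator f^(alpha)(u) from the piecewise definition, for u > 0."""
    if alpha == 1:
        return u * math.log(u)
    if alpha == -1:
        return -math.log(u)
    return 4.0 / (1.0 - alpha**2) * (1.0 - u ** ((1.0 + alpha) / 2.0))

def alpha_divergence(p, q, alpha):
    """Discrete analogue of D_f(p||q): sum over x of p(x) * f(q(x)/p(x))."""
    return sum(pi * f_alpha(qi / pi, alpha) for pi, qi in zip(p, q))

# Hypothetical example distributions (all entries strictly positive).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

# Kullback-Leibler divergence KL(p||q), computed directly for comparison.
kl_pq = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(alpha_divergence(p, q, -1), kl_pq)  # these agree up to rounding
print(alpha_divergence(p, q, 0.0))        # a value strictly between the extremes of alpha
```

With $f^{(-1)}(u)=-\log u$ the sum reduces to $\sum_x p(x)\log\frac{p(x)}{q(x)}$, the Kullback-Leibler divergence, and with $f^{(1)}(u)=u\log u$ it reduces to the same expression with $p$ and $q$ swapped.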

My question is: why is $f^{(\alpha)}$ defined the way it is? In particular, why do we have the factor $\frac{4}{1-\alpha^2}$? If I remove it and define the function without it, what problem will I face?

Best answer

For $u > 0$ we have

$$\lim_{\alpha \to -1} \frac{4}{1-\alpha^2} \left(1 - u^{(1+\alpha)/2}\right) = -\log u,$$

so the factor makes $f^{(\alpha)}(u)$ continuous in $\alpha$ at $\alpha = -1$, and hence for all $\alpha < 1$. Hence, it's arguably the most natural choice.
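To spell out the limit step, one possible sketch is to substitute $\varepsilon = (1+\alpha)/2$ (a symbol introduced here only for the derivation), so that $1-\alpha^2 = 4\varepsilon(1-\varepsilon)$ and $u^{\varepsilon} = 1 + \varepsilon\log u + O(\varepsilon^2)$ as $\varepsilon \to 0$:

$$\frac{4}{1-\alpha^2}\left(1-u^{(1+\alpha)/2}\right) = \frac{1-u^{\varepsilon}}{\varepsilon(1-\varepsilon)} = \frac{-\log u + O(\varepsilon)}{1-\varepsilon} \longrightarrow -\log u \quad (\varepsilon \to 0).$$

Without the factor, $1-u^{(1+\alpha)/2} \to 0$ as $\alpha \to -1$, so the family would collapse to the zero function there instead of reaching $-\log u$. The factor also appears to fix the sign: differentiating twice gives

$$\frac{d^2}{du^2}\left[\frac{4}{1-\alpha^2}\left(1-u^{(1+\alpha)/2}\right)\right] = u^{(\alpha-3)/2} > 0,$$

so $f^{(\alpha)}$ is convex for every $\alpha \neq \pm 1$, whereas $1-u^{(1+\alpha)/2}$ alone has second derivative $\frac{1-\alpha^2}{4}\,u^{(\alpha-3)/2}$, which is negative for $|\alpha| > 1$ and so fails the convexity assumed in the definition of the $f$-divergence.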