I'm studying information criteria and came across the following proof in my reference material (page 34):
On the functional form of Kullback-Leibler information
If the differentiable function $F$ defined on $(0, \infty)$ satisfies the relationship
$$\sum_{i=1}^k g_iF(f_i) \leq \sum_{i=1}^k g_iF(g_i)$$
for any two probability functions $\{g_1, ..., g_k\}$ and $\{f_1, ..., f_k\}$, then $F(g)=\alpha + \beta\log g$ for some $\alpha, \beta$ with $\beta>0$.
Proof: In order to demonstrate that $F(g) = \alpha + \beta \log g$, it suffices to show that $gF'(g)=\beta>0$, and hence that $\partial F / \partial g =\beta/g$.
Let $h = (h_1, ..., h_k)^T$ be an arbitrary vector that satisfies $\sum_{i=1}^k h_i=0$ and $|h_i| \leq \max\{g_i , 1 − g_i\}$. Since $g+\lambda h$ is a probability distribution, it follows from the assumption that
$$\phi(\lambda)\equiv \sum_{i=1}^k g_iF(g_i+\lambda h_i) \leq \sum_{i=1}^k g_iF(g_i)=\phi(0).$$
Therefore, since
$$\phi'(\lambda)= \sum_{i=1}^k g_iF'(g_i+\lambda h_i)h_i,\;\;\;\;{\color{red}{\phi'(0)= \sum_{i=1}^k g_iF'(g_i)h_i=0}}$$
are always true, by writing $h_1=C, h_2=-C, h_i=0\;(i=3, ..., k)$, we have
$$g_1F'(g_1)=g_2F'(g_2)=\text{const}=\beta.$$
The equality for the other values of $i$ can be shown in a similar manner. Q.E.D.
QUESTION 1: What happens starting from the equation I marked in red? Why does it equal zero?
QUESTION 2: It is not stated in my material, but am I right to assume that $\lambda$ and $C$ are constants?
To answer the first question: the red equation uses the fact, noted above, that $$\phi(\lambda) \leq \phi(0)$$ for all $\lambda$ (at least those for which $\phi$ is well defined). This means that $\phi$ attains a maximum at the interior point $\lambda = 0$, and therefore its derivative there must vanish: $\phi'(0)=0$.
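This is easy to check numerically. Below is a small sanity check with the (assumed, illustrative) choice $F = \log$ — which does satisfy the hypothesis, by Gibbs' inequality — and arbitrary example vectors $g$ and $h$: $\phi(\lambda) \leq \phi(0)$ over the admissible range, and a central-difference estimate of $\phi'(0)$ is essentially zero.

```python
import numpy as np

# Illustrative example (not from the source): F = log, a fixed probability
# vector g, and a perturbation h with sum(h) = 0, small enough that
# g + lam*h stays strictly positive for |lam| < 1.
g = np.array([0.2, 0.3, 0.5])
h = np.array([0.1, -0.1, 0.0])

def phi(lam):
    """phi(lam) = sum_i g_i * F(g_i + lam * h_i) with F = log."""
    return np.sum(g * np.log(g + lam * h))

# phi(lam) <= phi(0) for every admissible lam ...
lams = np.linspace(-0.99, 0.99, 201)
assert all(phi(l) <= phi(0.0) + 1e-12 for l in lams)

# ... so lam = 0 is an interior maximum, and the derivative there vanishes.
eps = 1e-6
deriv0 = (phi(eps) - phi(-eps)) / (2 * eps)
print(deriv0)  # approximately 0
```

Here $\phi'(0) = \sum_i g_i F'(g_i) h_i = \sum_i h_i = 0$ exactly, since $F'(g) = 1/g$ for this choice of $F$.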
To answer the second question: no, $\lambda$ is not a constant. Once the vector $h$ is fixed, we define the function $$\lambda\mapsto \sum_{i=1}^k g_iF(g_i+\lambda h_i).$$ Since we required $h$ to satisfy the properties above, $(g_1+\lambda h_1, \ldots, g_k+\lambda h_k)$ is a probability function for all sufficiently small $\lambda$ (at least for $\vert\lambda\vert<1$; otherwise there is the risk that $g_i+\lambda h_i<0$ for some $i$, in which case we cannot evaluate $F$ there). So we can treat $\lambda$ as a variable in $(-1,1)$.
All of this reasoning was done for a fixed suitable vector $h$ (i.e. a different choice of $h$ would determine a different function $\phi$). On the other hand, $C$ is a constant that we choose (again, small enough for everything to be well defined), and it determines a vector $h$ of the form $(C,-C,0,\ldots,0)$. They then substitute this $h$ into the equation $$\sum_{i=1}^k g_iF'(g_i)h_i =0,$$ which gives $$Cg_1 F'(g_1) - Cg_2 F'(g_2)=0,$$ and, since $C\neq 0$, the conclusion follows. I hope this clarifies things.
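As a final consistency check, here is a short script (with hypothetical values of $\alpha$, $\beta$, $g$, and $C$, chosen only for illustration) verifying that for $F(g)=\alpha+\beta\log g$ the quantity $gF'(g)$ is indeed the constant $\beta$, so the substituted identity holds for every $h$ of the form $(C,-C,0,\ldots,0)$:

```python
import numpy as np

# Hypothetical constants, chosen only for illustration.
alpha, beta = 1.5, 2.0

def Fprime(g):
    """Derivative of F(g) = alpha + beta*log(g)."""
    return beta / g

g = np.array([0.1, 0.2, 0.3, 0.4])   # a probability vector
print(g * Fprime(g))                 # every entry equals beta

# Substituting h = (C, -C, 0, ..., 0) into sum_i g_i F'(g_i) h_i:
C = 0.05
h = np.array([C, -C, 0.0, 0.0])
print(np.sum(g * Fprime(g) * h))     # C*beta - C*beta = 0
```

Because $g_1F'(g_1)$ and $g_2F'(g_2)$ are equal for *any* choice of the two indices (just move the $C$ and $-C$ to other positions), all the products $g_iF'(g_i)$ share the common value $\beta$.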