Is the entropy (or relative entropy) function smooth in the sense that its gradient is Lipschitz continuous?


I wonder whether the entropy (or relative entropy) function satisfies the uniform smoothness condition. Let $x \in \mathbb{R}^n_+$ be a probability distribution and $f: \mathcal{X}\rightarrow \mathbb{R}$ be given by $f(x)=\sum_{i=1}^n x_i \log(x_i)$. The function $f$ is called $L_f$-smooth if:

\begin{equation} f(y) \leq f(x) + \langle \nabla f(x), y-x \rangle + \frac{L_f}{2} \|y-x\|^2, \qquad \forall x,y \in \mathcal{X}, \end{equation} where $\mathcal{X} = \{x \in \mathbb{R}^n_+: \sum_i x_i = 1 \}$. I'm unable to show whether or not the (negative) entropy function satisfies this smoothness condition for some $L_f$. I have the same question for the relative entropy function, i.e., the KL-divergence, $KL(x,y) = \sum_{i=1}^n x_i \log\left(\frac{x_i}{y_i}\right)$.
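For concreteness, here is a minimal numerical sketch of the check I have in mind, assuming NumPy (the function names are just placeholders of mine): it evaluates the gap between the right- and left-hand sides of the inequality for random points on the simplex and a candidate constant $L$.

```python
import numpy as np

def f(x):
    # negative entropy on the simplex: f(x) = sum_i x_i * log(x_i)
    return float(np.sum(x * np.log(x)))

def grad_f(x):
    return np.log(x) + 1.0

def smoothness_gap(x, y, L):
    # RHS - LHS of  f(y) <= f(x) + <grad f(x), y - x> + (L/2) * ||y - x||^2
    rhs = f(x) + float(np.dot(grad_f(x), y - x)) + 0.5 * L * float(np.sum((y - x) ** 2))
    return rhs - f(y)

# a negative gap for some pair (x, y) would show that this particular L does not work
rng = np.random.default_rng(0)
L = 10.0
for _ in range(5):
    x, y = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
    print(smoothness_gap(x, y, L))
```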

To your first question, as to whether the negative entropy $f$ satisfies the condition over $\mathcal{X}$: it doesn't.

Let's expand the RHS of the inequality. We have $\frac{\partial f}{\partial x_{i}}=\log x_{i}+1$ for all $i$, so that $$\langle \nabla f(x), y-x \rangle=\sum_{i=1}^{n}(y_{i}-x_{i})(\log x_{i}+1)=-f(x)+\sum_{i=1}^{n}y_{i}\log x_{i},$$ where we used that $\sum_{i=1}^{n}(y_{i}-x_{i})=0$, since $y$ and $x$ are both probability distributions. The RHS is then $\sum_{i=1}^{n}y_{i}\log x_{i}+\frac{L_{f}}{2}\|y-x\|^{2}$. Subtracting $\sum_{i=1}^{n}y_{i}\log x_{i}$ from both sides shows that the inequality you're hoping for is equivalent to $$KL(y, x) \leq \frac{L_{f}}{2}\|y-x\|^{2},$$ where $KL(y,x)=\sum_{i=1}^{n}y_{i}\log \frac{y_{i}}{x_{i}}$.

Since $x$ and $y$ are probability distributions, $\|x\|_{2} \leq \|x\|_{1} = 1$ and likewise for $y$, so $\|y-x\|^{2} \leq (\|x\|+\|y\|)^{2} \leq 4$. We would therefore need $KL(y,x) \leq 2L_{f}$ for some fixed $L_{f}$ if this were to hold. However, $KL(y,x)$ can be made arbitrarily large. For instance, with $n=2$, take $y=\left(\frac{1}{2},\frac{1}{2}\right)$ and $x=(x_{1},x_{2})$ with $x_{2}=1-x_{1}$ and $x_{1} \in (0,1)$. Then $KL(y,x) \to \infty$ as $x_{1} \to 0$ (and as $x_{1} \to 1$), while $\|y-x\|^{2}$ stays bounded. Similar counterexamples arise for $n \geq 3$.
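Here is a small numerical illustration of that counterexample, just as a sketch (it assumes NumPy; the helper `kl` and the candidate constant `L_f` are mine): as $x_1 \to 0$ the left-hand side $KL(y,x)$ grows without bound while $\frac{L_f}{2}\|y-x\|^2$ stays bounded, so the inequality eventually fails for any fixed $L_f$.

```python
import numpy as np

def kl(p, q):
    # KL(p, q) = sum_i p_i * log(p_i / q_i); assumes strictly positive entries
    return float(np.sum(p * np.log(p / q)))

y = np.array([0.5, 0.5])
L_f = 10.0  # any fixed candidate smoothness constant
for eps in [1e-1, 1e-3, 1e-6, 1e-9]:
    x = np.array([eps, 1.0 - eps])
    lhs = kl(y, x)                          # grows like (1/2) * log(1/eps) as eps -> 0
    rhs = 0.5 * L_f * np.sum((y - x) ** 2)  # stays bounded by 2 * L_f
    print(f"eps={eps:.0e}  KL(y,x)={lhs:.3f}  (L_f/2)||y-x||^2={rhs:.3f}")
```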

As for the KL-divergence satisfying it, I'm not sure exactly how you want to approach it. The divergence can be viewed as a function from $\mathcal{X} \to \mathbb{R}$ if you fix some $y \in \mathcal{X}$ and let $x \mapsto KL(x,y)$, or as a function from $\mathcal{X} \times \mathcal{X} \to \mathbb{R}$ with $(x,y) \mapsto KL(x,y)$.
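If it helps, here is a tiny sketch of the two views, assuming NumPy (the names are mine): fixing $y$ gives a one-argument function on $\mathcal{X}$, while the general form takes both arguments.

```python
import numpy as np
from functools import partial

def kl(x, y):
    # KL(x, y) = sum_i x_i * log(x_i / y_i); assumes strictly positive entries
    return float(np.sum(x * np.log(x / y)))

y_fixed = np.array([0.25, 0.25, 0.5])
kl_given_y = partial(kl, y=y_fixed)  # the map x -> KL(x, y_fixed) on the simplex

x = np.array([0.1, 0.3, 0.6])
print(kl_given_y(x))    # one-argument view
print(kl(x, y_fixed))   # two-argument view, same value
```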