Why does Bregman's function require strict convexity?

512 Views Asked by At

I am looking over the definition of Bregman's function but I couldn't get over the fact that it requires strict convexity when the function itself is based on the first order approximation of any convex function

For instance, on these slides: http://www.cs.utexas.edu/users/inderjit/Talks/bregtut.pdf

Let $\phi:S\to\mathbb{R}$ be differentiable, strictly convex of Legendre type, then the Bregman's divergence is defined as:

$$D_\phi(x,y) = \phi(x) - \phi(y) - (x-y)^T\nabla \phi(y)$$

And later on it says, note $D_\phi(x)$ is strictly convex in first argument but no the second (why not? not obvious).

I thought perhaps you could relax the definition, so I went on Wikipedia, and there was the same thing:

https://en.wikipedia.org/wiki/Bregman_divergence

Let $F$ be a diffable...strictly convex function

Why strictly convex? Can we relax it to just convex?

1

There are 1 best solutions below

0
On BEST ANSWER

According to the wiki article, the Bergman function is a quantity that serves as a "metric-like" function, which satisfies the standard metric properties except for the triangle inequality (similar to the case of the the Kullback-Leibler distance function).

Now, the metric property $D_\phi(x,y)=0 \text{ if and only if } x=y$, is equivalent to $$ \phi(x)=\phi(y) + (x-y)^T \nabla\phi(y) \text{ if and only if } y=x, $$ which can only hold if $\phi(\cdot)$ is strictly convex. (Recall that $\phi(\cdot)$ is strictly convex if and only if $$\phi(x)>\phi(y) + (x-y)^T \nabla\phi(y) \text{ for all } y\neq x,$$ whereas in the case of non-strict convexity, "$>$" is replaced by "$\geq$".)

Regarding the per-argument convexity of $D_\phi$, although $D_\phi$ is clearly convex w.r.t. $x$, since it is of the form $\text{const} + \phi(x) - x^T \nabla\phi(y)$, i.e., the sum of convex functions, I don't see why $D_\phi(\cdot)$ should be convex w.r.t $y$ in general.