Bregman divergence symmetric iff function is quadratic

792 Views Asked by At

The Bregman divergence of a convex function $f:\mathbb{R}^n\to\mathbb{R}$ at the point $x$ with respect to the point $y$ is defined as $$D_f(x,y) = f(x) - (f(y) + \langle \nabla f(y),x-y\rangle)$$ I'm starting to feel that $D_f(x,y)=D_f(y,x)$ for all $x,y$ iff $f$ is a quadratic. Is that true? Any answers would hopefully provide insights to why this is true.

Edit: Maybe it wasn't clear - I'm interested in a proof of the hard direction of the above assertion, i.e. that if the divergence is symmetric, $f$ must be a quadratic, and I'd very much like to see an intuitive, motivated proof.

2

There are 2 best solutions below

0
On

If $f$ is quadratic, then $D_f(x,y)$ is just the difference between $f(x)$ and the Taylor expansion of first order of $f$ at $y$. Since $f$ is quadratic, the remainer term is easily seen to be $\frac 12 (x-y)^TQ(x-y)$, where $Q=\nabla^2 f$.

Assume now that $D_f(x,y)=D(y,x)$. Then by differentiating this equation with respect to $x$, it follows $$ f'(x) - f'(y) = -f''(x)(y-x) = f''(x)(x-y). $$ Hence $f'$ is affine linear, and $f$ is quadratic. Doing this the finite difference way, the proof also shows that $f$ is twice differentiable everywhere. So twice differentiability is not an assumption but an implication of symmetry.

I got this idea from Lemma 3.16 in this paper Joint and Separate Convexity of the Bregman Distance. Bauschke, Heinz H. and Borwein, Jonathan M. (2001) Studies in Computational Mathematics, 8 . pp. 23-36.

0
On

OK here's a proof I came up with that I like.

If $f$ is quadratic, it is immediate that the Bregman divergence is symmetric. The more interesting case is when the divergence is symmetric for all pairs of vectors; it is a sort of functional equation in higher dimension.

First, without loss of generality, we can ensure the following things:

  • $\min_x f(x) = 0$ by adding a constant
  • $f$ has a critical point at $0$ (by adding the linear function $\left<\nabla f(0),x\right>$ to $f$)

Expanding and rearranging the symmetry of the divergence for general $x,y$ gives \begin{align*} 2f(x) + \left<\nabla f(y),y\right>+\left<\nabla f(x),y\right> = 2f(y) + \left<\nabla f(x),x\right>+\left<\nabla f(y),x\right> \end{align*} So now applying that symmetry with $x=0$ we get that \begin{align*} 2f(y) = \left<\nabla f(y),y\right> \end{align*} for all $y\in V$, and then applying the $2f(v) = \left<\nabla f(v),v\right>$ identity with $v$ being $x$ and $y$ in the general symmetry identity gives \begin{align*} \left<\nabla f(x),y\right> = \left<\nabla f(y),x\right> \end{align*} for all $x,y\in V$. We claim that this means the gradient map $\nabla f:V\to V^*$ is linear and hence $f$ is quadratic. We have \begin{align*} \left<\nabla f(x+y),z\right> &= \left<\nabla f(z),x+y\right> \\ &= \left<\nabla f(z),x\right>+\left<\nabla f(z),y\right> \\ &= \left<\nabla f(x),z\right>+\left<\nabla f(y),z\right> \\ &= \left<\nabla f(x)+\nabla f(y),z\right> \end{align*} and since this holds for all $z$ it must be that $\nabla f(x+y) = \nabla f(x)+\nabla f(y)$ showing additivity of the gradient map; homogeneity follows in a similar way. Thus $\nabla f$ is linear as we wanted to show.