I am studying the Bregman matrix divergence of symmetric matrices (see https://web.stanford.edu/group/mmds/slides/dhillon-mmds.pdf),
which is defined as
$D_\psi (X,Y) = \psi(X)-\psi(Y) - \text{tr}((\nabla\psi(Y))^\top(X-Y))$,
where $\text{tr}(X)$ is the trace of $X$. It seems that commonly used $\psi$ include $\psi(X) = \frac{1}{2}\text{tr}(X^\top X)$, $\psi(X) = \text{tr}(X\log X-X)$, and $\psi(X) = -\log \det (X)$.
I was wondering whether I can use the squared trace norm $\psi(X) = \|X\|_*^2 = (\text{tr}(\sqrt{X^\top X}))^2$, but I have never seen a reference that does so.
In addition, if I consider symmetric positive definite matrices, then $\|X\|_* = \text{tr}(X)$, so $\psi(X) = (\text{tr}(X))^2$ with $\nabla\psi(Y) = 2\,\text{tr}(Y)\, I$, and therefore
$D_\psi (X,Y) = (\text{tr}(X))^2 - (\text{tr}(Y))^2 - 2\text{tr}(Y) \cdot \text{tr}(X-Y) = (\text{tr}(X)-\text{tr}(Y))^2$,
which looks strange to me because, as far as I know, $D_\psi (X,Y)=0$ if and only if $X=Y$. Here, however, $D_\psi (X,Y)=0$ only requires $\text{tr}(X)=\text{tr}(Y)$.
Is there anything wrong with my derivation or could anyone tell me some related references?
Thanks a lot!
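This is because the function $\psi$ is convex but not strictly convex. On symmetric positive definite matrices, $\psi(X) = (\text{tr}(X))^2$ depends on $X$ only through its trace, so it is constant along any segment joining two distinct matrices with the same trace. The property that $D_\psi (X,Y)=0$ iff $X=Y$ holds only when $\psi$ is strictly convex.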
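
A quick numerical check can make this concrete. The following is only a minimal sketch using NumPy; the helper names `psi`, `grad_psi`, and `bregman` and the example matrices are my own choices for illustration, not taken from the slides. It evaluates the divergence for two distinct positive definite matrices with the same trace and also measures the convexity gap at their midpoint.

```python
import numpy as np

def psi(X):
    # psi(X) = (tr X)^2; on SPD matrices this equals the squared trace norm
    return np.trace(X) ** 2

def grad_psi(X):
    # Gradient of (tr X)^2 with respect to X is 2 * tr(X) * I
    return 2.0 * np.trace(X) * np.eye(X.shape[0])

def bregman(X, Y):
    # D_psi(X, Y) = psi(X) - psi(Y) - tr(grad_psi(Y)^T (X - Y))
    return psi(X) - psi(Y) - np.trace(grad_psi(Y).T @ (X - Y))

# Two distinct SPD matrices with the same trace (illustrative values)
X = np.diag([1.0, 3.0])
Y = np.diag([2.0, 2.0])
d_same_trace = bregman(X, Y)

# Strict convexity would require psi(midpoint) < average of psi(X) and psi(Y),
# i.e. a strictly positive gap
midpoint = 0.5 * (X + Y)
convexity_gap = 0.5 * psi(X) + 0.5 * psi(Y) - psi(midpoint)

# A matrix with a different trace gives a positive divergence
Z = np.diag([1.0, 2.0])
d_diff_trace = bregman(Z, Y)

print(d_same_trace, convexity_gap, d_diff_trace)
```

Here the divergence between `X` and `Y` vanishes even though `X != Y`, and the convexity gap at the midpoint is zero, which is exactly the failure of strict convexity. For `Z` and `Y` the value agrees with $(\text{tr}(Z)-\text{tr}(Y))^2 = 1$.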