Gradient of $Z \mapsto \left( Z^T \left( \operatorname{diag}(ZZ^T\mathbf{1}) - ZZ^T\right)Z\right)$

1k Views Asked by At

I know that

$$ \nabla_{Z} \operatorname{Tr} \left( Z^T A Z \right) = 2 A Z $$

and I would like to compute the gradient

$$ \nabla_{Z} \operatorname{Tr} \left( Z^T \left( \operatorname{diag}(ZZ^T\mathbf{1}) - ZZ^T\right)Z\right)$$

I would like to see how this would be done, given my basic matrix calculus skills. All the entries are real and both $A$ and $\operatorname{diag}[(ZZ^T\mathbf{1}) - ZZ^T]$ are symmetric and $Z$ is a tall, rectangular matrix (more rows than columns), while $\operatorname{diag}(\cdot)$ denotes a diagonal matrix, whose diagonal elements are specified by the placeholder $\cdot$ and I think that $\left(\operatorname{diag}(ZZ^T\mathbf{1}) - ZZ^T\right)$ is positive semidefinite (PSD), as it seems like it is in the form of a Laplacian matrix.

1

There are 1 best solutions below

1
On BEST ANSWER

Let's do this. I like to use a variational approach in a case like this, since it's just sums and products. Well, there's the pesky diag term too, but we'll get to that last.

First, let's separate the additive terms: $$\mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z-Z^TZZ^TZ)$$ Now we substitute $\delta Z$ for each value of $Z$ present, creating a separate additive term each time. This is equivalent to substituting $Z\rightarrow Z+\delta Z$, subtracting any constant ($Z$ only) terms, and eliminating any higher-order terms in $\delta Z$. (This is not true in general, but when we have sums and products like this, it's fine.) The result is $$\begin{aligned} \mathop{\textrm{Tr}}( &Z^T\mathop{\textrm{diag}}(\delta ZZ^T\mathbf{1})Z +Z^T\mathop{\textrm{diag}}(Z\delta Z^T\mathbf{1})Z \\ &+\delta Z^T\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z +Z^T\mathop{\textrm{diag}}(ZZ^T\mathbf{1}) \delta Z \\ &-\delta Z^TZZ^TZ -Z^T\delta ZZ^TZ -Z^TZ\delta Z^TZ -Z^TZZ^T\delta Z~). \end{aligned}$$ Using the fact that $\mathop{\textrm{Tr}}(AB)=\mathop{\textrm{Tr}}(BA)$ (when both products are well-posed) and $\mathop{\textrm{Tr}}(AB)=\mathop{\textrm{Tr}}(B^TA^T)$, we can collect some common terms: $$\begin{aligned} &\mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(\delta ZZ^T\mathbf{1})Z) +\mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(Z\delta Z^T\mathbf{1})Z) \\ &\qquad +\mathop{\textrm{Tr}}(2\delta Z^T\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z-4\delta Z^TZZ^TZ). \end{aligned}$$ That third trace is simple: its contribution to the gradient is, by inspection, $$2\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z-4ZZ^TZ.$$ The first two terms will require a term-by term slog. To do this, we set $\delta Z=e_ie_j^T$ to get the contribution to the $(i,j)$ term of the gradient.

For the first term, let's examine $\mathop{\textrm{diag}}(\delta ZZ^T\mathbf{1})$ with $\delta Z=e_ie_j^T$: $$\mathop{\textrm{diag}}(e_ie_j^TZ^T\mathbf{1}) = \mathop{\textrm{diag}}(e_i(Z_{:j})^T\mathbf{1})=(Z_{:j}^T\mathbf{1})\cdot e_ie_i^T$$ where $Z_{:j}$ is the $j$th column of $Z$. Substituting back into the larger product, $$\begin{aligned} \mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(\delta ZZ^T\mathbf{1})Z)&= (Z_{:j}^T\mathbf{1})\mathop{\textrm{Tr}}(Z^Te_ie_i^TZ)\\ &=(Z_{:j}^T\mathbf{1})\mathop{\textrm{Tr}}(e_i^TZZ^Te_i)=\|Z_{i:}\|_2^2(Z_{:j}^T\mathbf{1}) \end{aligned}$$ where $Z_{i:}$ is the $i$th row of $Z$. This is the contribution to the $(i,j)$th element of the gradient. Can we write this in a clean form for the entire matrix? I say we can. It's a rank-one dyad! The left vector consists of the squared row norms of $Z$, which are the diagonal elements of $ZZ^T$. The right vector depends only on the column sums of $Z$. So: $$\mathop{\textrm{diag}^{*}}(ZZ^T)(\textbf{1}^TZ)=\mathop{\textrm{diag}^{*}}(ZZ^T)\textbf{1}^TZ.$$ where $\mathop{\textrm{diag}^{*}}$ really is the adjoint of the other diag operator: it extracts the diagonal of a square matrix and returns a column vector. The multiplications here are associative so I can drop those right-hand parentheses.

Now for that second term. First, we tackle the diag: $$\mathop{\textrm{diag}}(Ze_je_i^T\mathbf{1}) = \mathop{\textrm{diag}}(Z_{:j}e_i^T\mathbf{1})=\mathop{\textrm{diag}}(Z_{:j}).$$ Substituting into the larger product, $$\begin{aligned} \mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(Z_{:j})Z)&= \mathop{\textrm{Tr}}(\mathop{\textrm{diag}}(Z_{:j})ZZ^T) = \sum_{k} Z_{kj}\|Z_{k:}\|_2^2. \end{aligned}$$ Well. What does this look like when we assemble it for all $(i,j)$? Well, this is a matrix multiplication, actually, where $Z$ is the right-hand matrix and the left-hand matrix has constant columns with values $\|Z_{k:}\|_2^2$ in each column. So how about this: $$\left(\mathbf{1}(\mathop{\textrm{diag}^*}(ZZ^T))^T\right)Z=\mathbf{1}(\mathop{\textrm{diag}^*}(ZZ^T))^TZ.$$ The symmetry with the first term offers some confirmation.

So assembling my subproblems, this is what I have: $$\boxed{\mathop{\textrm{diag}^{*}}(ZZ^T)\textbf{1}^TZ+\textbf{1}(\mathop{\textrm{diag}^{*}}(ZZ^T))^TZ+2\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z-4ZZ^TZ.}$$ I like the symmetry of this, and the presence of $Z$ on the right-hand side of each term. By George, I think we've got it.

But I'm sleepy. I'll check this in the morning, edit if I have to, delete if it's totally messed up.

ADDED: the quantity inside the original trace is not positive semidefinite, by the way, as the poster has guessed. Just try it with some random matrices, you'll see it doesn't work.