Gradient of $Z \mapsto \left( Z^T \left( \operatorname{diag}(ZZ^T\mathbf{1}) - ZZ^T\right)Z\right)$

Question

Gradient of $Z \mapsto \left( Z^T \left( \operatorname{diag}(ZZ^T\mathbf{1}) - ZZ^T\right)Z\right)$

1k Views Asked by Bumbble Comm At 25 Mar 2026 - 9:54

I know that

$$ \nabla_{Z} \operatorname{Tr} \left( Z^T A Z \right) = 2 A Z $$

and I would like to compute the gradient

$$ \nabla_{Z} \operatorname{Tr} \left( Z^T \left( \operatorname{diag}(ZZ^T\mathbf{1}) - ZZ^T\right)Z\right)$$

I would like to see how this would be done, given my basic matrix calculus skills. All the entries are real and both $A$ and $\operatorname{diag}[(ZZ^T\mathbf{1}) - ZZ^T]$ are symmetric and $Z$ is a tall, rectangular matrix (more rows than columns), while $\operatorname{diag}(\cdot)$ denotes a diagonal matrix, whose diagonal elements are specified by the placeholder $\cdot$ and I think that $\left(\operatorname{diag}(ZZ^T\mathbf{1}) - ZZ^T\right)$ is positive semidefinite (PSD), as it seems like it is in the form of a Laplacian matrix.

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

Let's do this. I like to use a variational approach in a case like this, since it's just sums and products. Well, there's the pesky diag term too, but we'll get to that last.

First, let's separate the additive terms: $$\mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z-Z^TZZ^TZ)$$ Now we substitute $\delta Z$ for each value of $Z$ present, creating a separate additive term each time. This is equivalent to substituting $Z\rightarrow Z+\delta Z$, subtracting any constant ($Z$ only) terms, and eliminating any higher-order terms in $\delta Z$. (This is not true in general, but when we have sums and products like this, it's fine.) The result is $$\begin{aligned} \mathop{\textrm{Tr}}( &Z^T\mathop{\textrm{diag}}(\delta ZZ^T\mathbf{1})Z +Z^T\mathop{\textrm{diag}}(Z\delta Z^T\mathbf{1})Z \\ &+\delta Z^T\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z +Z^T\mathop{\textrm{diag}}(ZZ^T\mathbf{1}) \delta Z \\ &-\delta Z^TZZ^TZ -Z^T\delta ZZ^TZ -Z^TZ\delta Z^TZ -Z^TZZ^T\delta Z~). \end{aligned}$$ Using the fact that $\mathop{\textrm{Tr}}(AB)=\mathop{\textrm{Tr}}(BA)$ (when both products are well-posed) and $\mathop{\textrm{Tr}}(AB)=\mathop{\textrm{Tr}}(B^TA^T)$, we can collect some common terms: $$\begin{aligned} &\mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(\delta ZZ^T\mathbf{1})Z) +\mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(Z\delta Z^T\mathbf{1})Z) \\ &\qquad +\mathop{\textrm{Tr}}(2\delta Z^T\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z-4\delta Z^TZZ^TZ). \end{aligned}$$ That third trace is simple: its contribution to the gradient is, by inspection, $$2\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z-4ZZ^TZ.$$ The first two terms will require a term-by term slog. To do this, we set $\delta Z=e_ie_j^T$ to get the contribution to the $(i,j)$ term of the gradient.

For the first term, let's examine $\mathop{\textrm{diag}}(\delta ZZ^T\mathbf{1})$ with $\delta Z=e_ie_j^T$: $$\mathop{\textrm{diag}}(e_ie_j^TZ^T\mathbf{1}) = \mathop{\textrm{diag}}(e_i(Z_{:j})^T\mathbf{1})=(Z_{:j}^T\mathbf{1})\cdot e_ie_i^T$$ where $Z_{:j}$ is the $j$th column of $Z$. Substituting back into the larger product, $$\begin{aligned} \mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(\delta ZZ^T\mathbf{1})Z)&= (Z_{:j}^T\mathbf{1})\mathop{\textrm{Tr}}(Z^Te_ie_i^TZ)\\ &=(Z_{:j}^T\mathbf{1})\mathop{\textrm{Tr}}(e_i^TZZ^Te_i)=\|Z_{i:}\|_2^2(Z_{:j}^T\mathbf{1}) \end{aligned}$$ where $Z_{i:}$ is the $i$th row of $Z$. This is the contribution to the $(i,j)$th element of the gradient. Can we write this in a clean form for the entire matrix? I say we can. It's a rank-one dyad! The left vector consists of the squared row norms of $Z$, which are the diagonal elements of $ZZ^T$. The right vector depends only on the column sums of $Z$. So: $$\mathop{\textrm{diag}^{*}}(ZZ^T)(\textbf{1}^TZ)=\mathop{\textrm{diag}^{*}}(ZZ^T)\textbf{1}^TZ.$$ where $\mathop{\textrm{diag}^{*}}$ really is the adjoint of the other diag operator: it extracts the diagonal of a square matrix and returns a column vector. The multiplications here are associative so I can drop those right-hand parentheses.

Now for that second term. First, we tackle the diag: $$\mathop{\textrm{diag}}(Ze_je_i^T\mathbf{1}) = \mathop{\textrm{diag}}(Z_{:j}e_i^T\mathbf{1})=\mathop{\textrm{diag}}(Z_{:j}).$$ Substituting into the larger product, $$\begin{aligned} \mathop{\textrm{Tr}}(Z^T\mathop{\textrm{diag}}(Z_{:j})Z)&= \mathop{\textrm{Tr}}(\mathop{\textrm{diag}}(Z_{:j})ZZ^T) = \sum_{k} Z_{kj}\|Z_{k:}\|_2^2. \end{aligned}$$ Well. What does this look like when we assemble it for all $(i,j)$? Well, this is a matrix multiplication, actually, where $Z$ is the right-hand matrix and the left-hand matrix has constant columns with values $\|Z_{k:}\|_2^2$ in each column. So how about this: $$\left(\mathbf{1}(\mathop{\textrm{diag}^*}(ZZ^T))^T\right)Z=\mathbf{1}(\mathop{\textrm{diag}^*}(ZZ^T))^TZ.$$ The symmetry with the first term offers some confirmation.

So assembling my subproblems, this is what I have: $$\boxed{\mathop{\textrm{diag}^{*}}(ZZ^T)\textbf{1}^TZ+\textbf{1}(\mathop{\textrm{diag}^{*}}(ZZ^T))^TZ+2\mathop{\textrm{diag}}(ZZ^T\mathbf{1})Z-4ZZ^TZ.}$$ I like the symmetry of this, and the presence of $Z$ on the right-hand side of each term. By George, I think we've got it.

But I'm sleepy. I'll check this in the morning, edit if I have to, delete if it's totally messed up.

ADDED: the quantity inside the original trace is not positive semidefinite, by the way, as the poster has guessed. Just try it with some random matrices, you'll see it doesn't work.

Gradient of $Z \mapsto \left( Z^T \left( \operatorname{diag}(ZZ^T\mathbf{1}) - ZZ^T\right)Z\right)$

There are 1 best solutions below

Related Questions in MATRICES

Related Questions in MULTIVARIABLE-CALCULUS

Related Questions in DERIVATIVES

Related Questions in MATRIX-CALCULUS

Related Questions in SCALAR-FIELDS

Trending Questions

Popular # Hahtags

Popular Questions