Gradient of a parametrization of a matrix function

96 Views Asked by Bumbble Comm At 31 Mar 2026 - 11:35

I have seen several papers where it says more or less:

If $f : \mathbb{R}^{n\times n} \to \mathbb{R}$ and $g(X) = f(XX^\top)$ for matrices $X \in \mathbb{R}^{n\times k}$, then (under mild conditions) the gradient of $g$ is

$$\nabla g(X) = (\nabla f(X X^\top)+\nabla f(XX^\top)) X.$$

I tried to derive the previous expression using derivatives and the definition of the inner product on matrix spaces. As a first source I checked Magnus and Neudecker's fantastic book (as always) and found a clue:

$$D(XX^\top) = 2 N_n (X\otimes I_n),$$

where $N_n = (I_{n^2}+ K_{nn})/2$ is a symmetric idempotent matrix and $K_{nn}$ is the commutation matrix, $K_{nn} \text{vec} A = \text{vec}A^\top$.

So I innocently thought: let's apply the chain rule and see where I can use the properties of $\text{vec}$ to bring in the inner product and find the gradient. But if I blindly apply the chain rule, assuming that everything makes sense, it yields

$$Dg(X) = 2 Df(XX^\top)N_n (X\otimes I_n).$$

So my question is, how can I connect the previous expression to get the gradient, i.e., how can I see that

$$Dg(X)\,\text{vec}V = \langle \nabla g(X), V \rangle = \text{Tr}(V^\top \nabla g(X))$$

holds? I do not see it.

I would really appreciate it if you could help me out, or point me to an appropriate source for this kind of problem.

My intuition says that it is not true. But if so, what do the authors mean? (By the way, I've seen this in several papers, so maybe I am wrong.)

I also computed the differential of $XX^\top$ myself:

$$d(XX^\top) = dXX^\top + X(dX)^\top$$

Vectorizing gives $$d{\rm vec}(XX^\top) = ((X \otimes I) + (I \otimes X) K) d{\rm vec}(X).$$

So $D(XX^\top) = (X \otimes I) + (I \otimes X) K$, and the chain rule leads to

$$Dg(X) = Df(XX^\top)((X \otimes I) + (I \otimes X) K)$$
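As a numerical sanity check (a NumPy sketch, not part of the original derivation), the two Jacobians above can be compared with each other and with finite differences. The helper names `commutation_matrix` and `vec` are hypothetical, `vec` is assumed to stack columns, and $K$ in the second form is taken to be the commutation matrix acting on $\text{vec}$ of an $n\times k$ matrix.

```python
import numpy as np

def commutation_matrix(m, n):
    """Return K with K @ vec(A) == vec(A.T) for every m-by-n matrix A."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # Column-major vec: A[i, j] sits at i + j*m, A.T[j, i] at j + i*n.
            K[j + i * n, i + j * m] = 1.0
    return K

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(0)
n, k = 4, 3
X = rng.standard_normal((n, k))
I_n = np.eye(n)

# Magnus-Neudecker form: 2 N_n (X kron I_n), with N_n = (I + K_nn) / 2.
N_n = 0.5 * (np.eye(n * n) + commutation_matrix(n, n))
J_mn = 2.0 * N_n @ np.kron(X, I_n)

# Form obtained from the differential d(XX^T) = dX X^T + X dX^T.
J_diff = np.kron(X, I_n) + np.kron(I_n, X) @ commutation_matrix(n, k)

# Central finite differences of vec(XX^T), one coordinate of vec(X) at a time.
h = 1e-6
J_fd = np.zeros((n * n, n * k))
for col in range(n * k):
    e = np.zeros(n * k)
    e[col] = 1.0
    E = e.reshape((n, k), order="F")
    J_fd[:, col] = vec((X + h * E) @ (X + h * E).T
                       - (X - h * E) @ (X - h * E).T) / (2 * h)

jacobians_agree = np.allclose(J_mn, J_diff)
matches_fd = np.allclose(J_diff, J_fd, atol=1e-6)
print(jacobians_agree, matches_fd)
```

If both flags come out true, the book's formula and the hand-computed differential describe the same Jacobian, so the remaining question is only how to read the gradient off of it.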


Bumbble Comm On 06 Feb 2020 - 4:10 BEST ANSWER

Let $Y=XX^T$, then the differential of $f$ is

$$df = \left(\frac{\partial f}{\partial Y}\right):dY$$

The function $g$ is the same function with a different parameterization, therefore

$$\eqalign{ dg &= \left(\frac{\partial f}{\partial Y}\right):d(XX^T) \\ &= \left(\frac{\partial f}{\partial Y}\right):(X\,dX^T+dX\,X^T) \\ &= \left(\left(\frac{\partial f}{\partial Y}\right)^T+\left(\frac{\partial f}{\partial Y}\right)\right)X:dX \\ \frac{\partial g}{\partial X} &= \left(\left(\frac{\partial f}{\partial Y}\right)^T+\left(\frac{\partial f}{\partial Y}\right)\right)X \\ }$$

Based on this result, I think you've simply misread those papers.
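One way to connect this with the vec form of the chain rule from the question is sketched below. It writes $G = \partial f/\partial Y$ evaluated at $Y = XX^T$, assumes $Df(Y) = (\operatorname{vec}G)^T$, and uses $(\operatorname{vec}A)^T\operatorname{vec}B = \operatorname{Tr}(A^TB)$ together with the cyclic property of the trace; $V$ is an arbitrary direction in $\mathbb{R}^{n\times k}$.

$$\eqalign{ Dg(X)\,\operatorname{vec}V &= (\operatorname{vec}G)^T\big((X\otimes I) + (I\otimes X)K\big)\operatorname{vec}V \\ &= (\operatorname{vec}G)^T\operatorname{vec}(VX^T + XV^T) \\ &= \operatorname{Tr}(G^TVX^T) + \operatorname{Tr}(G^TXV^T) \\ &= \operatorname{Tr}(V^TGX) + \operatorname{Tr}(V^TG^TX) \\ &= \operatorname{Tr}\big(V^T(G+G^T)X\big) \\ }$$

Comparing with $\langle \nabla g(X), V\rangle = \operatorname{Tr}(V^T\nabla g(X))$ gives $\nabla g(X) = (G+G^T)X$, which is the same expression as above.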

NB: A colon is being used as a convenient product notation for the trace, i.e. $$A:B = \operatorname{Tr}(A^TB)$$
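A finite-difference check can make the role of the transpose concrete. The sketch below is only an illustration: the test function $f(Y) = \operatorname{Tr}(A^TY) + \tfrac12\|Y\|_F^2$ with a non-symmetric $A$ is a hypothetical choice, not taken from the papers in question, and its gradient $A + Y$ is non-symmetric on purpose.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3
A = rng.standard_normal((n, n))  # deliberately non-symmetric
X = rng.standard_normal((n, k))

def f(Y):
    # Hypothetical test function: Tr(A^T Y) + 0.5 * ||Y||_F^2.
    return np.sum(A * Y) + 0.5 * np.sum(Y * Y)

def grad_f(Y):
    # Gradient of f with respect to an unconstrained n-by-n matrix Y.
    return A + Y

def g(X):
    return f(X @ X.T)

G = grad_f(X @ X.T)
grad_with_transpose = (G + G.T) @ X   # formula derived in the answer
grad_without_transpose = 2.0 * G @ X  # what dropping the transpose would give

# Central finite differences of g, one entry of X at a time.
h = 1e-6
grad_fd = np.zeros_like(X)
for i in range(n):
    for j in range(k):
        E = np.zeros_like(X)
        E[i, j] = 1.0
        grad_fd[i, j] = (g(X + h * E) - g(X - h * E)) / (2 * h)

err_with = np.max(np.abs(grad_fd - grad_with_transpose))
err_without = np.max(np.abs(grad_fd - grad_without_transpose))
print(err_with, err_without)
```

The first error should sit at the level of finite-difference noise, while the second should not. When $\partial f/\partial Y$ happens to be symmetric at $Y = XX^T$, the two expressions coincide and both reduce to $2\left(\frac{\partial f}{\partial Y}\right)X$.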
