Derivative of $\mathbf{XX}^T$ with respect to $\mathbf{X}$


Problem

$$\nabla_{\mathbf{X}}\mathbf{XX}^T$$

What I Have Done

I checked the Matrix Cookbook, but had no luck, so I tried to derive it from scratch. I have

$$(\mathbf{XX}^T)_{kl}= \mathbf{X}_k^T\mathbf{X}_l=\sum_{q=1}^n \mathbf{X}_{kq}\mathbf{X}_{lq}$$ where $\mathbf{X}_i$ is the $i$-th row of $\mathbf{X}$, written as a column vector, and $n$ is the number of columns of $\mathbf{X}$.

However, when I tried to compute $\nabla_{\mathbf{X}_{ij}} (\mathbf{XX}^T)_{kl}$, I got lost in the indices $i,j,k$ and $l$ and did not know how to resolve this.

Could anyone help? Thank you in advance.

Answer

Set

$F(X) = XX^T; \tag 1$

we have

$F(X + H) = (X + H)(X + H)^T = (X + H)(X^T + H^T)$ $= XX^T + XH^T + HX^T + HH^T; \tag 2$

$F(X + H) - F(X) = XH^T + HX^T + HH^T; \tag 3$

$F(X + H) - F(X) - (XH^T + HX^T) = HH^T; \tag 4$

$\Vert F(X + H) - F(X) - (XH^T + HX^T ) \Vert = \Vert HH^T \Vert \le \Vert H \Vert \Vert H^T \Vert; \tag 5$

$\dfrac{\Vert F(X + H) - F(X) - (XH^T + HX^T ) \Vert}{\Vert H \Vert} \le \Vert H^T \Vert; \tag 6$

$\displaystyle \lim_{\Vert H \Vert \to 0} \dfrac{\Vert F(X + H) - F(X) - (XH^T + HX^T ) \Vert}{\Vert H \Vert} \le \lim_{\Vert H \Vert \to 0} \Vert H^T \Vert = 0; \tag 7$

this shows that

$F(X + H) = F(X) + (XH^T + HX^T) \tag 8$

up to terms of second order in $\Vert H \Vert$ as $\Vert H \Vert \to 0$; hence the linear map $H \mapsto XH^T + HX^T$ is the derivative of $F$ at $X$:

$\nabla_X F(X)(H) = \nabla_X (XX^T)(H) = XH^T + HX^T. \tag 9$
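
To connect this with the entrywise derivative asked for in the question, one can take $H = E_{ij}$, the matrix with a $1$ in position $(i,j)$ and zeros elsewhere. Here $E_{ij}$ and the Kronecker delta $\delta$ are notation introduced for this sketch, not taken from the original post. Applying the linear map above to $E_{ij}$ and reading off the $(k,l)$ entry gives

$$\frac{\partial (XX^T)_{kl}}{\partial X_{ij}} = \left(X E_{ij}^T + E_{ij} X^T\right)_{kl} = \delta_{il} X_{kj} + \delta_{ik} X_{lj}.$$

The same expression follows from differentiating $(XX^T)_{kl} = \sum_q X_{kq} X_{lq}$ term by term, which is a useful cross-check on the indices.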

Nota Bene: The preceding shows how $\nabla_X (XX^T)$ may be validated directly from the definition of a first-order approximation. It also shows how $\nabla_X (XX^T)$ may be found in the first place: the expansion (2) isolates the terms that are of first order in $H$. Such an expansion is generally available when $F(X)$ is given by a polynomial or power series in $X$, though for higher-degree expressions the algebra may become quite complicated. Nevertheless, matrix derivatives may often be discovered by this approach. End of Note.
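
The first-order approximation can also be checked numerically. The following is a minimal sketch in Python with NumPy, not part of the original answer: the function names `F` and `dF`, the matrix shape, and the random seed are all arbitrary choices made for illustration. It scales a fixed direction $H$ by a shrinking factor $t$ and prints the ratio from (6), which should shrink in proportion to $t$ because the remainder is exactly $t^2 HH^T$.

```python
import numpy as np


def F(X):
    """The map F(X) = X X^T from equation (1)."""
    return X @ X.T


def dF(X, H):
    """Candidate derivative of F at X applied to H, i.e. X H^T + H X^T."""
    return X @ H.T + H @ X.T


# Arbitrary test data (shape and seed are illustrative assumptions).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
H = rng.standard_normal((4, 3))

ts = [1e-1, 1e-2, 1e-3, 1e-4]
ratios = []
for t in ts:
    # Remainder after subtracting the linear term; equals (tH)(tH)^T.
    remainder = F(X + t * H) - F(X) - dF(X, t * H)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    ratio = np.linalg.norm(remainder) / np.linalg.norm(t * H)
    ratios.append(ratio)
    print(f"t = {t:.0e}   ||remainder|| / ||tH|| = {ratio:.3e}")
```

If the candidate derivative were wrong, the printed ratio would level off at a nonzero value instead of decreasing with $t$.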