How to get the derivative of this matrix function in which the variable occurs twice?

$$ \left\|Y-XX^T \right\|_{\text{F}}^2$$

where $X$ and $Y$ are matrices. Taking the derivative with respect to $X$ yields

$$-2(Y-XX^T)X$$

Why is this so?

There is 1 best solution below

Some notation:

  • Relation between the trace and the Frobenius product: $$A : BC := \left\langle A, BC\right\rangle = {\rm tr}(A^TBC)$$
  • Cyclic properties of the trace/Frobenius product: \begin{align} A : BC &= BC : A \\ &= AC^T : B \\ &= A^T : C^TB^T \\ &= {\text{etc.}} \end{align}
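The identities listed above can be spot-checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `frob` and the matrix shapes are illustrative choices, not part of the original answer.

```python
import numpy as np

def frob(A, B):
    """Frobenius inner product A : B = tr(A^T B)."""
    return np.trace(A.T @ B)

# Random rectangular matrices with compatible shapes (arbitrary sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((5, 4))

lhs = frob(A, B @ C)
checks = [
    frob(B @ C, A),        # A : BC = BC : A
    frob(A @ C.T, B),      # A : BC = A C^T : B
    frob(A.T, C.T @ B.T),  # A : BC = A^T : C^T B^T
]

# True if every rearrangement agrees with A : BC up to round-off.
identities_hold = all(np.isclose(lhs, c) for c in checks)
print(identities_hold)
```

The rectangular shapes are deliberate: they make it visible that each rearrangement keeps the two factors of the product the same size.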

Let $f := \left\|Y - XX^T\right\|_F^2 \equiv \left(Y - XX^T\right):\left(Y - XX^T\right)$.

Obtain the differential first, then read off the gradient, using the cyclic properties to isolate $dX$ in each term.
\begin{align}
df &= d\left[\left(Y - XX^T\right):\left(Y - XX^T\right)\right] \\
&= \left[-\left(dX\,X^T + X\,dX^T\right):\left(Y - XX^T\right)\right] + \left[\left(Y - XX^T\right):-\left(dX\,X^T + X\,dX^T\right)\right] \\
&= -2\left(Y - XX^T\right):\left(dX\,X^T + X\,dX^T\right) \\
&= \left[-2\left(Y - XX^T\right):dX\,X^T\right] + \left[-2\left(Y - XX^T\right):X\,dX^T\right] \\
&= \left[-2\left(Y - XX^T\right)X:dX\right] + \left[-2X^T\left(Y - XX^T\right):dX^T\right] \\
&= \left[-2\left(Y - XX^T\right)X:dX\right] + \left[-2\left(Y^T - XX^T\right)X:dX\right]
\end{align}

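Thus, the gradient is \begin{align} \frac{\partial f}{\partial X} = -2\left[\left(Y + Y^T \right) - 2XX^T\right]X. \end{align} When $Y$ is symmetric this reduces to $-4\left(Y - XX^T\right)X$, which is twice the expression quoted in the question.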
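As a sanity check, the gradient formula above can be compared against a central finite-difference approximation. This is only a sketch under stated assumptions: NumPy is available, and the function names (`f`, `grad_f`, `numerical_grad`), the sizes `n` and `k`, and the step `eps` are hypothetical choices made for illustration.

```python
import numpy as np

def f(X, Y):
    """Objective ||Y - X X^T||_F^2."""
    R = Y - X @ X.T
    return np.sum(R * R)

def grad_f(X, Y):
    """Closed-form gradient -2 [(Y + Y^T) - 2 X X^T] X."""
    return -2.0 * ((Y + Y.T) - 2.0 * X @ X.T) @ X

def numerical_grad(X, Y, eps=1e-6):
    """Entry-wise central finite differences of f with respect to X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (f(X + E, Y) - f(X - E, Y)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n, k = 5, 3
X = rng.standard_normal((n, k))
Y = rng.standard_normal((n, n))  # deliberately not symmetric

# Largest entry-wise gap between the formula and the numerical estimate.
max_err = np.max(np.abs(grad_f(X, Y) - numerical_grad(X, Y)))
print(max_err)
```

Using a non-symmetric `Y` exercises the $Y + Y^T$ term; with a symmetric `Y` the same check would not distinguish it from $2Y$.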