Derivative of Projection | Derivative of Matrix w.r.t matrix


I am trying to take the derivative of the following function with respect to the matrix $X$, where $X$ is not a square matrix and hence has no ordinary inverse.

$$ f(X) = X(X^TX)^{-1}X^T $$

I used the product rule on this function with $U = X$, $V = (X^TX)^{-1}$, and $W = X^T$. I am stuck on how to take the derivative of a matrix with respect to a matrix. I used the vec concept given in

http://www.iro.umontreal.ca/~pift6266/A06/refs/minka-matrix.pdf

It handled $U$, but I am not sure about $V$ and $W$. Is there a better way to differentiate this function?


There is 1 best solution below


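The calculation below minimizes $\phi = \tfrac 12\|F-P\|^2_F$ over $X$ for a known matrix $P$. Here $X^+$ denotes the pseudoinverse of $X$ and a colon denotes the Frobenius product, $A:B = {\rm Tr}(A^TB)$. The expression in parentheses in the third line of $d\phi$ is the differential $dF$ of the projection itself.

$$\eqalign{
P &= \{{\rm known}\} \\
X &= \{{\rm unknown}\} \\
\\
Y &= X^T \\
F &= X(X^TX)^{-1}X^T \;\doteq\; XX^+ = Y^+Y \\
F^2 &= Y^+YF = Y^+Y = F = F^T \\
FY^+ &= Y^+YY^+ = Y^+ \\
M &= (F-P) \quad\implies\quad dM = dF \\
S &= P+P^T \quad\implies\quad M+M^T = 2F-S \\
\\
\phi &= \tfrac 12\big\|M\big\|^2_F \\
&= \tfrac 12M:M \qquad\qquad\big\{{\rm Frobenius\:Product}\big\} \\
\\
d\phi &= M:dM \\
&= M:dF \\
&= M:\Big(dX\,X^+ + Y^+dY - Y^+\Big(Y\,dX+dY\,X\Big)X^+\Big) \\
&= \big(MY^+ - Y^+YMY^+\big):dX + \big(X^+M - X^+MXX^+\big):dY \\
&= \big(MY^+ - Y^+YMY^+\big):dX + \big(M^TY^+ - Y^+YM^TY^+\big):dX \\
&= \big(Y^+YSY^+ - SY^+\big):dX \\
&= \big(X^+SXX^+ - X^+S\big):dY \\
\\
\frac{\partial \phi}{\partial Y} &= (X^+SXX^+ - X^+S) \;\doteq\; 0 \\
\\
X^+S &= X^+SXX^+ \quad\Longleftarrow\quad S = XX^T \\
}$$

Note that $S = XX^T$ is a sufficient condition for the zero-gradient equation, not a consequence of it: substituting $S = XX^T$ gives $X^+XX^T = X^T$ on the left and $X^+XX^TXX^+ = X^T$ on the right. Thus any decomposition of the symmetric matrix $S = (P+P^T)$ of the form $XX^T$ (Cholesky, for example, when $S$ is positive semidefinite) produces a serviceable solution of the zero-gradient condition and is a stationary point, and hence a candidate minimizer, of the objective function $\phi$.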
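As a sanity check on the algebra, here is a minimal NumPy sketch that compares the differential $dF$ and the gradient $Y^+YSY^+ - SY^+$ against central finite differences. It is not part of the original answer: the function names (`projector`, `dF_formula`, `grad_phi`, `numeric_grad`) are my own, the test matrices are random, and it assumes $X$ has full column rank so that $X^TX$ is invertible.

```python
import numpy as np

def projector(X):
    """F = X (X^T X)^{-1} X^T, assuming X has full column rank."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def dF_formula(X, dX):
    """dF = dX X^+ + Y^+ dY - Y^+ (Y dX + dY X) X^+, with Y = X^T."""
    Xp = np.linalg.pinv(X)   # X^+
    Yp = Xp.T                # Y^+ = (X^T)^+ = (X^+)^T
    Y, dY = X.T, dX.T
    return dX @ Xp + Yp @ dY - Yp @ (Y @ dX + dY @ X) @ Xp

def phi(X, P):
    """Objective 0.5 * ||F - P||_F^2."""
    M = projector(X) - P
    return 0.5 * np.sum(M * M)

def grad_phi(X, P):
    """Gradient of phi with respect to X: Y^+ Y S Y^+ - S Y^+, S = P + P^T."""
    Yp = np.linalg.pinv(X).T
    S = P + P.T
    return Yp @ X.T @ S @ Yp - S @ Yp

def numeric_grad(X, P, eps=1e-6):
    """Entry-by-entry central finite-difference gradient of phi."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (phi(X + E, P) - phi(X - E, P)) / (2 * eps)
    return G

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 2))    # tall, non-square X
    dX = rng.standard_normal((5, 2))   # arbitrary perturbation direction
    P = rng.standard_normal((5, 5))    # the "known" matrix
    eps = 1e-6
    fd = (projector(X + eps * dX) - projector(X - eps * dX)) / (2 * eps)
    print("dF matches finite differences:",
          np.allclose(fd, dF_formula(X, dX), atol=1e-6))
    print("gradient matches finite differences:",
          np.allclose(numeric_grad(X, P), grad_phi(X, P), atol=1e-6))
    # Choosing P = X X^T / 2 makes S = X X^T, which should zero the gradient.
    print("gradient vanishes when S = X X^T:",
          np.allclose(grad_phi(X, 0.5 * X @ X.T), 0.0, atol=1e-9))
```

The finite-difference comparison only confirms the formulas at the sampled points; it does not say whether a stationary point is a minimum.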