Simple matrix derivative

I always have trouble computing matrix derivatives, such as the following: $$\frac{\partial(W^TX^T-Y^T)(XW-Y)}{\partial W} =\frac{\partial(W^TX^TXW - W^TX^TY - Y^TXW)}{\partial W} $$ After this step, I do not know how to compute the derivative when a term involves a matrix transpose or a matrix inverse. Can someone help me and provide some rules for matrix derivatives, especially for terms that involve transposes and inverses? Thank you very much!

Best answer

Take $X$ and $Y$ to be constant matrices, and define $f(W)\stackrel{\text{def}}{=}(W^{\mathsf{T}}X^{\mathsf{T}}-Y^{\mathsf{T}})(XW-Y)$. Then by the distributive property of matrix multiplication and linearity of matrix transposition, $$\begin{split} f(W+h\Delta W)&=((W+h\Delta W)^{\mathsf{T}}X^{\mathsf{T}}-Y^{\mathsf{T}})(X(W+h\Delta W)-Y)\\ &=(W^{\mathsf{T}}X^{\mathsf{T}}-Y^{\mathsf{T}})(XW-Y)\\ &\quad+h(\Delta W^{\mathsf{T}}X^{\mathsf{T}}(XW-Y)+(W^{\mathsf{T}}X^{\mathsf{T}}-Y^{\mathsf{T}})X\Delta W)\\ &\quad + h^2\Delta W^{\mathsf{T}}X^{\mathsf{T}}X\Delta W \\ \therefore f(W+h\Delta W)&=f(W)+h\langle\nabla f(W),\Delta W\rangle+o(h) \end{split}$$ where, for fixed $\Delta W$, the $h^2$ term is $o(h)$ as $h\to0$, and the linear map $\nabla f(W)$ is defined by $$\langle\nabla f(W),\Delta W\rangle\stackrel{\text{def}}{=}\Delta W^{\mathsf{T}}X^{\mathsf{T}}(XW-Y)+(W^{\mathsf{T}}X^{\mathsf{T}}-Y^{\mathsf{T}})X\Delta W\text{.}$$
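The expansion can be sanity-checked numerically. Below is a sketch in Python with NumPy (an assumption; the answer itself uses no code). The shapes and random data are hypothetical, chosen only so that $XW-Y$ is defined. It checks that the zeroth-, first- and second-order terms reproduce $f(W+h\Delta W)$ up to rounding, and that the difference quotient approaches $\langle\nabla f(W),\Delta W\rangle$ as $h\to0$.

```python
import numpy as np

# Hypothetical test data; shapes are illustrative assumptions chosen so that
# XW - Y is defined: X is m x n, W and dW are n x k, Y is m x k.
rng = np.random.default_rng(0)
m, n, k = 5, 3, 2
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, k))
W = rng.standard_normal((n, k))
dW = rng.standard_normal((n, k))  # the direction Delta W


def f(W):
    """f(W) = (W^T X^T - Y^T)(XW - Y); returns a k x k matrix."""
    return (W.T @ X.T - Y.T) @ (X @ W - Y)


def directional(W, dW):
    """First-order term: dW^T X^T (XW - Y) + (W^T X^T - Y^T) X dW."""
    return dW.T @ X.T @ (X @ W - Y) + (W.T @ X.T - Y.T) @ X @ dW


def second_order(dW):
    """Coefficient of h^2: dW^T X^T X dW."""
    return dW.T @ X.T @ X @ dW


# The three-term expansion is exact, so the residual is only rounding error.
h = 1e-3
expansion = f(W) + h * directional(W, dW) + h**2 * second_order(dW)
print("expansion residual:", np.max(np.abs(f(W + h * dW) - expansion)))

# The difference quotient's error equals step * dW^T X^T X dW (up to
# rounding), so it shrinks linearly as the step size goes to zero.
for step in (1e-1, 1e-2, 1e-3):
    quotient = (f(W + step * dW) - f(W)) / step
    print(step, np.max(np.abs(quotient - directional(W, dW))))
```

The printed errors in the loop fall by about a factor of 10 per row, which is the $o(h)$ behaviour the derivation predicts.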

By definition, $\nabla f(W)$ is the Gâteaux derivative of $f$ at $W$: for a change $\Delta W$ in $W$, it gives the "first-order" change in $f(W)$.
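To illustrate this in a familiar case, assume (for this example only) that $W$ and $Y$ are column vectors, so that $f(W)=\lVert XW-Y\rVert^2$ is a scalar. The two terms in $\langle\nabla f(W),\Delta W\rangle$ are transposes of each other, and a $1\times1$ matrix equals its own transpose, so

$$\begin{split} \langle\nabla f(W),\Delta W\rangle&=2(W^{\mathsf{T}}X^{\mathsf{T}}-Y^{\mathsf{T}})X\Delta W\\ &=\big(2X^{\mathsf{T}}(XW-Y)\big)^{\mathsf{T}}\Delta W\text{.} \end{split}$$

With the standard inner product $\langle a,b\rangle=a^{\mathsf{T}}b$, this identifies the gradient as $\nabla f(W)=2X^{\mathsf{T}}(XW-Y)$. When $W$ is a matrix, the two terms are still transposes of each other but are not equal in general, so the derivative stays in the two-term form given above.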