Understanding derivative of matrices


I came across the following:

$$\nabla_w\lVert y-\phi w\rVert^2_2=-2\phi^\color{red}{T}(y-\phi w) $$ where $\lVert x\rVert_2$ is the L2 (Euclidean) norm. $\phi$, $w$ and $y$ are all matrices, and $\phi^T$ is the transpose of $\phi$.

Doubt:

I know that $\nabla_w\lambda \lVert w\rVert^2_2=2\lambda w$, as is somewhat explained in this video. By similar logic, I was thinking that $\nabla_w\lVert y-\phi w\rVert^2_2=-2\phi(y-\phi w) $. But the text says $\phi^\color{red}{T}$. Where does that transpose come from?

PS: I came across this while learning ridge regression.


There are 2 best solutions below


It must be $\phi^T$; otherwise the dimensions will not match. Consider the gradient of $x^TAx$, which is $Ax +A^Tx$. This reduces to the example you gave (with $\lambda = 1$) when $A$ is the identity.
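The dimension argument can be made concrete with a quick shape check. This is only a sketch: the sizes `n` and `d` and the random data are assumptions chosen for illustration, with $\phi$ taken as $n \times d$, $w$ of length $d$ and $y$ of length $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3                      # assumed sizes, with n != d on purpose
phi = rng.normal(size=(n, d))
w = rng.normal(size=d)
y = rng.normal(size=n)

residual = y - phi @ w           # shape (n,)

# phi.T @ residual has shape (d,), the same shape as w, as a gradient must.
grad = -2 * phi.T @ residual

# Without the transpose the product is not even defined when n != d.
try:
    phi @ residual
    product_defined = True
except ValueError:
    product_defined = False

# Gradient of x^T A x is A x + A^T x; with A = I this collapses to 2 x.
x = rng.normal(size=d)
identity = np.eye(d)
quad_grad_identity = identity @ x + identity.T @ x
```

Here `grad.shape` equals `w.shape`, while `phi @ residual` raises a `ValueError` because an $n \times d$ matrix cannot multiply a vector of length $n$.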


Note that \begin{eqnarray} \left\Arrowvert y - \phi w\right\Arrowvert^2 &=& \left<y - \phi w,y - \phi w\right>, \end{eqnarray} where, for matrices, the inner product $\left<\cdot,\cdot\right>$ is the Frobenius inner product: $\left<A,B\right> = \textsf{Tr}(A^{\textsf{T}}B)$.
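As a sanity check on this identity, the squared norm can be computed three ways and compared. A minimal sketch, assuming small random matrices; the helper name `frobenius_inner` is hypothetical and introduced only for this example.

```python
import numpy as np

def frobenius_inner(A, B):
    """Frobenius inner product <A, B> = Tr(A^T B)."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(1)
phi = rng.normal(size=(4, 3))    # assumed shapes for illustration
w = rng.normal(size=(3, 2))
y = rng.normal(size=(4, 2))
R = y - phi @ w                  # the residual matrix y - phi w

sq_norm_trace = frobenius_inner(R, R)           # Tr(R^T R)
sq_norm_sum = np.sum(R ** 2)                    # sum of squared entries
sq_norm_numpy = np.linalg.norm(R, "fro") ** 2   # NumPy's Frobenius norm, squared
```

All three values agree up to floating-point rounding, which is what lets the squared norm be expanded with inner-product rules.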

Let's consider the derivative in the direction of $u$, which we assume has unit norm, $\left\Arrowvert u\right\Arrowvert = 1$, and take $\epsilon > 0$. \begin{eqnarray} && \frac{\left<y - \phi (w + \epsilon u),y - \phi (w + \epsilon u)\right> - \left<y - \phi w,y - \phi w\right>}{\left\Arrowvert\epsilon u\right\Arrowvert}\\ &=& \frac{\left<y - \phi (w + \epsilon u),y - \phi (w + \epsilon u)\right> - \left<y - \phi w,y - \phi w\right>}{\epsilon}. \end{eqnarray}

The first term in the numerator can be expanded as follows. \begin{eqnarray} && \left<y - \phi (w + \epsilon u),y - \phi (w +\epsilon u)\right>\\ &=& \left<y - \phi w,y - \phi w\right> - \left<y - \phi w, \epsilon \phi u\right> - \left<\epsilon\phi u,y - \phi w\right> + \left<\epsilon\phi u,\epsilon\phi u\right>\\ &=& \left<y - \phi w,y - \phi w\right> - \epsilon\left<y - \phi w, \phi u\right> - \epsilon\left<\phi u,y - \phi w\right> +\epsilon^2\left\Arrowvert\phi u\right\Arrowvert^2. \end{eqnarray}

If all of the matrices are real, then \begin{equation} \left<y - \phi w, \phi u\right> = \left<\phi u,y - \phi w\right>. \end{equation} The divided difference is thus \begin{eqnarray} \frac{-2\epsilon\left<\phi u, y - \phi w\right> + \epsilon^2\left\Arrowvert\phi u\right\Arrowvert^2}{\epsilon} &=& -2\left<\phi u, y - \phi w\right> + \epsilon\left\Arrowvert\phi u\right\Arrowvert^2\\ &\to& -2\left<\phi u,y - \phi w\right>~~\textrm{as $\epsilon\to 0$}. \end{eqnarray}
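Because the objective is quadratic in $w$, the divided difference equals $-2\left<\phi u, y - \phi w\right> + \epsilon\left\Arrowvert\phi u\right\Arrowvert^2$ exactly for every $\epsilon > 0$, not just in the limit. The sketch below checks this numerically; the shapes, the seed and the value of `eps` are arbitrary assumptions.

```python
import numpy as np

def f(phi, w, y):
    """Squared Frobenius norm of the residual y - phi w."""
    return np.sum((y - phi @ w) ** 2)

rng = np.random.default_rng(2)
phi = rng.normal(size=(4, 3))    # assumed shapes for illustration
w = rng.normal(size=(3, 2))
y = rng.normal(size=(4, 2))

u = rng.normal(size=w.shape)
u /= np.linalg.norm(u)           # unit Frobenius norm (default for 2-D arrays)

eps = 1e-3
divided_difference = (f(phi, w + eps * u, y) - f(phi, w, y)) / eps

# Closed form: -2 <phi u, y - phi w> + eps * ||phi u||^2
phi_u = phi @ u
inner = np.sum(phi_u * (y - phi @ w))          # Frobenius inner product
closed_form = -2 * inner + eps * np.sum(phi_u ** 2)
```

The two quantities match up to rounding, and the leftover term `eps * np.sum(phi_u ** 2)` is the part that vanishes as `eps` shrinks.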

This shows that the directional derivative in the direction of $u$ is \begin{eqnarray} -2\left<\phi u,y - \phi w\right> &=& -2\mathsf{Tr}\left((\phi u)^{\mathsf{T}}(y-\phi w)\right)\\ &=& -2\mathsf{Tr}\left(u^{\mathsf{T}}\phi^{\mathsf{T}}(y-\phi w)\right)\\ &=& -2\left<u,\phi^{\mathsf{T}}(y-\phi w)\right>. \end{eqnarray}

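Since we take the inner product of $u$ with $-2\phi^{\mathsf{T}}(y-\phi w)$ to get the directional derivative in the direction of $u$, $-2\phi^{\mathsf{T}}(y-\phi w)$ is the gradient with respect to $w$. The transpose appears when $\phi$ is moved from $u$ onto the other argument of the inner product.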
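The result can also be checked entry by entry with central finite differences. This is a rough numerical sketch rather than a proof: the helper `numerical_gradient`, the step size `h` and the matrix shapes are assumptions made for the example.

```python
import numpy as np

def loss(phi, w, y):
    """Squared Frobenius norm of y - phi w."""
    return np.sum((y - phi @ w) ** 2)

def numerical_gradient(phi, w, y, h=1e-6):
    """Central-difference estimate of d(loss)/dw, one entry of w at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        step = np.zeros_like(w)
        step[idx] = h
        grad[idx] = (loss(phi, w + step, y) - loss(phi, w - step, y)) / (2 * h)
    return grad

rng = np.random.default_rng(3)
phi = rng.normal(size=(6, 4))    # assumed shapes for illustration
w = rng.normal(size=(4, 2))
y = rng.normal(size=(6, 2))

analytic = -2 * phi.T @ (y - phi @ w)      # formula with the transpose
numeric = numerical_gradient(phi, w, y)
max_abs_error = np.max(np.abs(analytic - numeric))
```

`analytic` has the same shape as `w` and agrees with `numeric` to within finite-difference rounding, whereas the version without the transpose, `phi @ (y - phi @ w)`, cannot even be formed for these shapes.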