I came across the following:
$$\nabla_w\lVert y-\phi w\rVert^2_2=-2\phi^{\color{red}{T}}(y-\phi w)$$ where $\lVert x\rVert_2$ is the L2 norm (Euclidean distance). Here $\phi$, $w$ and $y$ are all matrices, and $\phi^T$ is the transpose of $\phi$.
Doubt:
I know that $\nabla_w\lambda \lVert w\rVert^2_2=2\lambda w$, as somewhat explained in this video. By similar logic, I was expecting $\nabla_w\lVert y-\phi w\rVert^2_2=-2\phi(y-\phi w)$. But the text says $\phi^{\color{red}{T}}$. Where did that transpose come from?
PS: I came across this while learning ridge regression.
It must be $\phi^T$, otherwise the dimensions will not match: if $\phi$ is $n\times d$ and $w$ is $d\times 1$, then $y-\phi w$ is $n\times 1$, so $\phi^T(y-\phi w)$ is $d\times 1$, the same shape as $w$, whereas $\phi(y-\phi w)$ is not even a valid product in general. You can also see it by expanding $\lVert y-\phi w\rVert^2_2 = y^Ty - 2y^T\phi w + w^T\phi^T\phi w$ and differentiating term by term, which gives $-2\phi^Ty + 2\phi^T\phi w = -2\phi^T(y-\phi w)$. For the quadratic term, recall that the gradient of $x^TAx$ is $Ax + A^Tx$, which reduces to the example you gave when $A$ is the identity.
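If a symbolic argument isn't convincing, a quick numerical sanity check also works. Here is a small NumPy sketch (with `phi`, `w`, `y` as arbitrary random placeholders) that compares the claimed gradient $-2\phi^T(y-\phi w)$ against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
phi = rng.standard_normal((n, d))  # "design matrix" phi, n x d
w = rng.standard_normal(d)         # weight vector, length d
y = rng.standard_normal(n)         # target vector, length n

def loss(w):
    """Squared L2 norm of the residual: ||y - phi w||_2^2."""
    r = y - phi @ w
    return r @ r

# Analytic gradient from the formula in question.
analytic = -2 * phi.T @ (y - phi @ w)

# Central finite-difference approximation of the gradient.
eps = 1e-6
numeric = np.array([
    (loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
    for e in np.eye(d)
])

print(np.allclose(analytic, numeric, atol=1e-5))
```

Note that `analytic` has shape `(d,)`, matching `w`; the non-transposed version `phi @ (y - phi @ w)` would raise a shape error whenever `n != d`.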