Derivative with respect to a matrix in matrix multiplication


I have a very simple problem, but I cannot work out how it is computed. I need the derivative with respect to a matrix that appears in a matrix product: $$A_{(m,n)} W_{(n,p)} = C_{(m,p)}, \qquad \frac{dC}{dW} = ?$$ I expected a matrix with the same dimensions as $W$, that is $n$ by $p$, but the resources I find just confuse me more. Is it not possible to express this in terms of $A$, the constant matrix?

On BEST ANSWER

Let's first recall how the derivative of a function $f\colon\mathbb{R}^a \rightarrow\mathbb{R}^b$ is characterized (provided it exists). Given a point $x\in\mathbb{R}^a$, the derivative of $f$ at $x$ is the unique linear map $df_x\colon\mathbb{R}^a\rightarrow \mathbb{R}^b$ (which may be represented by a matrix, which you know as the Jacobian) such that $$ \Vert f(x+h)-f(x)-df_x(h) \Vert=o(\Vert h\Vert) \quad\text{as } h\rightarrow 0. $$

Now translate this to your question: let $F\colon M(n\times p)\rightarrow M(m\times p)$ be the map between spaces of real matrices defined by $F(W)=A \cdot W$, where $A\in M(m\times n)$ is a fixed matrix. Then, given a "point" $X\in M(n\times p)$, the derivative of $F$ at $X$ is the unique linear map $dF_X\colon M(n\times p)\rightarrow M(m\times p)$ with $$ \Vert F(X+H)-F(X)-dF_X(H) \Vert=o(\Vert H\Vert) \quad\text{as } H\rightarrow 0. $$ But $F(X+H)-F(X)=A\cdot(X + H) - A \cdot X= A \cdot H$, which is already a linear function of $H$, so the left-hand side vanishes identically for the choice $dF_X(H)=A \cdot H$. This shows that $dF_X(H)=A \cdot H$, independent of the choice of $X$.
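As a quick numerical sanity check of the argument above, here is a minimal NumPy sketch. The sizes `m, n, p`, the random matrices, and the helper names `F` and `vec` are illustrative assumptions, not part of the original answer. It checks that the increment $F(X+H)-F(X)$ equals $A\cdot H$, and that, if one stacks the columns of a matrix into a vector, the linear map $H\mapsto A\cdot H$ is represented by the Kronecker product $I_p\otimes A$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 3, 4, 2                      # arbitrary illustrative sizes
A = rng.standard_normal((m, n))        # the fixed matrix
X = rng.standard_normal((n, p))        # the "point" at which we differentiate
H = rng.standard_normal((n, p))        # an arbitrary increment


def F(W):
    """The map W -> A W from the answer."""
    return A @ W


# The increment of F is exactly the linear map H -> A H (no remainder term).
assert np.allclose(F(X + H) - F(X), A @ H)


def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")


# Matrix ("Jacobian") representation of dF_X with respect to vec:
# vec(A H) = (I_p kron A) vec(H).
J = np.kron(np.eye(p), A)              # shape (m*p, n*p)
assert np.allclose(J @ vec(H), vec(A @ H))
```

Note that this matrix representation has shape $mp \times np$, not $n \times p$: the derivative is a linear map between matrix spaces rather than a matrix of the same shape as $W$.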


EDIT: In order to connect this with the derivative of a function of one variable, consider the case $a=b=1$. There you would define $$ f'(x):=\lim_{h\rightarrow 0}\frac{f(x+h)-f(x)}{h}. $$ Equivalently, $$ \lim_{h\rightarrow 0}\frac{f(x+h)-f(x)-f'(x)h}{h}=0, $$ or $$ {\vert f(x+h)-f(x)-f'(x)h \vert} = o(\vert h\vert ) $$ in Landau notation. Thus in this case $df_x$ is the linear map with $df_x(h)=f'(x)h$.

Now for the matrix case, the expression $$ \lim_{H\rightarrow 0} H^{-1}( F(X+H)-F(X)) $$ only makes sense for square matrices and invertible $H$. That is why in general one uses the definition $$ \Vert F(X+H)-F(X)-dF_X(H) \Vert=o(\Vert H\Vert), $$ which, as demonstrated above, is equivalent to the usual definition in the one-variable case. Without referring to the $o()$ notation, you could also say that $dF_X$ is the unique linear map such that $$ \lim_{\Vert H \Vert\rightarrow 0 }\frac{\Vert F(X+H)-F(X)-dF_X(H) \Vert}{ \Vert H\Vert } = 0. $$

On

If $C(W) = AW$, then $C(W+H) = C(W) + AH$, so the derivative is $DC(W)(H) = AH$.
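
One way to connect this with the $n \times p$ matrix the question asks for is sketched here under an assumption that is not in the original post: suppose $C$ is fed into a differentiable scalar function $\ell\colon M(m\times p)\rightarrow\mathbb{R}$ whose gradient with respect to $C$ is a matrix $G\in M(m\times p)$. Then, writing $\langle X, Y\rangle = \operatorname{tr}(X^T Y)$ for the Frobenius inner product, the chain rule gives $$ d(\ell\circ C)_W(H) = \langle G, AH\rangle = \operatorname{tr}(G^T A H) = \langle A^T G, H\rangle, $$ so the gradient of $\ell\circ C$ with respect to $W$ is $A^T G$, which is indeed $n \times p$ and is expressed in terms of the constant matrix $A$.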