Derivative of Matrix with respect to matrix


Let $x, b \in \mathbb{R}^d$ and $W \in \mathbb{R}^{d\times d}$. Compute

$\frac {\partial{}}{\partial{W_{i,j}}}(Wx+b)$

What I have done so far is

$Wx = \begin{pmatrix} \sum_{i} W_{1,i}\,x_i \\ \vdots \\ \sum_{i} W_{d,i}\,x_i \end{pmatrix}$

Now if I take the derivative of the product above, then theoretically all the entries of $x$ should appear in the answer, and the answer would be $x_i$?

There are 2 answers below.

Best answer:
$Wx+b$ is the vector:

$$Wx+b = \begin{pmatrix} b_1+\sum_{k} W_{1,k}.x_k \\ \vdots \\ \vdots \\ b_d+\sum_{k}W_{d,k}.x_k \\ \end{pmatrix}$$

(better not to use the indices $i,j$ that are used for the derivation variable).

$W_{i,j}$ appears only in the $i$-th row of the RHS vector, so $\frac {\partial{}}{\partial{W_{i,j}}}(Wx+b)$ is the vector with all entries equal to zero except the $i$-th one, which equals $x_j$.
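A quick numerical sanity check of this claim, sketched with NumPy (the dimension $d$ and the particular indices $i, j$ below are arbitrary choices for illustration). Since $Wx+b$ is linear in $W$, a finite difference in the $(i,j)$ entry recovers the derivative essentially exactly:

```python
import numpy as np

# Check that d(Wx+b)/dW_{i,j} is the vector with x_j in the
# i-th entry and zeros elsewhere.
d = 4
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))
x = rng.standard_normal(d)
b = rng.standard_normal(d)

i, j = 1, 3          # derivative with respect to W[i, j]
eps = 1e-6

# Perturb only the (i, j) entry of W and difference the outputs.
W_pert = W.copy()
W_pert[i, j] += eps
numeric = ((W_pert @ x + b) - (W @ x + b)) / eps

# Analytic result: e_i * x_j.
analytic = np.zeros(d)
analytic[i] = x[j]

print(np.allclose(numeric, analytic, atol=1e-5))  # prints True
```

Because the map is linear in $W$, the finite difference agrees with the analytic derivative up to floating-point rounding for any choice of `eps`.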

Second answer:

Using standard notation (where $\delta$ is the Kronecker Delta symbol), you have that

$$\frac{\partial}{\partial W_{ij}} W_{kh} = \delta_{ik}\delta_{jh}$$

which means that the result is $1$ if $i=k$ and $j=h$, and zero otherwise. Note that all four indices $i,j,k,h$ appear in the final result: this is the simplest derivative of a matrix with respect to another matrix (strictly, what you are asking for is the derivative of a vector with respect to a matrix).

Now just apply the above rule to your case (repeated indexes are summed):

$$ \frac{\partial}{\partial W_{ij}} (W_{fs}x_s + b_f) = \frac{\partial}{\partial W_{ij}} (W_{fs}x_s) = \delta_{if}\delta_{js} x_s = \delta_{if} x_j $$ Note that the "dummy index" $s$, namely the "summation index", does not appear in the final result (as it should be).
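The full Jacobian tensor $J_{fij} = \partial (Wx+b)_f / \partial W_{ij} = \delta_{if}\,x_j$ can also be verified numerically. A minimal sketch, assuming NumPy, which builds the analytic tensor with `einsum` and compares it against entry-by-entry finite differences:

```python
import numpy as np

d = 3
rng = np.random.default_rng(1)
W = rng.standard_normal((d, d))
x = rng.standard_normal(d)

# Analytic Jacobian tensor: J[f, i, j] = delta_{if} * x_j.
J_analytic = np.einsum('fi,j->fij', np.eye(d), x)

# Finite-difference Jacobian, perturbing one (i, j) entry at a time.
# The constant b cancels in the difference, so it is omitted here.
eps = 1e-6
J_numeric = np.zeros((d, d, d))
for i in range(d):
    for j in range(d):
        W_pert = W.copy()
        W_pert[i, j] += eps
        J_numeric[:, i, j] = ((W_pert @ x) - (W @ x)) / eps

print(np.allclose(J_analytic, J_numeric, atol=1e-5))  # prints True
```

The `einsum` subscript `'fi,j->fij'` multiplies $\delta_{if}$ (the identity matrix) by $x_j$ with no summation, which is exactly the index expression $\delta_{if} x_j$ above; the dummy index $s$ has already been summed away.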