If $y = W^T x$, what is $\frac{\partial y}{\partial W}$?


I would like to compute the derivative of a vector with respect to a matrix. Specifically, let $y = W^Tx$, where $W$ is a matrix and $x,y$ are vectors. What is $\frac{\partial y}{\partial W} = \frac{\partial W^T x}{\partial W}$?


Follow-up: Define another function $z = a^T y = a^T W^Tx$, so that $z$ is a scalar. We know that $\frac{\partial z}{\partial W}$ is a matrix with the same shape as $W$. At the same time, by the chain rule,

\begin{equation} \frac{\partial z}{\partial W} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial W} \end{equation}

where $\frac{\partial z}{\partial y}$ is a $1\times N$ vector ($N$ is the dimension of $y$). So it seems the dimensions on the left- and right-hand sides don't match. Any explanations?


There are 3 best solutions below


Using indices we have $y_i =\sum_j (W^T)_{ij} x_j=\sum_j W_{ji}x_j$. So, using the Kronecker delta:

$$ \frac{\partial y_i}{\partial W_{kl}} = \sum_j \delta_{jk}\,\delta_{il}\, x_j = \delta_{il} x_k.$$
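The index formula can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the original answer: the sizes `M`, `N`, the values of `W` and `x`, and the helper names are all assumptions chosen for illustration. It compares a central finite difference of $y_i$ with respect to $W_{kl}$ against $\delta_{il} x_k$.

```python
# Hypothetical sizes and values, chosen only for illustration.
M, N = 3, 2                      # W is M x N, x has length M, y has length N
W = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]
x = [1.0, -2.0, 3.0]

def y_of(W):
    """y = W^T x, i.e. y_i = sum_j W[j][i] * x[j]."""
    return [sum(W[j][i] * x[j] for j in range(M)) for i in range(N)]

def numeric_dy_dW(i, k, l, h=1e-6):
    """Central finite difference of y_i with respect to the entry W_kl."""
    Wp = [row[:] for row in W]
    Wm = [row[:] for row in W]
    Wp[k][l] += h
    Wm[k][l] -= h
    return (y_of(Wp)[i] - y_of(Wm)[i]) / (2 * h)

def analytic_dy_dW(i, k, l):
    """Closed form from the index derivation: delta_{il} * x_k."""
    return x[k] if i == l else 0.0

# Largest disagreement over all index triples (i, k, l).
max_err = max(
    abs(numeric_dy_dW(i, k, l) - analytic_dy_dW(i, k, l))
    for i in range(N) for k in range(M) for l in range(N)
)
print(max_err)
```

Because $y$ is linear in $W$, the finite difference should agree with the closed form up to floating-point rounding. Note that the derivative carries three indices $(i,k,l)$, so it is a third-order array rather than a matrix.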

For your second part with $z= a^Ty=a^T W^T x=\sum_{kl} a_l W_{kl} x_k$:

$$ a_l x_k = \frac{\partial z}{\partial W_{kl}} = \sum_i \frac{\partial z}{\partial y_i} \frac{\partial y_i}{\partial W_{kl}} = \sum_i a_i \delta_{il} x_k = a_l x_k .$$
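In matrix form, $\frac{\partial z}{\partial W_{kl}} = x_k a_l$ says that the gradient is the outer product $x a^T$, which has the same shape as $W$. A small sketch of a numerical check follows, again in plain Python with made-up values for `W`, `x`, and `a` (assumptions, not from the question).

```python
# Hypothetical values, chosen only for illustration.
W = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]   # 3 x 2
x = [1.0, -2.0, 3.0]                           # length 3
a = [4.0, -0.5]                                # length 2
rows, cols = len(W), len(W[0])

def z_of(W):
    """z = a^T W^T x = sum_{k,l} a_l W_kl x_k."""
    return sum(a[l] * W[k][l] * x[k] for k in range(rows) for l in range(cols))

# Analytic gradient: the outer product x a^T, entry (k, l) equals x_k * a_l.
grad_analytic = [[x[k] * a[l] for l in range(cols)] for k in range(rows)]

def grad_numeric(h=1e-6):
    """Central finite differences of z with respect to each entry of W."""
    g = [[0.0] * cols for _ in range(rows)]
    for k in range(rows):
        for l in range(cols):
            Wp = [row[:] for row in W]
            Wm = [row[:] for row in W]
            Wp[k][l] += h
            Wm[k][l] -= h
            g[k][l] = (z_of(Wp) - z_of(Wm)) / (2 * h)
    return g

g_num = grad_numeric()
max_err = max(
    abs(g_num[k][l] - grad_analytic[k][l])
    for k in range(rows) for l in range(cols)
)
print(max_err)
```

The sum over $i$ in the chain rule is what contracts the $1\times N$ factor against the first index of the third-order derivative, leaving the two indices $(k,l)$ of $W$.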

Hope it helps.


Writing $\mathrm X$ for the matrix $W$ and $\mathrm a$ for the vector $x$, let $f : \mathbb R^{m \times n} \to \mathbb R^n$ be defined by

$$f (\mathrm X) := \mathrm X^T \mathrm a$$

The $i$-th entry of $f$ is the scalar

$$f_i (\mathrm X) = \mathrm e_i^T \mathrm X^T \mathrm a = \mathrm a^T \mathrm X \mathrm e_i = \mbox{tr} (\mathrm e_i \mathrm a^T \mathrm X) = \mbox{tr} ((\mathrm a \mathrm e_i^T)^T \mathrm X) = \langle \mathrm a \mathrm e_i^T, \mathrm X \rangle$$

Hence, the derivative of $f_i$ with respect to $\mathrm X$ is the $m \times n$ matrix

$$\partial_{\mathrm X} f_i (\mathrm X) = \mathrm a \mathrm e_i^T$$
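Since each $f_i$ is linear in $\mathrm X$, pairing its gradient with $\mathrm X$ under the trace inner product should give back $f_i(\mathrm X)$ itself. The sketch below illustrates this with arbitrary values for `X` and `a`; the helper names `grad_f_i` and `inner` are hypothetical.

```python
# Hypothetical sizes and values; X stands in for W and a for x.
m, n = 3, 2
X = [[0.5, -1.0], [2.0, 0.25], [-0.75, 1.5]]
a = [1.0, -2.0, 3.0]

def f(X):
    """f(X) = X^T a, a vector of length n."""
    return [sum(X[k][i] * a[k] for k in range(m)) for i in range(n)]

def grad_f_i(i):
    """The m x n matrix a e_i^T: column i equals a, all other columns are zero."""
    return [[a[k] if l == i else 0.0 for l in range(n)] for k in range(m)]

def inner(A, B):
    """Trace inner product <A, B> = tr(A^T B) = sum of entrywise products."""
    return sum(A[k][l] * B[k][l] for k in range(m) for l in range(n))

# f_i(X) = <a e_i^T, X> for every i.
reconstructed = [inner(grad_f_i(i), X) for i in range(n)]
print(reconstructed, f(X))
```

Stacking the $n$ matrices $\mathrm a \mathrm e_i^T$ along a third axis gives the full derivative of $f$, which is why it cannot be written as a single matrix.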


Consider the more general problem in which $(a,x,y)$ are replaced by matrices $(A,X,Y)$

$$\eqalign{ Y & = W^TX \cr z &= A:Y = XA^T:W \cr }$$

where the colon denotes the Matrix Inner Product (aka the Frobenius Product).

The gradient of $z$ can be evaluated quite simply from its differential

$$ dz=XA^T:dW\quad\implies\quad\frac{\partial z}{\partial W}=XA^T $$

It can also be evaluated via the chain rule, but it is much more complicated

$$\eqalign{ dz &= A:dY \cr \frac{\partial z}{\partial Y} &= A \cr\cr dY &= dW^TX = {\mathbb E}X^T:{\mathbb F}:dW \cr \frac{\partial Y}{\partial W} &= {\mathbb E}X^T:{\mathbb F} \cr }$$

Taking the inner product of these 2 gradients (note that the second one is a 4th order tensor) yields

$$\eqalign{ \frac{\partial z}{\partial W} &= \frac{\partial z}{\partial Y} : \frac{\partial Y}{\partial W} \cr &= A:{\mathbb E}X^T:{\mathbb F} \cr &= AX^T:{\mathbb F} \cr &= XA^T \cr }$$

The 4th order isotropic tensors have components

$$\eqalign{ {\mathbb E}_{ijkl} &= \delta_{ik}\,\delta_{jl} \cr {\mathbb F}_{ijkl} &= \delta_{il}\,\delta_{jk} \cr }$$

and have the interesting properties when multiplied by a matrix $M$, using the Frobenius product

$$\eqalign{ {\mathbb E}:M = M:{\mathbb E} &= M \cr {\mathbb F}:M = M:{\mathbb F} &= M^T \cr }$$

Having derived a solution for the full matrix case, the result trivially holds when switching back to the vectors $(a,x,y),\,$ since these are merely rectangular matrices.
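Two of the ingredients above are easy to check by brute force: the identity $A:W^TX = XA^T:W$, and the action of ${\mathbb E}$ and ${\mathbb F}$ on a matrix. The following plain-Python sketch does both. The matrices and helper names are assumptions made up for the example, and the contraction is taken as $({\mathbb E}:M)_{ij} = \sum_{kl} {\mathbb E}_{ijkl} M_{kl}$.

```python
def delta(p, q):
    """Kronecker delta."""
    return 1.0 if p == q else 0.0

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def frob(A, B):
    """Frobenius product A:B = sum_ij A_ij * B_ij."""
    return sum(u * v for ra, rb in zip(A, B) for u, v in zip(ra, rb))

# Hypothetical rectangular test matrix (2 x 3).
M = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
p, q = len(M), len(M[0])

# (E:M)_ij = sum_kl delta_ik delta_jl M_kl, which should reproduce M.
E_M = [[sum(delta(i, k) * delta(j, l) * M[k][l]
            for k in range(p) for l in range(q))
        for j in range(q)] for i in range(p)]

# (F:M)_ij = sum_kl delta_il delta_jk M_kl, which should give M^T (3 x 2).
F_M = [[sum(delta(i, l) * delta(j, k) * M[k][l]
            for k in range(p) for l in range(q))
        for j in range(p)] for i in range(q)]

# Hypothetical W (3 x 2), X (3 x 2), A (2 x 2), so that Y = W^T X is 2 x 2.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X = [[1.0, 0.0], [-1.0, 2.0], [0.5, 1.0]]
A = [[2.0, -1.0], [0.0, 3.0]]

z_from_Y = frob(A, matmul(transpose(W), X))    # A : (W^T X)
z_from_W = frob(matmul(X, transpose(A)), W)    # (X A^T) : W
print(E_M == M, F_M == transpose(M), z_from_Y, z_from_W)
```

The second identity is what makes the differential route short: once $z$ is written as $XA^T:W$, the gradient can be read off directly without forming any 4th order tensor.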