Differentiate a Matrix-Transpose Product


If

$$y = w^T X + b$$

where $w$ is a $[13, 1]$ matrix and $X$ is a $[13, 1]$ matrix, what is $\frac{dy}{dw}$?

$\frac{dy}{dX}$ seems to be $w^T$, but $\frac{dy}{dw}$ is not as clear to me. If the transpose wasn't there, it would seem like $\frac{dy}{dw}$ would be $X$, but I'm not sure how the transpose affects the derivative.

Any help will be greatly appreciated, thanks in advance.


There are 3 best solutions below


The map $g(w) = w^TX$ is a linear map, and the derivative of a linear map is the map itself (you can try to prove it).

Therefore, since the constant $b$ does not contribute to the derivative, you have $\frac{dy}{dw}: h \mapsto h^T X$.
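This can be checked numerically. The following is only a minimal sketch in plain Python: because $w \mapsto w^TX$ is linear, the increment $y(w+h) - y(w)$ should equal $h^TX$ for any $h$, not just a small one. The vectors have length 3 instead of 13 for brevity, the numbers are arbitrary, and the helper names `dot` and `y` are illustrative rather than taken from the question.

```python
def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def y(w, X, b):
    """y = w^T X + b for column vectors w and X (stored as lists)."""
    return dot(w, X) + b

# Arbitrary illustrative values (length 3 instead of 13).
w = [1.0, -2.0, 0.5]
X = [3.0, 4.0, -1.0]
b = 0.25
h = [0.5, 2.0, -4.0]  # an arbitrary, not necessarily small, perturbation

w_plus_h = [wi + hi for wi, hi in zip(w, h)]
increment = y(w_plus_h, X, b) - y(w, X, b)  # y(w + h) - y(w)
derivative_at_h = dot(h, X)                 # the linear map h -> h^T X

print(increment, derivative_at_h)
```

The two printed numbers should agree, and the offset `b` cancels in the difference, which is why it plays no role in the derivative.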


In index notation, your question can be dealt with in 3 lines $$\eqalign{ y_i &= w_kX_{ki} + b_i \cr dy_i &= dw_kX_{ki} \cr \frac{\partial y_i}{\partial w_p} &= \delta_{pk}\,X_{ki} = X^T_{ip} \cr }$$ with no philosophizing about how one should interpret $\,y$-vs-$y^T\,$ or $\,w$-vs-$w^T;\,$ each quantity carries a single index so talking about a transpose is meaningless. The quantity $X$ on the other hand is a $2^{nd}$ order tensor, i.e. a matrix. It carries two indices, so it is meaningful to talk about its transpose.

If you want to work in matrix notation without getting confused, then you must meticulously write equations in which all of the vectors are column vectors. Then the lines above can be translated directly and unambiguously as $$\eqalign{ y &= X^Tw + b \cr dy &= X^T\,dw \cr \frac{\partial y}{\partial w} &= X^T \cr }$$ Also, your assertion that $$\frac{\partial y}{\partial X} = w^T$$ is incorrect.
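As a rough sanity check of $\frac{\partial y}{\partial w} = X^T$ in the matrix case, here is a small sketch in plain Python that builds the Jacobian by finite differences and compares it with $X^T$. The $3 \times 2$ matrix, the vectors, and the helper names `affine` and `jacobian_fd` are all made up for illustration.

```python
def affine(w, X, b):
    """y = X^T w + b, i.e. y_i = sum_k w_k X[k][i] + b_i."""
    n, m = len(X), len(X[0])
    return [sum(w[k] * X[k][i] for k in range(n)) + b[i] for i in range(m)]

def jacobian_fd(f, w, eps=1.0):
    """Finite-difference Jacobian J[i][p] = d y_i / d w_p.

    For an affine map the difference quotient is exact for any step size,
    so a unit step is used here to keep the arithmetic simple.
    """
    y0 = f(w)
    J = [[0.0] * len(w) for _ in y0]
    for p in range(len(w)):
        w_step = list(w)
        w_step[p] += eps
        y1 = f(w_step)
        for i in range(len(y0)):
            J[i][p] = (y1[i] - y0[i]) / eps
    return J

# Arbitrary illustrative data: X is 3 x 2, w has length 3, b has length 2.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [0.5, -1.0, 2.0]
b = [10.0, 20.0]

J = jacobian_fd(lambda v: affine(v, X, b), w)
X_T = [[X[k][i] for k in range(len(X))] for i in range(len(X[0]))]

print(J)
print(X_T)
```

Both printed arrays should have shape $2 \times 3$, with one row per component of $y$ and one column per component of $w$, matching the index form $\partial y_i / \partial w_p = X^T_{ip}$.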

A vector-by-matrix derivative generates a $3^{rd}$ order tensor. There's no way to write this in matrix notation, but in index notation it is straightforward to calculate $$\eqalign{ y_i &= w_kX_{ki} + b_i \cr dy_i &= w_k\,dX_{ki} \cr \frac{\partial y_i}{\partial X_{pq}} &= w_k \delta_{kp} \delta_{iq} = w_p \delta_{iq} \cr }$$
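The third-order result can be checked the same way. This is only a sketch in plain Python with arbitrary small data: it perturbs one entry $X_{pq}$ at a time, records the change in each $y_i$, and compares the resulting array with $w_p\,\delta_{iq}$. The names `affine`, `T`, and `expected` are illustrative.

```python
def affine(w, X, b):
    """y_i = sum_k w_k X[k][i] + b_i."""
    n, m = len(X), len(X[0])
    return [sum(w[k] * X[k][i] for k in range(n)) + b[i] for i in range(m)]

# Arbitrary illustrative data: X is n x m with n = 3, m = 2.
w = [0.5, -1.0, 2.0]
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [10.0, 20.0]
n, m = len(X), len(X[0])

y0 = affine(w, X, b)

# T[i][p][q] holds d y_i / d X_pq, so T has shape m x n x m.
T = [[[0.0] * m for _ in range(n)] for _ in range(m)]
for p in range(n):
    for q in range(m):
        X_step = [row[:] for row in X]
        X_step[p][q] += 1.0  # unit step; exact because y is linear in X
        y1 = affine(w, X_step, b)
        for i in range(m):
            T[i][p][q] = y1[i] - y0[i]

# The closed form from the index calculation: w_p * delta_{iq}.
expected = [[[w[p] if i == q else 0.0 for q in range(m)]
             for p in range(n)]
            for i in range(m)]

print(T == expected)
```

The array carries three indices $(i, p, q)$, which is why it cannot be written as a single matrix: changing $X_{pq}$ only moves the component $y_q$, and it moves it by $w_p$.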


Since both $w$ and $X$ are column vectors, the expression $w^{T}X$ is a scalar, and as such it equals its transpose: $$w^{T}X=X^{T}w$$ Hence your function $$y(w)=w^{T}X+b=X^{T}w+b$$ has derivative $$\frac{\partial y}{\partial w}(w)=X^{T}$$
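Written out componentwise, with $b$ denoting the constant offset from the question, the same argument reads

$$y = \sum_{k=1}^{13} w_k X_k + b, \qquad \frac{\partial y}{\partial w_p} = X_p, \quad p = 1, \dots, 13.$$

Collecting these 13 partial derivatives into a row gives $X^{T}$. Under the other common layout convention, in which the gradient is given the same shape as $w$, the same numbers would be written as the column $X$.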