Computing $\frac{\partial(X^Tb)}{\partial X}$

In the Matrix Cookbook there is the identity $$\frac{\partial (a^TX^T b)}{\partial X} = ba^T$$

I recently ran into a problem where I had to compute $$\frac{\partial (X^T b)}{\partial X}$$

but I couldn't find a formula for this. However, it seems that, at least for my example,

$$\frac{\partial (X^T b)}{\partial X} = b.$$

Does this formula hold in general?

Does it even make sense to take the derivative $$\frac{\partial (X^T b)}{\partial X}?$$

The problem where this came up was Section 3.1.5 of Pattern Recognition and Machine Learning, specifically taking the derivative with respect to $W$ of equation (3.33):

$$\ln(p(T|X,W,\beta))=\frac{NK}{2}\ln\left(\frac{\beta}{2\pi}\right) - \frac{\beta}{2}\sum_{n=1}^N \| t_n -W^T \phi (x_n)\|^2$$ where I used the chain rule to compute:

$$\frac{\partial}{\partial W}\ln(p(T|X,W,\beta))=- \frac{\beta}{2}\sum_{n=1}^N \frac{\partial}{\partial W}(t_n -W^T \phi (x_n))^T(t_n -W^T \phi (x_n)) $$

$$=- \frac{\beta}{2}\sum_{n=1}^N \frac{\partial}{\partial (t_n-W^T \phi (x_n))}(t_n -W^T \phi (x_n))^T(t_n -W^T \phi (x_n))\frac{\partial}{\partial W} (t_n-W^T \phi (x_n)) $$

Then I used $$\frac{\partial (x^Tx)}{\partial x}=2x$$ and, to compute the derivative

$$\frac{\partial}{\partial W} (t_n-W^T \phi (x_n))$$ I used

$$\frac{\partial (X^T b)}{\partial X} = b$$

which seems to give the correct results.

Furthermore, a proof in a similar vein to this seems to work, although I'm not sure if it is valid.
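
One way to cross-check the result without relying on a vector-by-matrix derivative is index notation. This is only a sketch, assuming $W\in\mathbb R^{M\times K}$, $\phi(x_n)\in\mathbb R^M$ and $t_n\in\mathbb R^K$. The symbol $M$ and the shorthands $r = t_n - W^T\phi(x_n)$ and $\phi = \phi(x_n)$ are introduced here only for this check and are not part of the original problem statement.

\begin{align*} r_k &= t_{nk} - \sum_{i=1}^M W_{ik}\,\phi_i, \\ \frac{\partial}{\partial W_{ij}} \sum_{k=1}^K r_k^2 &= 2\sum_{k=1}^K r_k \frac{\partial r_k}{\partial W_{ij}} = -2\,\phi_i\, r_j. \end{align*}

In matrix form this reads $\frac{\partial}{\partial W}\|t_n - W^T\phi(x_n)\|^2 = -2\,\phi(x_n)\,(t_n - W^T\phi(x_n))^T$, so summing over $n$ with the factor $-\frac{\beta}{2}$ would give $\beta\sum_{n=1}^N \phi(x_n)\,(t_n - W^T\phi(x_n))^T$.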

There is 1 best solution below

Yes, this formula holds when $X$ is a vector. Assuming $X\in\mathbb R^n$ and $b\in\mathbb R^n$, we obtain $$X^T b = \sum_{i=1}^n X_i b_i.$$

The partial derivative of this linear combination with respect to $X_k$ is $b_k$, which proves your formula:

\begin{align*} \frac{\partial(X^T b)}{\partial X} = b. \end{align*}
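
The vector case can also be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available. `numerical_gradient` is a helper written for this illustration (not a library function), and the dimension `n = 5` and the random seed are arbitrary choices. It compares a central-difference estimate of the gradient of $X^T b$ with $b$.

```python
import numpy as np


def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at vector x.

    Hypothetical helper for this example only.
    """
    grad = np.zeros_like(x, dtype=float)
    for k in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[k] = eps  # perturb only the k-th coordinate
        grad[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)  # arbitrary seed
n = 5                           # arbitrary dimension
X = rng.standard_normal(n)
b = rng.standard_normal(n)

# f(X) = X^T b is a scalar, so its gradient with respect to X is a vector in R^n.
grad = numerical_gradient(lambda x: x @ b, X)

# Compare the numerical gradient with the claimed closed form b.
matches = np.allclose(grad, b, atol=1e-6)
print(matches)
```

Because $X^T b$ is linear in $X$, the central difference is exact up to floating-point rounding, so the comparison does not depend on the particular point $X$ at which it is evaluated.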