How do we compute partial derivative with respect to a vector?


I was going through the derivation of linear regression. $$ Error = {y_i}^2 -2w^tx_iy_i + w^tx_ix_i^tw $$

where $y_i$ is a scalar, $x_i$ is a $n \times 1$ vector and $w$ is also a $n \times 1$ vector.

In the next step, the partial derivative with respect to $w$ is taken and shown to be: $$ \frac{d(Error)}{dw} = - 2y_ix_i +2x_i{x_i}^tw $$

I don't have a very good understanding of how differentiation works in the case of vectors. I know the first term is constant with respect to $w$. The second term is a scalar but contains $w^t$: how would we differentiate this $w^t$ term with respect to $w$? And the third term contains both $w$ and $w^t$: how would we go about differentiating that?

What rules of differentiation are being used here if any?

BEST ANSWER

It's a worthwhile exercise to write out the partial derivative with respect to a single entry $w_j$ of $w$. The same rules of differentiation as usual apply, just written out using vectors and matrices. Doing so makes it clear why the gradient of $x^TAx$ is $2Ax$ for symmetric $A$.

Also, for the second term, note that you're differentiating an inner product, which is linear in $w$, so you should expect to get back a constant vector. The last term is quadratic in $w$, so you'd expect the derivative to be linear in $w$.

You could also check out The Matrix Cookbook for a collection of such formulas.
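As a sanity check on the identity above, here is a minimal NumPy sketch (the matrix `A` and vector `x` are random, illustrative values) that compares the claimed gradient $2Ax$ of $x^TAx$ for symmetric $A$ against a finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M + M.T                      # symmetric A, as in the answer above
x = rng.standard_normal(n)

f = lambda v: v @ A @ v          # the quadratic form x^T A x
grad_analytic = 2 * A @ x        # claimed gradient for symmetric A

# central finite differences, one coordinate of x at a time
eps = 1e-6
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

assert np.allclose(grad_fd, grad_analytic, atol=1e-5)
```

The coordinate-wise finite differences are exactly the "partial with respect to a single entry" exercise the answer suggests, done numerically instead of by hand.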

ANOTHER ANSWER

Use these two rules for differentiation:

$$\frac{\partial \mathbf{a}^T \mathbf{b}}{\partial \mathbf{a}} = \mathbf{b},$$ and $$ \frac{\partial \mathbf{a}^T A \mathbf{a}}{\partial \mathbf{a}} = (A+A^T) \mathbf{a}.$$

In your case $\mathbf{a} = \mathbf{w}$, $\mathbf{b} = y_i\mathbf{x}_i$ (the scalar $y_i$ times the vector $\mathbf{x}_i$), and $A=\mathbf{x}_i\mathbf{x}_i^T$. Since $A$ is symmetric, $(A+A^T)\mathbf{a} = 2\mathbf{x}_i\mathbf{x}_i^T\mathbf{w}$, which recovers the stated gradient.
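Applying the two rules to the error in the question, the gradient should be $-2y_i\mathbf{x}_i + 2\mathbf{x}_i\mathbf{x}_i^T\mathbf{w}$. A short NumPy sketch (with one illustrative data point `x_i`, scalar target `y_i`, and random `w`) checks this against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
x_i = rng.standard_normal(n)   # one data point, n x 1
y_i = rng.standard_normal()    # scalar target
w = rng.standard_normal(n)     # weight vector

def error(w):
    # y_i^2 - 2 w^T x_i y_i + w^T x_i x_i^T w, as in the question
    return y_i**2 - 2 * (w @ x_i) * y_i + w @ np.outer(x_i, x_i) @ w

# gradient from the two differentiation rules
grad_analytic = -2 * y_i * x_i + 2 * np.outer(x_i, x_i) @ w

# central finite differences in each coordinate of w
eps = 1e-6
grad_fd = np.array([
    (error(w + eps * e) - error(w - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

assert np.allclose(grad_fd, grad_analytic, atol=1e-5)
```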