Chain rule with respect to vector


I am trying to find this derivative:

$$ \frac{\partial}{\partial \mathbf{w}} (\frac{1}{2}(t-\mathbf{w}^T\mathbf{o})^2) $$

where $\mathbf{w}, \mathbf{o} \in \mathbb{R}^n$ are column vectors.

From the chain rule and this property $ \frac{\partial}{\partial\mathbf{x}} (\mathbf{x}^T\mathbf{b}) = \mathbf{b} $ I believe that the derivative should be:

$$ (t-\mathbf{w}^T\mathbf{o})(-\mathbf{o}) $$


In detail:

$$ \begin{aligned} f(\mathbf{w}) &= \frac{1}{2} g(\mathbf{w})^2 \\ g(\mathbf{w}) &= t-\mathbf{w}^T\mathbf{o} \\ \frac{\partial f}{\partial w_i} &= \frac{\partial f}{\partial g} \frac{\partial g}{\partial w_i} = (t-\mathbf{w}^T\mathbf{o})(-o_i) \\ \frac{\partial f}{\partial \mathbf{w}} &= (t-\mathbf{w}^T\mathbf{o})(-\mathbf{o}) \end{aligned} $$


But then I see a problem with distributivity, because I get:

$$ \mathbf{w}^T\mathbf{o}\mathbf{o} - t\mathbf{o} $$

In the first term, the product $\mathbf{o}\mathbf{o}$ is a problem, because I would be multiplying two matrices of shape $n \times 1$.

Therefore I assumed that this derivative is wrong. Could you please give me the correct result with a step-by-step solution?


There are 2 best solutions below

3
On BEST ANSWER

I don't think that $\frac{\partial}{\partial w}(w^Tb) = b$ is fully correct. Rather $$\frac{\partial}{\partial w}(w^Tb) = \frac{\partial}{\partial w}(b^Tw) = b^T$$ Using this you get $$ \frac{\partial f}{\partial w} = (t - w^To)(-o^T) = -to^T + w^Too^T $$ and there's no problem with multiplying $oo^T$.
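As a side check, and only as my reading of the two conventions (it is not stated in the question), the row-vector result here appears to be just the transpose of the column-vector result from the question. Treating $w^To$ as a scalar:

$$ \left[(t - w^To)(-o)\right]^T = (t - w^To)(-o^T) = -to^T + (w^To)\,o^T = -to^T + w^Too^T $$

So the two answers would carry the same $n$ partial derivatives, arranged as a column in one case and as a row in the other.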

Still, it should be noted that $(w^To)o$ can have a well-defined mathematical meaning even if $w^T(oo)$ doesn't. That's because in $(w^To)o$ you can identify $\mathbb M_{1\times 1}(\mathbb R)\equiv \mathbb R$, and you have a multiplication $\mathbb R \times\mathbb M_{n\times 1}(\mathbb R) \to \mathbb M_{n\times 1}(\mathbb R)$ even if you don't have a multiplication $\mathbb M_{1\times 1}(\mathbb R) \times \mathbb M_{n\times 1}(\mathbb R) \to \mathbb M_{n\times 1}(\mathbb R)$. Alternatively, you can use a tensor product and write $$(w^To)o = w^T(o \otimes o)$$ where it is understood that $w^T$ is contracted with the first component of $o\otimes o$.
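The point about scalar multiplication can be checked numerically. The sketch below compares $(w^To)o$ with $(oo^T)w$, which is one way to write the contraction of $w$ with the first component of $o \otimes o$ as an ordinary matrix product. It uses plain Python lists; the helper names `dot`, `scale`, `outer`, and `matvec` and the example values are illustrative assumptions, not part of the question.

```python
def dot(a, b):
    """Inner product a^T b of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def scale(c, v):
    """Scalar times vector: the R x M_{n x 1} -> M_{n x 1} multiplication."""
    return [c * x for x in v]

def outer(a, b):
    """Outer product a b^T as a list of rows."""
    return [[x * y for y in b] for x in a]

def matvec(m, v):
    """Matrix-vector product m v."""
    return [dot(row, v) for row in m]

# Example values chosen only for illustration.
w = [1.0, -2.0, 0.5]
o = [3.0, 1.0, 2.0]

lhs = scale(dot(w, o), o)      # (w^T o) o: scalar first, then scale o
rhs = matvec(outer(o, o), w)   # (o o^T) w: n x n matrix times column vector

print(lhs, rhs)
```

For these values both expressions give the same vector, so the grouping $(w^To)o$ is well defined even though $oo$ on its own is not a valid matrix product.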

0
On

In my answer I skip the bold style for vectors. Assuming that $\displaystyle\frac {\partial o}{\partial w } = 0$ and $t\in \mathbb R$:

$$ \frac\partial {\partial w} \left[\frac12 (t-w^{\mathsf T}o)^2\right] = \frac\partial {\partial w} \frac12\left[ t^2-2t\,w^{\mathsf T}o+(w^{\mathsf T}o)^2\right] = -to + (w^{\mathsf T}o) o = (-t + w^{\mathsf T}o) o = (t -w^{\mathsf T}o)(-o) $$

PS. In your question you wrote about a problem with $oo$. I think this is not an issue, since the first $o$ is dotted with $w$ and produces a scalar, which then multiplies the second $o$.
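The result can also be checked against a finite-difference approximation. This is a minimal sketch in plain Python, assuming $t$ is a scalar and $o$ does not depend on $w$; the function names, the step size `h`, and the example values are assumptions made for illustration.

```python
def dot(a, b):
    """Inner product a^T b of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def f(w, o, t):
    """The function being differentiated: 0.5 * (t - w^T o)^2."""
    return 0.5 * (t - dot(w, o)) ** 2

def analytic_grad(w, o, t):
    """Closed form from the derivation: (t - w^T o) * (-o)."""
    r = t - dot(w, o)
    return [-r * oi for oi in o]

def numeric_grad(w, o, t, h=1e-6):
    """Central-difference approximation of df/dw_i for each component i."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((f(w_plus, o, t) - f(w_minus, o, t)) / (2 * h))
    return grad

# Example values chosen only for illustration.
w = [1.0, -2.0, 0.5]
o = [3.0, 1.0, 2.0]
t = 4.0

g_exact = analytic_grad(w, o, t)
g_num = numeric_grad(w, o, t)
print(g_exact)
print(g_num)
```

The two gradients should agree up to rounding error, which supports $(t - w^{\mathsf T}o)(-o)$ as the column-vector form of the derivative.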