I am trying to find this derivative:
$$ \frac{\partial}{\partial \mathbf{w}} (\frac{1}{2}(t-\mathbf{w}^T\mathbf{o})^2) $$
where $\mathbf{w}, \mathbf{o} \in \mathbb{R}^n$ are column vectors and $t$ is a scalar.
From the chain rule and this property $ \frac{\partial}{\partial\mathbf{x}} (\mathbf{x}^T\mathbf{b}) = \mathbf{b} $ I believe that the derivative should be:
$$ (t-\mathbf{w}^T\mathbf{o})(-\mathbf{o}) $$
In detail:
$$ \begin{aligned} f(\mathbf{w}) &= \frac{1}{2} g(\mathbf{w})^2 \\ g(\mathbf{w}) &= t-\mathbf{w}^T\mathbf{o} \\ \frac{\partial f}{\partial w_i} &= \frac{\partial f}{\partial g} \frac{\partial g}{\partial w_i} = (t-\mathbf{w}^T\mathbf{o})(-o_i) \\ \frac{\partial f}{\partial \mathbf{w}} &= (t-\mathbf{w}^T\mathbf{o})(-\mathbf{o}) \end{aligned} $$
But then I see a problem with distributivity, because I get:
$$ \mathbf{w}^T\mathbf{o}\mathbf{o} - t\mathbf{o} $$
so for the first term there is a problem with the $(\mathbf{o}\mathbf{o})$ as I am multiplying two matrices with the shape $n \times 1$.
Therefore I assumed that this derivative is wrong. Could you please give me the correct result with a step-by-step solution?
I don't think that $\frac{\partial}{\partial w}(w^Tb) = b$ is fully correct. Rather,
$$\frac{\partial}{\partial w}(w^Tb) = \frac{\partial}{\partial w}(b^Tw) = b^T$$
Using this you get
$$ \frac{\partial f}{\partial w} = (t - w^To)(-o^T) = -to^T + w^Too^T $$
and there's no problem with multiplying $oo^T$.
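If you want to convince yourself that the components are right regardless of whether you write the result as a row or a column, you can compare $\frac{\partial f}{\partial w_i} = -(t - w^To)\,o_i$ against a central finite difference. Below is a minimal sketch in plain Python; the helper names (`dot`, `f`, `analytic_grad`, `numeric_grad`), the step size and the random test values are my own choices for illustration and do not come from the question. Plain lists have no row/column orientation, so this only checks the entries, not the layout.

```python
import random

def dot(a, b):
    """Inner product of two equal-length lists, i.e. a^T b."""
    return sum(x * y for x, y in zip(a, b))

def f(w, o, t):
    """The function being differentiated: 1/2 * (t - w^T o)^2."""
    return 0.5 * (t - dot(w, o)) ** 2

def analytic_grad(w, o, t):
    """Entries of the derivative: -(t - w^T o) * o_i."""
    r = t - dot(w, o)
    return [-r * o_i for o_i in o]

def numeric_grad(w, o, t, eps=1e-6):
    """Central finite difference in each coordinate of w."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((f(w_plus, o, t) - f(w_minus, o, t)) / (2 * eps))
    return grad

# Arbitrary test data (assumed values, chosen only for the check).
random.seed(0)
n = 4
w = [random.gauss(0, 1) for _ in range(n)]
o = [random.gauss(0, 1) for _ in range(n)]
t = 0.7

g_analytic = analytic_grad(w, o, t)
g_numeric = numeric_grad(w, o, t)
max_abs_diff = max(abs(a - b) for a, b in zip(g_analytic, g_numeric))
print(max_abs_diff)  # should be tiny, limited only by floating-point rounding
```

Since $f$ is quadratic in $w$, the central difference has no truncation error, so any remaining difference is rounding noise.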
Still, it should be noted that $(w^To)o$ can have a well-defined mathematical sense, even if $w^T(oo)$ doesn't. That's because in $(w^To)o$ you can identify $\mathbb M_{1\times 1}(\mathbb R)\equiv \mathbb R$, and you have a scalar multiplication $\mathbb R \times\mathbb M_{n\times 1}(\mathbb R) \to \mathbb M_{n\times 1}(\mathbb R)$ even if you don't have a matrix multiplication $\mathbb M_{1\times 1}(\mathbb R) \times \mathbb M_{n\times 1}(\mathbb R) \to \mathbb M_{n\times 1}(\mathbb R)$. Alternatively, you can make use of a tensor product and write $$(w^To)o = w^T(o \otimes o)$$ where it needs to be understood that the contraction of $w^T$ with $o\otimes o$ is at the first component.
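To make the last identity concrete, here is a small sketch, again in plain Python with names and numbers that are my own assumptions for illustration. It stores $o \otimes o$ as the $n \times n$ array with entries $o_i o_j$, contracts $w$ with its first index, and compares the result with the scalar $w^To$ times $o$ and with the matrix-vector product $(oo^T)w$.

```python
def dot(a, b):
    """Inner product a^T b of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def outer(a, b):
    """Outer (tensor) product: entry [i][j] is a_i * b_j."""
    return [[x * y for y in b] for x in a]

# Small integer vectors (assumed values) so every comparison is exact.
w = [1, 2, 3]
o = [4, 5, 6]
n = len(w)

# (w^T o) o: a scalar times a vector.
scalar_times_vector = [dot(w, o) * o_j for o_j in o]

# w^T (o ⊗ o), contracting w with the FIRST index of o ⊗ o:
# result_j = sum_i w_i * o_i * o_j.
oo = outer(o, o)
contracted = [sum(w[i] * oo[i][j] for i in range(n)) for j in range(n)]

# (o o^T) w: ordinary matrix-vector product with the n x n matrix o o^T.
matrix_vector = [sum(oo[i][j] * w[j] for j in range(n)) for i in range(n)]

print(scalar_times_vector)  # [128, 160, 192]
print(contracted)           # [128, 160, 192]
print(matrix_vector)        # [128, 160, 192]
```

Because $o \otimes o$ is symmetric, contracting at the first or the second component gives the same numbers here; for a general $a \otimes b$ the choice of component would matter.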