I came across this identity while looking into how to differentiate with respect to parameter vectors in statistics.
Given the scalar $\pmb{x}^{T} \pmb{x}$, the claim is $$\frac{\partial (\pmb{x}^{T} \pmb{x}) }{\partial \pmb{x}}=2\ \pmb{x}^{T}$$
The proofs I've seen use the product rule, holding $\pmb{x}^{T}$ and $\pmb{x}$ constant in turn; below, a non-bold $x$ marks the factor being held constant. (Source: http://www.cs.huji.ac.il/~csip/tirgul3_derivatives.pdf)
$$\frac{\partial (\pmb{x}^{T} \pmb{x}) }{\partial \pmb{x}}= \frac{\partial ({x}^{T} \pmb{x}) }{\partial \pmb{x}} + \frac{\partial (\pmb{x}^{T} x) }{\partial \pmb{x}}= \pmb{x}^{T} + \pmb{x}^{T} = 2\pmb{x}^{T} $$
I understand how we get $\pmb{x}^{T}$ if we differentiate $\frac{\partial ({x}^{T} \pmb{x}) }{\partial \pmb{x}}$.
What I'm not seeing is how we get $\frac{\partial (\pmb{x}^{T} x) }{\partial \pmb{x}} = \frac{\partial (\pmb{x}^{T}) }{\partial \pmb{x}}x=\pmb{x}^{T}$.
- Why is it that $\pmb{x}$ becomes $\pmb{x}^{T}$ when we differentiate $\pmb{x}^{T}$ with respect to $\pmb{x}$?
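As $x^Tx$ is a scalar, it equals its own transpose: $\color{blue}x^Tx=(\color{blue}x^Tx)^T=x^T\color{blue}x,$ where blue marks the factor treated as the variable. The right-hand side is the constant row vector $x^T$ times the variable $\color{blue}x$, which is the case you already understand, so its derivative is again $x^T$.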
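A component-wise check may make this concrete. This is only a sketch in the row-vector convention the question uses. The constant vector $\pmb{a}$, the dimension $n$, and the indices $i, j$ are introduced here for illustration and do not come from the linked notes. For a constant $\pmb{a}$,
$$\frac{\partial (\pmb{x}^{T}\pmb{a})}{\partial x_i} = \frac{\partial}{\partial x_i}\sum_{j=1}^{n} x_j a_j = a_i, \qquad \text{so} \qquad \frac{\partial (\pmb{x}^{T}\pmb{a})}{\partial \pmb{x}} = (a_1, \dots, a_n) = \pmb{a}^{T}.$$
Since $\pmb{a}^{T}\pmb{x}$ is the same sum, $\frac{\partial (\pmb{a}^{T}\pmb{x})}{\partial \pmb{x}} = \pmb{a}^{T}$ as well. Setting $\pmb{a} = \pmb{x}$ after differentiating gives $\pmb{x}^{T}$ for each of the two product-rule terms. As a direct cross-check that avoids the product rule:
$$\frac{\partial (\pmb{x}^{T}\pmb{x})}{\partial x_i} = \frac{\partial}{\partial x_i}\sum_{j=1}^{n} x_j^{2} = 2x_i, \qquad \text{so} \qquad \frac{\partial (\pmb{x}^{T}\pmb{x})}{\partial \pmb{x}} = 2\,\pmb{x}^{T}.$$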