Derivative of a vector times its transpose with respect to itself


I have tried to get $$\frac{d}{d\vec{x}}\left[\vec{x}^T\vec{x}\right].$$

One approach is to use a component-wise example in 3D. $\begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}\cdot\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix} = x_1^2 + x_2 ^2 + x_3^2$

Differentiating this with respect to the vector $\vec{x}=\begin{bmatrix}x_1 \\ x_2 \\ x_3\end{bmatrix}$ should give $$\begin{bmatrix}\frac{\partial }{\partial x_1}(x_1^2 + x_2 ^2 +x_3^2) \\ \frac{\partial}{\partial x_2}(x_1^2 + x_2 ^2 +x_3^2)\\ \frac{\partial}{\partial x_3}(x_1^2 + x_2 ^2 +x_3^2)\end{bmatrix}=\begin{bmatrix}2x_1\\2x_2\\2x_3\end{bmatrix}$$
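The component-wise result can be sanity-checked numerically. Below is a minimal NumPy sketch (the helper `numerical_gradient` is illustrative, not part of the question) that approximates the gradient of $f(\vec{x})=\vec{x}^T\vec{x}$ by central finite differences and compares it with $2\vec{x}$:

```python
import numpy as np

def f(x):
    # The scalar x^T x.
    return x @ x

def numerical_gradient(f, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

x = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(f, x))  # close to 2*x = [2, -4, 6]
```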

On the other hand, using the product rule: $$\frac{d}{d\vec{x}}\left[\vec{x}^T\vec{x}\right] = \frac{d}{d\vec{x}}\vec{x} + \vec{x}^T \frac{d}{d\vec{x}} = \vec{x}+\vec{x}^T$$ These cannot be added together because they have different dimensionalities. So what did I do wrong? And more importantly, what is the correct derivative of $\vec{x}^T\vec{x}$?



Best answer

The easiest way is to use the implicit/external definition of the gradient (which can be obtained from the chain rule):

$$d F=dx^T\,\nabla F.$$

EDIT: Explanation of how to obtain the external definition of the gradient. Consider a function $F=F(x_1,\dots,x_n)$. Then the total derivative is given by

$$dF = \dfrac{\partial F}{\partial x_1}dx_1+...+\dfrac{\partial F}{\partial x_n}dx_n=dx_1\dfrac{\partial F}{\partial x_1}+...+dx_n\dfrac{\partial F}{\partial x_n}$$ $$=dx^T\begin{bmatrix}\dfrac{\partial F}{\partial x_1}\\\vdots\\\dfrac{\partial F}{\partial x_n} \end{bmatrix}=dx^T\,\nabla_\text{column} F=\nabla_\text{row}F\,dx $$
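The identity $dF=dx^T\,\nabla F$ can be illustrated numerically. A small sketch (the step size and test point are arbitrary choices, and the gradient $2x$ anticipates the result derived for $F=x^Tx$): a small displacement $dx$ changes $F$ by approximately $dx^T\nabla F$, up to a term of second order in $dx$.

```python
import numpy as np

x = np.array([0.5, 1.5, -1.0])
dx = 1e-6 * np.array([1.0, -2.0, 0.5])  # a small displacement

# Exact change of F(x) = x^T x under the displacement dx.
dF_exact = (x + dx) @ (x + dx) - x @ x
# First-order prediction dx^T grad(F), with grad(F) = 2x.
dF_linear = dx @ (2 * x)
print(dF_exact, dF_linear)  # agree to first order in dx
```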

What we have to do is determine the total derivative of your expression:

$$d(x^Tx)=dx^T x+x^Tdx.$$

Note that both terms are scalars, so we can transpose the second one to match the first: $x^Tdx=(x^Tdx)^T=dx^Tx$. Hence

$$d(x^Tx)=dx^T x+dx^Tx=dx^T\left[2x\right]$$

Comparing this expression with the implicit definition of the gradient we obtain

$$\nabla \left[x^Tx \right]=\dfrac{d\left(x^Tx\right)}{dx}=2x.$$


An alternative approach is to calculate the partial derivatives

$$\dfrac{\partial \sum_{j=1}^n x_j^2}{\partial x_i}=\sum_{j=1}^n\dfrac{\partial x_j^2}{\partial x_i}=2x_i$$

and then assemble the gradient as $2x$.
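The partial-derivative computation above can be reproduced symbolically. A small SymPy sketch for $n=3$ (the variable names are arbitrary), differentiating $\sum_j x_j^2$ with respect to each $x_i$:

```python
import sympy as sp

# F = x1^2 + x2^2 + x3^2, the n = 3 case of sum_j x_j^2.
xs = sp.symbols('x1 x2 x3')
F = sum(xj**2 for xj in xs)

# Each partial derivative dF/dx_i should be 2*x_i.
grads = [sp.diff(F, xi) for xi in xs]
print(grads)  # [2*x1, 2*x2, 2*x3]
```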


Or using index notation (summation over double indices)

$$\dfrac{\partial x_jx_j}{\partial x_i}=\dfrac{\partial x_j}{\partial x_i}x_j+x_j\dfrac{\partial x_j}{\partial x_i}=\delta_{ji}x_j+x_j\delta_{ji}=x_i+x_i=2x_i.$$

The symbol $\delta_{ij}=\delta_{ji}$ is the Kronecker delta.
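The index computation maps directly onto `np.einsum`, with the Kronecker delta represented by the identity matrix (a small sketch; the test vector is arbitrary): the contraction $\delta_{ji}x_j$ collapses to $x_i$, and the two terms sum to $2x_i$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
delta = np.eye(3)  # Kronecker delta as the identity matrix

# delta_{ji} x_j, summed over j, gives x_i.
term = np.einsum('ji,j->i', delta, x)
print(term + term)  # the two identical terms sum to 2*x
```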

Another answer

Your first computation is correct. You cannot simply carry every theorem of scalar analysis over to vector analysis. The derivative of a scalar with respect to a vector is defined as the vector whose entries are the partial derivatives with respect to the entries of the vector. Using this definition, one can show that $$\frac{d}{dx}\left(x^TAx\right)=(A+A^T)x$$ for an arbitrary matrix $A$. Setting $A=I$ yields the desired result $2x$.
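The general identity $\frac{d}{dx}(x^TAx)=(A+A^T)x$ can be spot-checked numerically with a random non-symmetric matrix (a minimal sketch; the dimension, seed, and step size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # deliberately not symmetric
x = rng.standard_normal(4)

def f(x):
    # The scalar x^T A x.
    return x @ A @ x

# Gradient by central finite differences along each coordinate axis.
h = 1e-6
grad_fd = np.array([
    (f(x + h * e) - f(x - h * e)) / (2 * h)
    for e in np.eye(4)
])
print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5))  # True
```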