Calculating derivatives with the product rule


I am trying to calculate the derivative of $x^{T}x$, where $x$ is a column vector.

A correct way of doing this is shown in this formula

However, I am getting different results with the product rule:

$\frac{d(x^{T}x)}{dx}=x^{T}*\frac{dx}{dx}+\frac{d(x^{T})}{dx}*x = x^T + x \ \ (\neq 2x^{T})$

(I used this formula in Leibniz notation from Wikipedia)

The problem is probably that it is a dot product and not a regular product.

So my question is: how do I apply the product rule for dot products correctly?


There are 2 best solutions below


It would be more helpful if they called it the gradient, since we are talking about a multivariable function $$ f:\mathbb R^m\longrightarrow\mathbb R $$ given by $f(x)=x^T x$, where $x$ is an $m\times 1$ column vector. The gradient consists of $m$ partial derivatives, each describing the rate of change of the output (which is just one coordinate here, since the output is 1-dimensional) with respect to one coordinate of the input. So the gradient is given by: $$ \frac{df}{dx}=\nabla f=\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\dots,\frac{\partial f}{\partial x_m}\right) $$ So you have to determine the derivative for each of the $m$ input coordinates and stack them together in a $1\times m$ row vector.
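As a rough sanity check of this coordinate-wise recipe, here is a minimal sketch in plain Python. Since $f(x)=x^Tx=\sum_i x_i^2$, each partial derivative should come out as $2x_i$. The helper name `numerical_gradient`, the step size `h`, and the sample vector are all illustrative assumptions, not anything from the question.

```python
def f(x):
    """f(x) = x^T x for a vector given as a list of floats."""
    return sum(xi * xi for xi in x)


def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of each partial derivative of func at x.

    Hypothetical helper for illustration; h is an assumed step size.
    """
    grad = []
    for k in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[k] += h   # nudge only the k-th input coordinate up
        backward[k] -= h  # and down
        grad.append((func(forward) - func(backward)) / (2 * h))
    return grad


x = [1.0, -2.0, 3.0]                 # arbitrary sample point
approx = numerical_gradient(f, x)    # one entry per input coordinate
exact = [2 * xi for xi in x]         # the claimed gradient entries 2 * x_i

print(approx)
print(exact)
```

The two printed lists should agree up to floating-point rounding, which is consistent with the entries of $2x^T$.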


Here you are differentiating $x^T \cdot x$ with respect to the vector $x$, so be careful to properly define all the objects you are working with, especially $d/dx$.

To see how to get the right expression, I would write down the definition of the dot product first and work in terms of differentials. Let us introduce a function $f \colon \mathbb{R}^n \to \mathbb{R}$ so that $$f(x^1,\dots,x^n) := x^T\cdot x = \langle x,x \rangle,$$ where $\langle \cdot, \cdot \rangle$ denotes the standard scalar product in $\mathbb{R}^n$. In coordinates, $f(x) = \sum_{k=1}^n (x^k)^2$, so $$\frac{\partial f}{\partial x^k}(x) = 2x^k, \text{ for } k = 1,\dots,n.$$ Then $$df_x = \sum_{k=1}^n2x^kdx^k = 2x^T.$$ You see, in this expression $x^T$ is not only the transpose of a vector, but it is a covector, namely an element of $(\mathbb{R}^n)^*$: it is the linear map $\langle x, \cdot \rangle$ obtained by fixing the vector $x$. In case you have heard about musical isomorphisms, this covector is nothing but $\flat (x)$, where $\flat \colon \mathbb{R}^n \to (\mathbb{R}^n)^*: x \mapsto \flat(x)\colon y \mapsto \langle x,y \rangle$, so $df_x = 2\flat(x)$. Notice that, under the identification of $\mathbb{R}^n$ with its dual given by the standard scalar product, the gradient $\nabla f(x)$ corresponds exactly to $df_x$, so I am inclined to think your $df/dx$ should be defined as $\nabla f$.
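
One way to reconcile this with the product rule from the question is the following sketch, which assumes $dx$ denotes an arbitrary increment vector in $\mathbb{R}^n$ with components $dx^k$. Applying the product rule to the differential gives $$d(x^T x) = (dx)^T x + x^T\,dx.$$ Both terms are scalars, and a scalar equals its own transpose, so $(dx)^T x = x^T\,dx$. Hence $$d(x^T x) = 2x^T\,dx = \sum_{k=1}^n 2x^k\,dx^k,$$ which matches the expression for $df_x$. The sum $x^T + x$ in the question appears when the two terms are read off as a row and a column without this transposition step, so the product rule itself is fine as long as both terms are brought into the same shape before they are added.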