Finding the gradient of $x^Tx + c$ for some constant $c\in\mathbb{R}$


I am very familiar with calculating gradients for single and multi-variable functions, as a vector containing partial derivatives.

However, when I want to calculate the gradient of an expression built from vector products, where the vector itself is left unspecified, I lack the intuition. Say I want to calculate the gradient of $f(x) = x^Tx + c$ for some constant $c\in\mathbb{R}$.

My textbook has a simple table that gives this gradient as $2x$. The constant, of course, contributes zero to any derivative, but how does $x^Tx$ yield $2x$?

What I am trying to understand is how to approach, intuitively, finding the partial derivatives of a function of a vector without knowing the vector's values.

I might think of $x^T$ as a row vector containing multiple inputs, $[x_1,\ldots,x_n] \in\mathbb{R}^n$, and similarly of $x$ (but as a column vector), but how would I go about finding the gradient of such an expression?


There are 3 best solutions below

4
On BEST ANSWER

Hint: Expanding in components, we find that $$x^\top x = x_1^2 + \cdots + x_n^2 .$$
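Carrying the hint through, the component-wise computation can be sketched as follows, with the constant $c$ from the question included. Each partial derivative sees only one squared term, because the other terms and $c$ do not depend on $x_i$:

$$
\begin{align*}
\frac{\partial}{\partial x_i}\left(x_1^2 + \cdots + x_n^2 + c\right) &= 2x_i, \qquad i = 1, \ldots, n, \\
\nabla f(x) &= \begin{bmatrix} 2x_1 \\ \vdots \\ 2x_n \end{bmatrix} = 2x .
\end{align*}
$$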

1
On

The best way to think about the derivative is as the linear map in the first-order approximation
$$ \tag{1} f(x + \Delta x) \approx f(x) + f'(x) \Delta x. $$
In this example, if $\Delta x$ is small, then
\begin{align*}
f(x + \Delta x) &= x^T x + x^T \Delta x + \Delta x^T x + \underbrace{\Delta x^T \Delta x}_{\text{negligible}} + c \\
&\approx x^T x + c + x^T \Delta x + \Delta x^T x \\
&= f(x) + 2 x^T \Delta x,
\end{align*}
where the last step uses $\Delta x^T x = x^T \Delta x$, since both are the same scalar. Comparing this with (1), we discover that $$ f'(x) = 2 x^T. $$ The gradient of $f$ is $$ \nabla f(x) = f'(x)^T = 2x. $$ So there's no need to compute partial derivatives or to think in terms of the components of $x$.
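As a rough numerical sanity check of this expansion, here is a minimal Python sketch using only the standard library. The helper names `f` and `first_order_error`, the value `c = 3.0`, and the random test vectors are all illustrative assumptions, not part of the question. It compares $f(x + \Delta x)$ with the linear model $f(x) + 2x^T\Delta x$; the leftover should match the neglected term $\Delta x^T \Delta x$ and shrink quadratically with the step size.

```python
import random

def f(x, c=3.0):
    """f(x) = x^T x + c, with x given as a list of floats (c = 3.0 is arbitrary)."""
    return sum(xi * xi for xi in x) + c

def first_order_error(x, dx):
    """Gap between f(x + dx) and the linear model f(x) + 2 x^T dx."""
    shifted = [xi + di for xi, di in zip(x, dx)]
    linear = f(x) + 2.0 * sum(xi * di for xi, di in zip(x, dx))
    return f(shifted) - linear

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(5)]
direction = [random.uniform(-1.0, 1.0) for _ in range(5)]

for scale in (1e-1, 1e-2, 1e-3):
    dx = [scale * d for d in direction]
    err = first_order_error(x, dx)
    # The gap should agree with dx^T dx, which shrinks like scale**2.
    print(scale, err, sum(d * d for d in dx))
```

The two printed columns after the step size should agree up to floating-point rounding, which is what "negligible" means here: the error is second order in $\Delta x$.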

0
On

Just use the Leibniz rule (together with $a^Tb=b^Ta$), where $d_p$ denotes the derivative in the direction $p$: $$d_p(x^Tx)=p^Tx+x^Tp=x^Tp+x^Tp=2x^Tp,$$ hence the gradient of $x^Tx$ is $2x$.
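The identity $d_p(x^Tx) = 2x^Tp$ can also be checked numerically. The sketch below is plain Python with a central-difference estimate; the helper names `g` and `directional_derivative`, the step `h`, and the sample vectors are assumptions chosen for illustration. Taking $p$ to be a unit vector $e_i$ recovers the $i$-th partial derivative, which ties the directional view back to the component-wise one.

```python
def g(x):
    """g(x) = x^T x for x given as a list of floats."""
    return sum(xi * xi for xi in x)

def directional_derivative(func, x, p, h=1e-6):
    """Central-difference estimate of the derivative of func at x along p."""
    plus = [xi + h * pi for xi, pi in zip(x, p)]
    minus = [xi - h * pi for xi, pi in zip(x, p)]
    return (func(plus) - func(minus)) / (2.0 * h)

x = [1.0, -2.0, 0.5]
p = [0.3, 0.1, -0.7]

estimate = directional_derivative(g, x, p)
exact = 2.0 * sum(xi * pi for xi, pi in zip(x, p))  # 2 x^T p
print(estimate, exact)

# Choosing p = e_i (the i-th unit vector) picks out the i-th partial derivative,
# so these estimates should be close to 2x = [2.0, -4.0, 1.0].
n = len(x)
grad = [
    directional_derivative(g, x, [1.0 if j == i else 0.0 for j in range(n)])
    for i in range(n)
]
print(grad)
```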