Is this notation valid when differentiating w.r.t. a vector?


If I want to find the gradient of

$$f(x) = x^Tx+c$$

where $x$ is a vector of size $n$ and $c$ is a constant, can I write it using the following notation

$$\frac{\partial}{\partial x} \left( x^Tx+c \right) = \frac{\partial}{\partial x}x^Tx+\frac{\partial}{\partial x}c = \frac{\partial}{\partial x}x^Tx = \frac{\partial}{\partial x}\sum_{i=1}^nx_ix_i = \sum_{i=1}^n 2x_i = 2x$$

Or more specifically does

$$\frac{\partial}{\partial x}\sum_{i=1}^nx_ix_i = \sum_{i=1}^n 2x_i = 2x$$ make any sense, or am I mixing element and vector notation, i.e. differentiating w.r.t. a vector but writing the result as a sum? It seems unintuitive that I can go from a sum to a scalar times a vector in the last step.


There are 2 best solutions below


Why is it unintuitive?

The derivative of a scalar function $f$ w.r.t. a vector $x$ is defined as $$\frac{\partial f(x)}{\partial x} = \begin{bmatrix}\frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{bmatrix} \tag{1}$$

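So in your case, $$\frac{\partial f(x)}{\partial x_k} = \frac{\partial }{\partial x_k} (x^Tx + c ) = \frac{\partial }{\partial x_k} x^Tx = \frac{\partial }{\partial x_k} \sum_{i=1}^n x_i^2 = \frac{\partial }{\partial x_k} (x_1^2 + \ldots + x_k^2 + \ldots + x_n^2) = 2x_k \tag{2}$$ since only the $i = k$ term of the sum depends on $x_k$. Substituting $(2)$ into $(1)$ we get $$\frac{\partial f(x)}{\partial x} = \begin{bmatrix}2x_1 \\ \vdots \\ 2x_n \end{bmatrix} = 2x$$ Note that the intermediate expression $\sum_{i=1}^n 2x_i$ in the question is a scalar, so it cannot equal the vector $2x$. The sum disappears because each component is differentiated separately, and the $n$ results are then stacked into a vector.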
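A quick numerical sanity check of this result can be sketched as follows. This is only an illustration under stated assumptions: the helper `numerical_gradient`, the test vector, and the value `c = 3.0` are all made up for the example. It approximates each partial derivative in $(1)$ by a central difference and compares the stacked result with $2x$.

```python
def f(x, c=3.0):
    """f(x) = x^T x + c, with x given as a list of floats (c = 3.0 is an arbitrary choice)."""
    return sum(xi * xi for xi in x) + c


def numerical_gradient(func, x, h=1e-6):
    """Hypothetical helper: approximate each partial derivative d func / d x_k
    with a central difference and stack the results into a list."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h   # perturb only the k-th component
        x_minus[k] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad


x = [1.0, -2.0, 0.5]                      # arbitrary test vector
approx = numerical_gradient(f, x)         # component-wise approximation of (1)
exact = [2 * xi for xi in x]              # the claimed gradient 2x
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print("approx:", approx)
print("exact: ", exact)
print("max abs error:", max_error)
```

The constant $c$ drops out of every difference quotient, which mirrors the step $\frac{\partial}{\partial x_k} c = 0$ in the derivation.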


There will be sticklers who define $\partial_xf$ as the transpose of @AhmadBazzi's definition so the chain rule $df=dx^i(\partial_xf)_i$ contracts according to the Einstein convention. On this view, the derivative would be $2x^T$. The same ideas apply when we differentiate a scalar with respect to a matrix.
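As a sketch of the matrix case, take the assumed example $f(X) = \operatorname{tr}(X^TX)$ for an $m \times n$ matrix $X$ (this function is chosen here for illustration and does not appear in the question). Working entry by entry, exactly as with the vector,

$$f(X) = \sum_{i=1}^m \sum_{j=1}^n X_{ij}^2, \qquad \frac{\partial f}{\partial X_{ij}} = 2X_{ij}.$$

Collecting the entries in the same shape as $X$ gives $2X$, while the transposed convention arranges the same entries as the $n \times m$ matrix $2X^T$. The entries are identical either way; only the layout differs.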