Finding gradients of functions from $\mathbb{R}^n \to \mathbb{R}$


I'm trying my hand at learning differentiation of vector functions, and I'm applying my knowledge of the "regular" one-dimensional case to some examples, but I'm not sure whether my methods are correct. Am I making errors here? (All $x$ are vectors in $\mathbb{R}^n$ and $A$ is an $n \times n$ matrix.)

$$ r(x) = \|x\|_2 = \sqrt{\langle x, x \rangle}, \qquad \nabla r(x) = \frac{1}{2} (x^T x)^{-\frac{1}{2}} \cdot 2x = \frac{x}{\|x\|_2}, $$
$$ D^2 r(x) = -\frac{1}{2}(x^T x)^{-\frac{3}{2}} x \; + \; (x^T x)^{-\frac{1}{2}} $$


$$ g(x) = \|x\|_2^4 - 2\langle Ax, x \rangle = \left(\langle x, x \rangle\right)^2 - 2\langle Ax, x \rangle, \qquad \nabla g(x) = 4(x^T x)\,x - 4Ax = 4\|x\|_2^2\, x - 4Ax $$
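One way to sanity-check formulas like these is a numerical comparison against central finite differences. Note that $\nabla\left(-2\langle Ax, x\rangle\right) = -2(A + A^T)x$, which equals $-4Ax$ only when $A$ is symmetric, so the sketch below (my own addition, not from the original post) uses a symmetric $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2              # symmetric, so grad(<Ax, x>) = 2Ax
x = rng.standard_normal(n)

def g(v):
    # g(v) = ||v||_2^4 - 2 <Av, v>
    return np.dot(v, v) ** 2 - 2 * v @ A @ v

# the claimed gradient: 4 ||x||^2 x - 4 A x
grad_analytic = 4 * np.dot(x, x) * x - 4 * A @ x

# central finite differences, one partial derivative per component
h = 1e-6
grad_fd = np.empty(n)
for j in range(n):
    e = np.zeros(n)
    e[j] = h
    grad_fd[j] = (g(x + e) - g(x - e)) / (2 * h)

print(np.max(np.abs(grad_analytic - grad_fd)))  # should be tiny
```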

Thank you so much for your time and input!!!

On BEST ANSWER

First of all, technically speaking, a function $\mathbb{R}^n\to\mathbb{R}$ is not a vector function, but a scalar function of several variables.

Now, let's get to the point. You can take such analogies only so far. There are derivative rules for vector functions and derivative rules for functions of several variables, and many of them can be understood well by analogy with derivative rules for "usual" functions (scalar functions of a single variable). But taking vector or multivariate derivatives requires the corresponding vector or multivariate rules; applying one-variable rules verbatim usually produces expressions that don't make sense.

Here's one glaring example of such an issue in your calculations: what do you mean by $D^2r(\mathbf{x})$? Whatever you meant, the "answer" you obtained doesn't make any sense: the first term is a vector and the second term is a scalar, so you can't possibly add them together.
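To make the dimensional objection concrete, here is a small numerical sketch (my addition, not part of the original answer): differentiating the gradient $\mathbf{x}/\|\mathbf{x}\|_2$ once more, component by component, produces an $n \times n$ matrix, so any correct candidate for the second derivative must be matrix-valued, never "a vector plus a scalar."

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
x = rng.standard_normal(n)

# gradient of r(x) = ||x||_2, as derived in the answer below
grad = lambda v: v / np.linalg.norm(v)

# differentiate the gradient numerically: column j holds d(grad)/dx_j
h = 1e-6
H = np.empty((n, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = h
    H[:, j] = (grad(x + e) - grad(x - e)) / (2 * h)

print(H.shape)  # (3, 3): a matrix, not a vector plus a scalar
```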

The computation of the gradient $\nabla r(\mathbf{x})$ is best understood if you do it component-wise. Otherwise, it's hard to explain how you got the "$2\mathbf{x}$" part in there: if it's the inner-derivative factor from the Chain Rule, can you say with respect to which variable that derivative is taken? Remember that the gradient operator combines partial derivatives with respect to all individual variables (the components of $\mathbf{x}$).

But component-wise it's pretty easy. Let $\mathbf{x}=(x_1,x_2,\ldots,x_n)\in\mathbb{R}^n$, and start with $$r(\mathbf{x})=\|\mathbf{x}\|_2=(\mathbf{x}\cdot\mathbf{x})^{\frac{1}{2}}=(x_1^2+x_2^2+\cdots+x_n^2)^{\frac{1}{2}}=\left(\sum_{k=1}^n x_k^2\right)^{\frac{1}{2}}.$$ Then for any variable $x_j$, $1\le j\le n$, we have $$\frac{\partial r}{\partial x_j}=\frac{1}{2}\left(\sum_{k=1}^n x_k^2\right)^{-\frac{1}{2}}\cdot2x_j=\left(\sum_{k=1}^n x_k^2\right)^{-\frac{1}{2}}\cdot x_j=\frac{x_j}{\|\mathbf{x}\|_2},$$ and so $$\nabla r=\left(\frac{\partial r}{\partial x_1},\frac{\partial r}{\partial x_2},\ldots,\frac{\partial r}{\partial x_n}\right)=\left(\frac{x_1}{\|\mathbf{x}\|_2},\frac{x_2}{\|\mathbf{x}\|_2},\ldots,\frac{x_n}{\|\mathbf{x}\|_2}\right)=\frac{(x_1,x_2,\ldots,x_n)}{\|\mathbf{x}\|_2}=\frac{\mathbf{x}}{\|\mathbf{x}\|_2}.$$ Interestingly enough, you did get the correct answer for this one, but without proper explanation, it's as much luck as math.
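The component-wise result above can also be confirmed numerically; a minimal sketch (my addition), again using central differences:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
x = rng.standard_normal(n)

r = lambda v: np.sqrt(v @ v)              # r(x) = ||x||_2
grad_analytic = x / np.linalg.norm(x)     # the formula derived above

# approximate each partial derivative dr/dx_j by a central difference
h = 1e-6
grad_fd = np.empty(n)
for j in range(n):
    e = np.zeros(n)
    e[j] = h
    grad_fd[j] = (r(x + e) - r(x - e)) / (2 * h)

print(np.allclose(grad_analytic, grad_fd, atol=1e-6))  # True
```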