Gradient of 2-norm squared


Could someone please provide a proof for why the gradient of the squared $2$-norm of $x$ is equal to $2x$?

$$\nabla\|x\|_2^2 = 2x$$


9
On BEST ANSWER

Use the definition. If $$f(x)=\|x\|^2_2= \left(\left(\sum_{k=1}^n x_k^2 \right)^{1/2}\right)^{2}=\sum_{k=1}^n x_k^2 ,$$ then $$\frac{\partial}{\partial x_j}f(x) =\frac{\partial}{\partial x_j}\sum_{k=1}^n x_k^2=\sum_{k=1}^n \underbrace{\frac{\partial}{\partial x_j}x_k^2}_{\substack{=0, \ \text{ if } j \neq k,\\=2x_j, \ \text{ else }}}= 2x_j.$$ It follows that $$\nabla f(x) = 2x.$$
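As a quick numerical sanity check of the componentwise computation above, here is a minimal sketch in plain Python. The helper names `sq_norm` and `numerical_gradient`, the test point, and the step size `eps` are illustrative choices, not part of the original answer.

```python
def sq_norm(x):
    """Squared Euclidean norm: the sum of x_k^2."""
    return sum(xk * xk for xk in x)


def numerical_gradient(f, x, eps=1e-6):
    """Approximate each partial derivative with a central finite difference."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps    # perturb only the j-th coordinate
        x_minus[j] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


x = [1.0, -2.0, 0.5]                      # arbitrary test point (assumption)
approx = numerical_gradient(sq_norm, x)   # finite-difference gradient
exact = [2 * xk for xk in x]              # the claimed gradient 2x

print(approx)
print(exact)
```

The two printed vectors should agree up to floating-point rounding, since a central difference has no truncation error for a quadratic function.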

0
On

Another approach that extends to more general settings is to use the connection between the norm and the inner product, $$\|x\|^2 = (x,x).$$

We have the finite difference, \begin{align} \|x+sh\|^2 - \|x\|^2 &= (x+sh,x+sh) - (x,x) \\ &= (x,x) + 2s(x,h) + s^2(h,h) - (x,x) \\ &= 2s(x,h) + s^2(h,h). \end{align}

The gradient acting in the direction $h$ is the limit of this finite difference as the stepsize goes to zero, \begin{align} (\nabla\|x\|^2, h) &:= \lim_{s \rightarrow 0} \frac{1}{s}\left[\|x+sh\|^2 - \|x\|^2\right] \\ &= \lim_{s \rightarrow 0} \frac{1}{s}\left[2s(x,h) + s^2(h,h)\right] \\ &= (2x,h). \end{align} Since this holds for any direction $h$, the gradient must be $\nabla \|x\|^2 = 2x$.
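The finite-difference identity above can also be checked numerically. The sketch below assumes the standard dot product on $\mathbb{R}^n$ as the inner product; the vectors `x`, `h` and the step sizes are arbitrary illustrative values. It compares the difference quotient with $2(x,h)$ and confirms that the gap is $s(h,h)$, which vanishes as $s \to 0$.

```python
def inner(a, b):
    """Standard dot product, used here as the inner product (a, b)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sq_norm(x):
    """Squared norm expressed through the inner product: (x, x)."""
    return inner(x, x)


x = [1.0, -2.0, 0.5]     # base point (assumption)
h = [0.3, 0.1, -0.4]     # direction (assumption)
target = 2 * inner(x, h) # the claimed directional derivative (2x, h)

steps = [1e-1, 1e-2, 1e-3]
errors = []
for s in steps:
    x_step = [xi + s * hi for xi, hi in zip(x, h)]
    quotient = (sq_norm(x_step) - sq_norm(x)) / s
    errors.append(quotient - target)
    # The derivation predicts: quotient - target = s * (h, h)
    print(s, quotient, quotient - target, s * inner(h, h))
```

In each printed row, the last two columns should match up to rounding, and both shrink linearly with the step size.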

0
On

Here is another simple proof, using the definition of differentiability at a point directly.

1- First, recall that $f$ is said to be differentiable at the point $\vec{x}$ if, for all $\vec{h}$, one can write $ f(\vec{x}+ \vec{h})= f(\vec{x}) + L(\vec{h}) + o(\vec{h})$, where $L(\vec{h})$ is a linear mapping in $\vec{h}$ and $\lim_{\vec{h} \to \vec{0}} \frac{||o(\vec{h})||} {||\vec{h}||} = 0$.

2- Here $f(\vec{x})= ||\vec{x}||^2$, so $$||\vec{x}+\vec{h}||^2 = ||\vec{x}||^2 + || \vec{h}||^2 + <\vec{x}|\vec{h}> + <\vec{h}|\vec{x}> = f(\vec{x}) + <\vec{x}|\vec{h}> + <\vec{h}|\vec{x}> + || \vec{h}||^2 $$
Take $o(\vec{h}) = || \vec{h}||^2$. This is a valid remainder term because $\lim_{\vec{h} \to \vec{0}} \frac{||\vec{h}||^2} {||\vec{h}||} = \lim_{\vec{h} \to \vec{0}} || \vec{h}|| = 0$.
The remaining term, $L( \vec{h}) = <\vec{x}|\vec{h}> + <\vec{h}|\vec{x}>$, is linear in $\vec{h}$. Since we work over the real numbers, the inner product is symmetric, $<\vec{x}|\vec{h}> = <\vec{h}|\vec{x}>$, and therefore $L(\vec{h}) = 2<\vec{x}|\vec{h}> $.

3- By definition, the gradient is the unique vector $ \vec{\nabla} f( \vec{x})$ satisfying $ 2<\vec{x}|\vec{h}> = <\vec{\nabla} f( \vec{x})| \vec{h}> $ for all $\vec{h}$.
It follows immediately that $\vec{\nabla} f( \vec{x}) = 2\vec{x} $.
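To see the remainder term behave as the definition requires, here is a small sketch in plain Python. The point `x` and the direction `(0.6, 0.8)` are arbitrary choices; that direction has unit length, so `scale` equals $||\vec{h}||$. The code computes $f(\vec{x}+\vec{h}) - f(\vec{x}) - 2<\vec{x}|\vec{h}>$ and divides it by $||\vec{h}||$.

```python
import math


def inner(a, b):
    """Real inner product <a|b> (standard dot product)."""
    return sum(ai * bi for ai, bi in zip(a, b))


def f(x):
    """f(x) = ||x||^2."""
    return inner(x, x)


x = [2.0, -1.0]                       # base point (assumption)
scales = [1.0, 0.1, 0.01]
ratios = []
for scale in scales:
    h = [0.6 * scale, 0.8 * scale]    # ||h|| = scale, since 0.6^2 + 0.8^2 = 1
    x_h = [xi + hi for xi, hi in zip(x, h)]
    remainder = f(x_h) - f(x) - 2 * inner(x, h)   # should equal ||h||^2
    ratios.append(remainder / math.sqrt(inner(h, h)))
    print(scale, remainder, ratios[-1])
```

The ratio in the last column should track $||\vec{h}||$ itself, so it tends to zero as $\vec{h}$ shrinks, which is what the linear term $2<\vec{x}|\vec{h}>$ being the derivative requires.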

3
On

I'm not sure if this is rigorous enough to count as a proof, but an elegant way to obtain derivatives of vector expressions is to use matrix differential calculus.

Let $y = \lVert x \rVert_2^2 = x^{T} x$ with $x \in \mathbb{R}^{n}$. Using the product rule, the differential of $y$ is $$ dy = dx^{T} x + x^{T} dx = 2 x^{T} dx $$

We can then set $$ dy = \frac{dy}{dx} dx = (\nabla_{x} y)^{T} dx = 2x^{T} dx $$ where $dy/dx \in \mathbb{R}^{1 \times n}$ is called the derivative (a linear operator) and $\nabla_{x} y \in \mathbb{R}^{n}$ is called the gradient (a vector).

Now we can see $\nabla_{x} y = 2 x$.


If $x$ is complex, the complex derivative does not exist because $z \mapsto |z|^{2}$ is not a holomorphic function.

We can, however, instead consider the real derivatives with respect to the two components of $x$. Let $x = u + i v$. With this definition, $y$ is a real function of $u, v \in \mathbb{R}^{n}$ defined by $$ y = x^* x = (u - i v)^T (u + i v) = u^T u + v^T v + i \left( u^T v - v^T u \right) = u^T u + v^T v $$ where the imaginary part vanishes because $u^T v = v^T u$. Taking the differential $$ dy = 2 u^T du + 2 v^T dv = \frac{\partial y}{\partial u} du + \frac{\partial y}{\partial v} dv $$ and therefore $$ \nabla_{u} y = 2 u \enspace , \qquad \nabla_{v} y = 2 v $$
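One way to sanity-check the signs of the gradients with respect to $u$ and $v$ is to evaluate $y = x^* x$ with Python's built-in complex numbers and differentiate numerically. This is only a sketch: the helper names, the test vectors and the step size are illustrative assumptions.

```python
def sq_norm_complex(u, v):
    """y = x^* x for x = u + i v, computed with complex arithmetic."""
    x = [complex(uk, vk) for uk, vk in zip(u, v)]
    # conj(x_k) * x_k is real up to rounding; keep only the real part
    return sum((xk.conjugate() * xk).real for xk in x)


def partials(f, point, eps=1e-6):
    """Central finite-difference partial derivatives of f at point."""
    grad = []
    for j in range(len(point)):
        p_plus = list(point)
        p_minus = list(point)
        p_plus[j] += eps
        p_minus[j] -= eps
        grad.append((f(p_plus) - f(p_minus)) / (2 * eps))
    return grad


u = [1.0, -2.0]   # real part of x (assumption)
v = [0.5, 3.0]    # imaginary part of x (assumption)

# Differentiate with respect to u with v held fixed, and vice versa.
grad_u = partials(lambda uu: sq_norm_complex(uu, v), u)
grad_v = partials(lambda vv: sq_norm_complex(u, vv), v)

print(grad_u)   # compare with the closed-form gradient in u
print(grad_v)   # compare with the closed-form gradient in v
```

Comparing the printed vectors against the closed-form expressions for $\nabla_{u} y$ and $\nabla_{v} y$ gives an independent check of both magnitude and sign.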


For an introduction to matrix differential calculus, see the lecture of Geoff Gordon on YouTube or the paper on matrix derivatives of Mike Giles.