Gradient of the norm in a pre-Hilbert space


Let $(H,\langle \cdot, \cdot \rangle)$ be a real pre-Hilbert space (a real vector space equipped with a scalar product). Show that the function $$ f:\textbf{x}\in H \rightarrow f(\textbf{x}) = \|\textbf{x}\| \in [0,+\infty) $$ is differentiable on $H \setminus \{0\}$ and that its differential is then given by $$ Df(x,h) = \Big\langle \frac{x}{\|x\|},h \Big\rangle $$

My answer attempt:

\begin{align} Df(x,h) &= D\|(\textbf{x},\textbf{h})\|\\ &= D\|\sum x_i h_i\|\\ &= D\sqrt{(\sum x_i h_i)^2} \end{align}

where

  • $f_1(x) = \sqrt{x} \rightarrow f_1'(x) = \frac{1}{2}(x)^{-\frac{1}{2}}$ (1)
  • $f_2(x) = x^2 \rightarrow f_2'(x) =2x$ (2)
  • $f_3(x) = \sum x_i h_i \rightarrow \nabla f_3(\textbf{x}) = \textbf{h}$ (3)

Then, to compute $$ (f_1(f_2(f_3)))' $$ we can say \begin{align*} f_{23} &= f_2(f_3) \rightarrow f_{23}' = f_2'(f_3)\cdot \nabla f_3\\ f_{23} &= \langle \textbf{x}, \textbf{h}\rangle^2 \rightarrow 2 \langle \textbf{x}, \textbf{h}\rangle \textbf{h} \end{align*}

hence: $$ (f_1(f_2(f_3)))' = f_1(f_{23})' = f_1'(f_{23})\cdot f_{23}' $$

hence: \begin{align*} D\sqrt{(\sum x_i h_i)^2} &= \frac{1}{2}(\langle \textbf{x}, \textbf{h} \rangle^2)^{-\frac{1}{2}} \cdot 2 \langle \textbf{x}, \textbf{h} \rangle \textbf{h}\\ &= \frac{\langle \textbf{x}, \textbf{h} \rangle}{\sqrt{\langle \textbf{x}, \textbf{h} \rangle^2}} \textbf{h}\\ &= \frac{\langle \textbf{x}, \textbf{h} \rangle}{\|\langle \textbf{x}, \textbf{h} \rangle\|} \textbf{h} \quad \textbf{(4)}\\ &= \langle \frac{\textbf{x}}{\|\textbf{x}\|} \rangle \textbf{h} \quad \textbf{(5)} \end{align*}

Here are my questions:

  • I treated the functions (1) and (2) as mappings from $\mathbb{R} \to \mathbb{R}$, unlike (3), which maps from $\mathbb{R}^n \to \mathbb{R}$. Is this correct?
  • What is the missing step that allows one to go from (4) to (5)?
  • Please also feel free to point out any imprecision in the notation.
Best Answer

There are several issues with what you have written. I think the biggest issue is that you haven't understood your own notation when it comes to $Df(x,h)$, because you wrote

\begin{align} Df(x,h) &= D\|(\textbf{x},\textbf{h})\|\\ &= D\|\sum x_i h_i\|\\ &= D\sqrt{(\sum x_i h_i)^2} \end{align}

  • The first equal sign itself makes no sense. First of all, $f(x,h) \neq \lVert (x,h) \rVert$. Even if it were true, that's not the meaning of $Df(x,h)$.
  • In the second line, you seem to be using the standard inner product on $\Bbb{R}^n$. This is again wrong, because firstly, you're not told that $H = \Bbb{R}^n$; $H$ could be an infinite-dimensional vector space. Even if we assume $H$ is finite dimensional, all you know about $\langle \cdot, \cdot \rangle$ is that it is an inner product (a bilinear, symmetric positive definite function); you aren't told it is the standard inner product.
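To make the second point concrete, here is a minimal Python sketch of an inner product on $\Bbb{R}^2$ that is bilinear, symmetric, and positive definite but is not the dot product (the specific weights and vectors are assumptions chosen purely for illustration):

```python
import math

# Hypothetical weighted inner product on R^2:
#   <x, y> = 2*x[0]*y[0] + x[1]*y[1]
# It is bilinear, symmetric, and positive definite,
# yet it is not the standard dot product.
def inner(x, y):
    return 2.0 * x[0] * y[0] + x[1] * y[1]

def norm(x):
    # the norm induced by this inner product
    return math.sqrt(inner(x, x))

x = [3.0, 4.0]
euclidean = math.sqrt(x[0] ** 2 + x[1] ** 2)  # the usual Euclidean norm, 5.0
weighted = norm(x)                            # sqrt(2*9 + 16) = sqrt(34)
```

The two norms disagree on the same vector, which is exactly why one cannot silently assume $\langle x, h \rangle = \sum x_i h_i$.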

I think the notation $Df(x,h)$ is pretty misleading. I prefer to use $Df_x(h)$ or $df_x(h)$. Here's how to "read" this notation: $f$ is a mapping from $H$ into $\Bbb{R}$. Now, if you fix a point $x \in H \setminus \{0\}$, then the differential of $f$ at the point $x$ is denoted by the symbol $df_x$. Note that by definition, $df_x$ itself is a bounded linear transformation from $H$ into $\Bbb{R}$, i.e. $df_x \in \mathcal{L}(H, \Bbb{R}) =: H^*$. Since $df_x: H \to \Bbb{R}$ is a linear transformation, it can "eat a vector in $H$", so if $h \in H$ then $df_x(h)$ means the linear transformation $df_x$ evaluated on the vector $h$. Finally, $df_x(h) \in \Bbb{R}$. There are a lot of things going on here, and you need to know what each object means and which space it lives in.
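This "reading" can be checked numerically. A small sketch (taking $H = \Bbb{R}^2$ with the dot product purely for illustration; in general $H$ need not be finite dimensional) of $df_x$ eating vectors $h$ and acting linearly on them:

```python
import math

def norm(x):
    # f(x) = ||x|| for the standard inner product on R^2
    return math.sqrt(sum(xi * xi for xi in x))

def ddf(f, x, h, t=1e-7):
    # difference quotient approximating df_x(h) = lim_{t->0} (f(x + t*h) - f(x)) / t
    xt = [xi + t * hi for xi, hi in zip(x, h)]
    return (f(xt) - f(x)) / t

x = [3.0, 4.0]                       # a point away from 0, where f is differentiable
d1 = ddf(norm, x, [1.0, 0.0])        # df_x applied to e1
d2 = ddf(norm, x, [0.0, 1.0])        # df_x applied to e2
d12 = ddf(norm, x, [1.0, 1.0])       # df_x applied to e1 + e2
# linearity of df_x: d1 + d2 should agree with d12 up to discretization error
```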


What you've proven is something completely different from what was being asked of you, hence your answer is wrong (see my first bullet point; your first equal sign was wrong, hence everything else is wrong). The correct solution, however, is obtained by writing $f$ as a composition of the square root function and the inner product, and then applying the chain rule (you also tried this, but you did it incorrectly). To see exactly how, define the following maps temporarily:

  • Define $\omega: H \to \Bbb{R}$ by $\omega(x) = \langle x,x \rangle$.
  • Define $s: [0, \infty) \to \Bbb{R}$ by $s(x) = \sqrt{x}$.

Then, we can write \begin{align} f(x) := \lVert x \rVert := \sqrt{\langle x,x \rangle} := (s \circ \omega)(x) \end{align}

We are asked to show that $f$ is differentiable on $H \setminus \{0\}$. So, pick any arbitrary $x\in H \setminus \{0\}$. The map $\omega$ is everywhere differentiable on $H$, and since $\omega(x) >0$, the square root function $s$ is differentiable at $\omega(x)$. Hence, by the chain rule, $f = s \circ \omega$ is differentiable at $x$, and we have that \begin{align} df_x = ds_{\omega(x)} \circ d\omega_x \end{align} Equivalently, for any $h \in H$, we have \begin{align} df_x(h) = ds_{\omega(x)} \big( d\omega_x(h)\big) \end{align}

Note that $d\omega_x(h)$ is just a real number, and $ds_{\omega(x)}: \Bbb{R} \to \Bbb{R}$ is a linear transformation; hence \begin{align} df_x(h) &= ds_{\omega(x)} \big( d\omega_x(h) \cdot 1 \big) \\ &= ds_{\omega(x)}(1) \cdot d\omega_x(h) \\ &= s'(\omega(x)) \cdot d\omega_x(h) \end{align} (the $\cdot$ indicates multiplication of real numbers). From single-variable calculus, you should know that \begin{align} s'(\omega(x)) = \dfrac{1}{2 \sqrt{\omega(x)}} = \dfrac{1}{2 \lVert x \rVert} \end{align}

Next, by applying the "generalised product rule" (which follows from the chain rule), you should find that since $\omega(x) = \langle x,x \rangle$, we have that for every $h \in H$, \begin{align} d \omega_x(h) &= \langle x,h \rangle + \langle h, x \rangle \tag{"generalised product rule"} \\ &= 2 \langle x, h\rangle \tag{$\langle \cdot, \cdot \rangle$ is symmetric} \end{align}

Hence, putting this all together, we have \begin{align} df_x(h) &= s'(\omega(x)) \cdot d\omega_x(h) \\ &= \dfrac{1}{2 \lVert x \rVert} \cdot 2 \langle x, h\rangle \\ &= \left\langle \dfrac{x}{\lVert x \rVert} ,h \right\rangle \end{align}
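As a sanity check on the final formula, here is a short numerical sketch (with $H = \Bbb{R}^3$ and the dot product, chosen only for illustration) comparing the closed form $\left\langle x/\lVert x \rVert, h \right\rangle$ against a difference quotient for $f(x) = \lVert x \rVert$:

```python
import math

def inner(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

def df_exact(x, h):
    # the closed form derived above: df_x(h) = <x/||x||, h>
    n = norm(x)
    return inner([xi / n for xi in x], h)

def df_numeric(x, h, t=1e-7):
    # symmetric difference quotient approximating df_x(h)
    xp = [xi + t * hi for xi, hi in zip(x, h)]
    xm = [xi - t * hi for xi, hi in zip(x, h)]
    return (norm(xp) - norm(xm)) / (2.0 * t)

x = [1.0, -2.0, 2.0]   # ||x|| = 3, and x != 0
h = [0.5, 1.0, -1.0]
exact = df_exact(x, h)      # (0.5 - 2.0 - 2.0) / 3
approx = df_numeric(x, h)
```

The two values agree to within the discretization error, as the chain-rule derivation predicts.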