I have been reading a proof of the convergence of Newton's method that has been fairly easy to follow, except for a single step that has totally mystified me because it suddenly depends on a lot more functional analysis than I know (this is my first exposure to the Hahn-Banach Theorem).
Below is an excerpt of the part which is confusing me.
Given $\lVert F^{\prime}(y) - F^{\prime}(x)\rVert \leq L\lVert x-y\rVert,$
where $L$ is a fixed positive constant and $F :X\rightarrow X$ is a continuously differentiable function on a Banach space $X$, I would like to show that
$$\lVert F(y)-[F(x)+F^{\prime}(x)(y-x)]\rVert\leq \frac{L}{2}\lVert y-x\rVert^{2}.$$
The proof proceeds as follows. Define $$y(\theta):= x + \theta(y-x), \quad R(\theta) := F(y(\theta)) - [F(x)+F^{\prime}(x)(y(\theta)-x)].$$
Then by the Hahn-Banach Theorem, there is a $\xi \in X^{*}$ such that $\lVert \xi \rVert =1$ and $\xi(R(1)) = \lVert R(1)\rVert$. Define a function $h(\theta):= \xi(R(\theta))$ so that
$$\frac{dh}{d\theta} = \xi\left(F^{\prime}(y(\theta))-F^{\prime}(x)\right). \quad (1)$$
Then using the assumption:
\begin{equation} \frac{dh}{d\theta}(\theta) \leq L \lVert y(\theta)-x\rVert. \quad(2) \end{equation}
My question then is, why are equations (1) and (2) true? First, it seems like we should instead have
$$\frac{dh}{d\theta} = \xi^{\prime}(F^{\prime}(y(\theta))-F^{\prime}(x))y^{\prime}(\theta).$$
Despite reading the Hahn-Banach Theorem repeatedly, I do not understand why we actually get $$\frac{dh}{d\theta} = \xi \left(\frac{d}{d\theta}R(\theta)\right)$$ or what happened to the factor $y^{\prime}(\theta)$ from the chain rule. I feel like $y^{\prime}(\theta)$ was just omitted. I am sure linearity plays a role here so that we do not compute $\xi^{\prime}$, but I have no idea why or how.
Furthermore, even if this were all true, why is
$$\frac{dh}{d\theta}(\theta) = \xi\left(F^{\prime}(y(\theta))-F^{\prime}(x)\right) \leq L \lVert y(\theta)-x\rVert?$$
I don't understand how $\lVert \xi \rVert =1$ helps me reach this conclusion from the hypothesis $$\lVert F^{\prime}(y) - F^{\prime}(x)\rVert \leq L\lVert x-y\rVert.$$
To me, the $\xi$ and the lack of a norm on the left-hand side get in the way.
Note that by linearity of $\xi$, $$\eqalign{ \frac{h(\theta+\delta)-h(\theta)}{\delta}&=\frac{1}{\delta}\left(\xi(R(\theta+\delta))- \xi(R(\theta))\right)\cr &=\frac{\xi(R(\theta+\delta)-R(\theta))}{\delta}\cr &=\xi\left(\frac{R(\theta+\delta)-R(\theta)}{\delta}\right).\cr} $$ Taking the limit as $\delta\to0$, and using that $\xi$ is continuous (it is a bounded linear functional), we get $$h'(\theta)=\xi(R'(\theta)).\tag{$1'$}$$ This is why no $\xi'$ appears: only linearity and continuity of $\xi$ are used. Now, $y'(\theta)=y-x$ and, by the chain rule, $$R'(\theta)=F'(y(\theta))y'(\theta)-F'(x)y'(\theta) =(F'(y(\theta))-F'(x))(y-x).$$ So $(1')$ becomes $$h'(\theta)=\xi((F'(y(\theta))-F'(x))(y-x)),\tag{$2'$}$$ which is the correct replacement for relation $(1)$ in the question; the factor $y'(\theta)=y-x$ was indeed omitted there.
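Now, $\Vert\xi\Vert=1$ implies that $|\xi(v)|\le \Vert v\Vert$ for every $v\in X$, so $$|h'(\theta)|\le \Vert (F'(y(\theta))-F'(x))(y-x)\Vert \le\Vert F'(y(\theta))-F'(x)\Vert\cdot\Vert y-x\Vert.$$ Since $y(\theta)-x=\theta(y-x)$, the Lipschitz hypothesis gives, for $\theta\in[0,1]$, $$|h'(\theta)|\le L \Vert y(\theta)-x\Vert\cdot\Vert y-x\Vert=L\theta \Vert y-x\Vert^2.$$ Finally, $h(0)=\xi(R(0))=0$ because $R(0)=0$, and $h(1)=\xi(R(1))=\Vert R(1)\Vert$ by the choice of $\xi$. Integrating, we get $$\Vert R(1)\Vert=|h(1)-h(0)|\le\int_0^1|h'(\theta)|\,d\theta\le L\Vert y-x\Vert^2\int_0^1\theta\,d\theta=\frac{L}{2}\Vert y-x\Vert^2,$$ which is the desired inequality.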
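
As a numerical sanity check of the final inequality (not a substitute for the proof), here is a minimal sketch in a finite-dimensional setting. Everything specific in it is my own assumption for illustration: I take $X=\mathbb{R}^n$ with the Euclidean norm and $F(x)=(\sin x_1,\dots,\sin x_n)$, so that $F'(x)$ is the diagonal matrix with entries $\cos x_i$ and the Lipschitz hypothesis holds with $L=1$. The function names are hypothetical. The script samples random pairs $x,y$ and reports the largest observed ratio $\Vert R(1)\Vert / (\tfrac{L}{2}\Vert y-x\Vert^2)$, which should never exceed $1$.

```python
import math
import random


def F(x):
    # Illustrative choice (an assumption, not from the proof): componentwise sine.
    return [math.sin(t) for t in x]


def dF_apply(x, v):
    # F'(x) is diagonal with entries cos(x_i), so F'(x) v is a componentwise product.
    return [math.cos(t) * s for t, s in zip(x, v)]


def norm(v):
    # Euclidean norm on R^n.
    return math.sqrt(sum(t * t for t in v))


def taylor_remainder(x, y):
    # ||F(y) - [F(x) + F'(x)(y - x)]||, i.e. ||R(1)|| in the notation above.
    d = [b - a for a, b in zip(x, y)]
    Fx, Fy, lin = F(x), F(y), dF_apply(x, d)
    return norm([fy - fx - l for fx, fy, l in zip(Fx, Fy, lin)])


def worst_ratio(n_trials=1000, dim=3, L=1.0, seed=0):
    # Largest observed value of ||R(1)|| / ((L/2) ||y - x||^2) over random samples.
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_trials):
        x = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        y = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        bound = 0.5 * L * norm([b - a for a, b in zip(x, y)]) ** 2
        if bound > 0.0:
            worst = max(worst, taylor_remainder(x, y) / bound)
    return worst


if __name__ == "__main__":
    # The inequality predicts a value of at most 1 (up to rounding).
    print(worst_ratio())
```

A ratio above $1$ here would signal a mistake in the chosen $L$ or in the code rather than in the theorem; the check only probes one example and proves nothing in general.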