Confused by simple proof of the Estimates of Differentiable Functions

Question

42 Views Asked by Bumbble Comm At 02 Apr 2026 - 11:20

I'm working through Fact 1 on page 11, Chapter 1 of Implicit Functions and Solution Mappings: A View from Variational Analysis by Dontchev and Rockafellar (famed author of Convex Analysis). Here is the statement and proof:

I follow the proof up to the point where the authors invoke the triangle inequality and the continuity of the Jacobian $\nabla f(x)$ at $\bar{x}$. It seems to me that property $(a)$ is strictly a result of the function $f$ being differentiable everywhere in a convex neighborhood of $\bar{x}$. I actually can't quite see why property $(b)$ is true.

So I don't see how the conclusions follow from the triangle inequality and continuity. What am I missing? Also, is there something a little Lipschitzian about these inequalities?

**Bumbble Comm** · Accepted Answer

Subtracting from both sides, you say you agree with the authors that: $$\langle h,f(x')-f(x)-\nabla f(x)(x'-x)\rangle=\langle h,[\nabla f(x+t(x'-x))-\nabla f(x)](x'-x)\rangle$$ for some $0<t<1$ (which may depend on $h$) and any unit vector $h\in\Bbb R^m$. Note that $\|x+t(x'-x)-x\|<\|x'-x\|$: here is where we use continuity of $\nabla f$ at $\tilde x$. By continuity, for arbitrary $\varepsilon>0$ there is $\delta>0$ such that $\|y-\tilde x\|<\delta$ implies the operator-norm bound $\|\nabla f(y)-\nabla f(\tilde x)\|<\frac{\varepsilon}{2\sqrt{m}}$. Then, using the triangle inequality (valid for operator norms too!): $$\begin{align}\|\nabla f(y)-\nabla f(y')\|&\le\|\nabla f(y)-\nabla f(\tilde x)\|+\|\nabla f(y')-\nabla f(\tilde x)\|\\&<\frac{\varepsilon}{2\sqrt{m}}+\frac{\varepsilon}{2\sqrt{m}}=\frac{\varepsilon}{\sqrt{m}}\end{align}$$ for $y,y'\in B_{\delta}(\tilde x)$. For $x,x'$ in this ball, the point $y':=x+t(x'-x)$ also lies in $B_{\delta}(\tilde x)$ because the ball is convex, so setting $y:=x$ and using $\|Av\|\le\|A\|\,\|v\|$ we get: $$\|[\nabla f(x+t(x'-x))-\nabla f(x)](x'-x)\|\le\frac{\varepsilon}{\sqrt{m}}\|x'-x\|$$

Take the standard coordinate basis $\{e_i\}_{i=1}^m$ for $\Bbb R^m$. Substituting $h=e_i$ into the inner products just picks out individual coordinates, so the $i$th coordinate of $f(x')-f(x)-\nabla f(x)(x'-x)$ is equal to the $i$th coordinate of $[\nabla f(x+t_i(x'-x))-\nabla f(x)](x'-x)$ for some $0<t_i<1$ that may differ from one coordinate to the next. Each such coordinate is bounded in absolute value by $\frac{\varepsilon}{\sqrt{m}}\|x'-x\|$ since, for any vector $y$, $\max(|y_i|)\le\|y\|$. Then, we can estimate: $$\|f(x')-f(x)-\nabla f(x)(x'-x)\|\le\frac{\varepsilon}{\sqrt{m}}\sqrt{m}\cdot\|x'-x\|=\varepsilon\cdot\|x'-x\|$$ for $x,x'$ in the mentioned ball. The estimate comes from the fact that: $$\begin{align}\|y\|&=\sqrt{y_1^2+\cdots+y_m^2}\\&\le\sqrt{\underset{m\text{ times}}{\underbrace{\max(y_i^2)+\max(y_i^2)+\cdots+\max(y_i^2)}}}\\&=\sqrt{m\max(y_i^2)}\\&=\max(|y_i|)\sqrt{m}\end{align}$$ for any vector $y$. If each coordinate is bounded by $C$ then the norm is bounded by $C\sqrt{m}$.

The continuity of the derivative is very important. You said it felt like $(a)$ was just a restatement of the definition of the derivative at $\tilde x$. Not quite. If $(a)$ had $x'$ replaced with some $y$ and $x$ replaced with $\tilde x$, then we would have the definition of the derivative at $\tilde x$. By allowing the derivative to be evaluated at an arbitrary $x$ close to, but not necessarily equal to, $\tilde x$, you are involving a continuity argument. Indeed, $(a)$ would be the definition of the derivative at $x$ instead, were it not for the fact that $x'$ and $x$ are both chosen freely close to $\tilde x$: the quantification of the variables is different. The problem? If you fix an $x$ close to $\tilde x$, differentiability at $x$ gives the inequality in $(a)$ for all $x'$ within some $\delta_x$ of $x$, where $\delta_x$ depends on $x$ as well as on $\varepsilon$. Nothing in the definition stops $\delta_x$ from shrinking as $x$ varies, whereas $(a)$ demands a single $\delta$ that works for every pair $x,x'\in B_{\delta}(\tilde x)$ at once. Continuity of $\nabla f$ at $\tilde x$ is what supplies that uniform $\delta$. I hope that clarifies the difference.
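To see that differentiability alone is not enough, a standard one-dimensional example (not taken from the book or from the argument above) is $f(x)=x^2\sin(1/x)$ with $f(0)=0$ and $\tilde x=0$: it is differentiable everywhere with $f'(0)=0$, but $f'$ is not continuous at $0$. The sketch below, whose helper names are purely illustrative, evaluates the quotient from $(a)$ at $x=1/(2\pi k)$ and $x'=0$. These points approach $\tilde x$, yet the quotient stays close to $1$, so no single $\delta$ can work once $\varepsilon<1$.

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x), extended by f(0) = 0; differentiable everywhere.
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0

def fprime(x):
    # f'(x) = 2x sin(1/x) - cos(1/x) for x != 0, and f'(0) = 0.
    # f' oscillates between roughly -1 and 1 near 0, so it is not continuous there.
    if x == 0.0:
        return 0.0
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

def quotient_a(x, xp):
    # |f(x') - f(x) - f'(x)(x' - x)| / |x' - x|, the quantity that (a) bounds by epsilon.
    return abs(f(xp) - f(x) - fprime(x) * (xp - x)) / abs(xp - x)

if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        x = 1.0 / (2.0 * math.pi * k)  # here cos(1/x) is about 1, so f'(x) is about -1
        print(k, x, quotient_a(x, 0.0))
```

With $x=0$ held fixed instead, the same quotient is at most $|x'|$, which is just differentiability at the single point $\tilde x=0$.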

To pass from $(a)$ to $(b)$ requires continuity also. Essentially, $\|\nabla f(x)-\nabla f(\tilde x)\|$ is very small if $x$ is very close to $\tilde x$, and the triangle inequality allows you to add the two small errors to get another small error, effectively replacing $\nabla f(x)$ with $\nabla f(\tilde x)$.

More precisely, since we know $(a)$ is true, for $\varepsilon>0$ there is some $\delta'>0$ for which the inequality in $(a)$ holds with $\varepsilon/2$ in place of $\varepsilon$ for all $x',x\in B_{\delta'}(\tilde x)$. By continuity of the derivative, there is some $\delta''>0$ such that $\|x-\tilde{x}\|<\delta''$ implies $\|\nabla f(x)-\nabla f(\tilde x)\|<\varepsilon/2$. Then let $\delta=\min(\delta',\delta'')>0$. We have: $$\begin{align}\|f(x')-f(x)-\nabla f(x)(x'-x)\|&\le\frac{1}{2}\varepsilon\|x'-x\|\\\|\nabla f(x)(x'-x)-\nabla f(\tilde x)(x'-x)\|&\le\|\nabla f(x)-\nabla f(\tilde x)\|\,\|x'-x\|\le\frac{1}{2}\varepsilon\|x'-x\|\end{align}$$ for all $x,x'\in B_{\delta}(\tilde x)$, since $B_{\delta}(\tilde x)$ is a subset of both $B_{\delta'}(\tilde x)$ and $B_{\delta''}(\tilde x)$. By the triangle inequality: $$\begin{align}\|f(x')-f(x)-\nabla f(\tilde x)(x'-x)\|&=\|[f(x')-f(x)-\nabla f(x)(x'-x)]+[\nabla f(x)(x'-x)-\nabla f(\tilde x)(x'-x)]\| \\&\le\|f(x')-f(x)-\nabla f(x)(x'-x)\|+\|\nabla f(x)(x'-x)-\nabla f(\tilde x)(x'-x)\|\\&\le\frac{1}{2}\varepsilon\|x'-x\|+\frac{1}{2}\varepsilon\|x'-x\|\\&=\varepsilon\|x'-x\|\end{align}$$ as required.
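As a sanity check on $(b)$, here is a small numerical sketch. Everything in it is an assumption chosen for illustration: the map $f(x,y)=(\sin x+y^2,\;xy)$, the base point $\tilde x=(0.5,-0.3)$, and the helper names. It samples pairs $x,x'\in B_\delta(\tilde x)$ and records the largest observed value of $\|f(x')-f(x)-\nabla f(\tilde x)(x'-x)\|/\|x'-x\|$. For this particular $f$ the Jacobian is Lipschitz with constant at most $\sqrt5$, so the quotient should never exceed $\sqrt5\,\delta$ and should shrink along with $\delta$.

```python
import math
import random

def f(p):
    # Illustrative smooth map R^2 -> R^2 (an assumption, not from the book).
    x, y = p
    return (math.sin(x) + y * y, x * y)

def jacobian(p):
    # Jacobian of f at p, as a tuple of rows.
    x, y = p
    return ((math.cos(x), 2.0 * y), (y, x))

def sample_in_ball(center, delta, rng):
    # Rejection sampling: draw from the square, keep points inside the open ball.
    while True:
        dx = rng.uniform(-delta, delta)
        dy = rng.uniform(-delta, delta)
        if dx * dx + dy * dy < delta * delta:
            return (center[0] + dx, center[1] + dy)

def worst_ratio(center, delta, trials=2000, seed=0):
    # Largest sampled value of |f(x') - f(x) - J(center)(x' - x)| / |x' - x|.
    rng = random.Random(seed)
    J = jacobian(center)
    worst = 0.0
    for _ in range(trials):
        x = sample_in_ball(center, delta, rng)
        xp = sample_in_ball(center, delta, rng)
        d = (xp[0] - x[0], xp[1] - x[1])
        dist = math.hypot(d[0], d[1])
        if dist == 0.0:
            continue  # the quotient is undefined for x' = x
        fx, fxp = f(x), f(xp)
        lin = (J[0][0] * d[0] + J[0][1] * d[1], J[1][0] * d[0] + J[1][1] * d[1])
        resid = (fxp[0] - fx[0] - lin[0], fxp[1] - fx[1] - lin[1])
        worst = max(worst, math.hypot(resid[0], resid[1]) / dist)
    return worst

if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        print(delta, worst_ratio((0.5, -0.3), delta))
```

Sampling only gives a lower estimate of the true supremum, so this illustrates the estimate rather than proving it.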

As for the Lipschitz remark, yes, continuously differentiable functions are locally Lipschitz for this very reason (though not always globally).
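To spell out the Lipschitz connection, fix any $\varepsilon>0$ and the corresponding $\delta$ from $(b)$. Adding and subtracting $\nabla f(\tilde x)(x'-x)$ and using the operator-norm bound $\|Av\|\le\|A\|\,\|v\|$ gives: $$\begin{align}\|f(x')-f(x)\|&\le\|f(x')-f(x)-\nabla f(\tilde x)(x'-x)\|+\|\nabla f(\tilde x)(x'-x)\|\\&\le\varepsilon\|x'-x\|+\|\nabla f(\tilde x)\|\,\|x'-x\|\\&=\left(\|\nabla f(\tilde x)\|+\varepsilon\right)\|x'-x\|\end{align}$$ for all $x,x'\in B_{\delta}(\tilde x)$. So $f$ is Lipschitz on that ball with constant $\|\nabla f(\tilde x)\|+\varepsilon$.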
