Generalization of the chain rule using upper derivatives


I'm having trouble with the following exercise, taken from Chapter 6 of Royden and Fitzpatrick's Real Analysis:

Let $f$ be defined on $[a,b]$ and $g$ a continuous function on $[\alpha, \beta]$ that is differentiable at $\gamma \in (\alpha, \beta)$ with $g(\gamma) = c \in (a,b)$. Show that if $g'(\gamma) > 0$, then $\bar D(f \circ g)(\gamma) = \bar D f(c) \cdot g'(\gamma)$ where $$\bar Df(x) = \lim_{h \to 0} \sup_{0 < |t| \leq h} \frac{f(x + t) - f(x)}{t}$$

If $g$ were linear, the problem would be straightforward: substituting $s = g'(\gamma)\cdot t$, \begin{multline} \bar D(f \circ g)(\gamma) = \lim_{h \to 0}\sup_{0 < |t| \leq h}\frac{f(c + g'(\gamma)\cdot t) - f(c)}{t} = \\\lim_{h \to 0}\sup_{0 < |s| \leq g'(\gamma)\cdot h}\frac{f(c + s) - f(c)}{s}\cdot g'(\gamma) = \bar D f(c) \cdot g'(\gamma) \end{multline} I'm having trouble in the general case, in part because the problem places so little structure on $f$. The error from linearly approximating $g$ at $\gamma$ can be made arbitrarily small, but without a continuity condition on $f$ I'm unsure how to make the approximation error "go away". Thank you!
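As a numeric sanity check of the definition and of the linear case (not part of any proof: the grid-based `upper_derivative` helper and the sample $f$ and $g$ below are illustrative choices of my own), one can approximate the $\sup$ over a fine grid of increments:

```python
import numpy as np

def upper_derivative(f, x, h=1e-4, n=4000):
    """Grid approximation of D-bar f(x) = lim_{h->0} sup_{0<|t|<=h} (f(x+t)-f(x))/t.
    A sketch only: a finite grid can miss the true sup for wildly behaved f."""
    ts = np.linspace(-h, h, n)
    ts = ts[ts != 0.0]          # guard against t = 0 landing on the grid
    return float(np.max((f(x + ts) - f(x)) / ts))

# Linear case: g(x) = 3x, gamma = 1, c = g(1) = 3, g'(gamma) = 3 > 0.
# f(y) = |y - 3| is not differentiable at c = 3, but D-bar f(3) = 1.
f = lambda y: np.abs(y - 3.0)
g = lambda x: 3.0 * x
lhs = upper_derivative(lambda x: f(g(x)), 1.0)  # approximates D-bar (f o g)(1)
rhs = upper_derivative(f, 3.0) * 3.0            # D-bar f(3) * g'(1)
```

Both sides come out to $3$ up to grid error, consistent with $\bar D f(c) \cdot g'(\gamma) = 1 \cdot 3$.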


There are two answers below.

First answer:

To just throw out an idea, have you tried the usual chain rule proof method of rewriting

$$\frac{f(g(\gamma + t)) - f(g(\gamma))}{t}$$

as

$$\frac{f(g(\gamma + t)) - f(g(\gamma))}{g(\gamma+t) - g(\gamma)} \cdot \frac{g(\gamma + t) - g(\gamma)}{t}.$$ (Below I'll refer to these two fractions as "the first term" and "the second term".)

Because $g'$ exists at $\gamma$ you can pick a small interval around $\gamma$ and bound the second term to be arbitrarily close to $g'(\gamma)$.

Because $g'(\gamma) > 0$ there is an interval around $\gamma$ where the denominator of the first term is non-zero, which means you don't have to fuss about possible division by zero. (You might have to think about that point for a bit).

Because $\bar D f(c)$ exists, you can pick an $h_1$ small enough that $\sup_{0 < |t| \leq h_1} \frac{f(c + t) - f(c)}{t}$ is arbitrarily close to $\bar D f(c)$. Because $g$ is continuous at $\gamma$, you can find an $h_2$ so that $|\tau| \leq h_2$ implies $|g(\gamma+\tau) - g(\gamma)| \leq h_1$, and so the first term actually ranges over a subset of the quotients we took the $\sup$ of when computing $\bar D f(c)$, so it tends to the same limit. (There are some ugly details hiding in there too.)

My guess is that writing it up this way would be an inglorious slog through many epsilons and deltas, with intervals chosen to depend on other intervals, etc., but it could get the job done. Even if you don't use most of these ideas, I'll point out that I think it's the continuity of $g$ at $\gamma$ that lets you translate "how $f(g(t))$ reacts to small changes in $t$" into "how $f(t)$ reacts to small changes in $t$". The $\sup$ is there in the definition of $\bar D$, in contrast to a regular derivative, to handle arbitrarily badly-behaved functions -- we shouldn't expect much help from $f$ behaving nicely.
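To make the two-term factorization concrete, here is a small numeric illustration (my own hypothetical example, not from the answer): take $f(y) = |y-2|$, so $\bar D f(2) = 1$, and $g(x) = x^2 + 1$ with $\gamma = 1$, $c = g(1) = 2$, $g'(1) = 2$.

```python
# Factor the difference quotient of f(g(x)) at gamma = 1 into the two terms
# discussed above, for shrinking increments t > 0.
f = lambda y: abs(y - 2.0)   # D-bar f(2) = 1
g = lambda x: x**2 + 1.0     # g(1) = 2, g'(1) = 2
gamma = 1.0

for t in [1e-2, 1e-4, 1e-6]:
    term1 = (f(g(gamma + t)) - f(g(gamma))) / (g(gamma + t) - g(gamma))  # "first term"
    term2 = (g(gamma + t) - g(gamma)) / t                                # "second term", -> g'(1) = 2
    product = term1 * term2                                              # the full quotient
```

For these positive $t$ the first term equals $1$ and the second term approaches $g'(1) = 2$, so the product approaches $\bar D f(2)\cdot g'(1) = 2$.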

Second answer:

Because $g'(\gamma)>0,$ there exists an open interval $I$ containing $\gamma$ such that $(g(x)-g(\gamma))/(x-\gamma)>0$ for $x\in I\setminus \{\gamma\};$ in particular $g(x)\neq g(\gamma)$ there, so the factorization below never divides by zero. Let $x_n$ approach $\gamma$ within $I\setminus \{\gamma\}.$ We have

$$\frac{f(g(x_n))- f(g(\gamma))}{x_n-\gamma}= \frac{f(g(x_n))- f(g(\gamma))}{g(x_n)-g(\gamma)}\frac{g(x_n)-g(\gamma)}{x_n-\gamma}.$$

The continuity of $g$ implies $g(x_n)\to g(\gamma).$ Let $\epsilon > 0.$ For large $n,$ the first fraction on the right is less than $\bar D f(g(\gamma))+\epsilon,$ since its increments $g(x_n)-g(\gamma)$ eventually fall within any prescribed window around $0.$ The second fraction is positive for $x_n \in I,$ so multiplying by it preserves the inequality: for such $n$ we have

$$ \frac{f(g(x_n))- f(g(\gamma))}{x_n-\gamma} \le (\bar D f(g(\gamma))+\epsilon)\cdot \frac{g(x_n)-g(\gamma)}{x_n-\gamma}.$$

Apply $\limsup$ to both sides to get

$$ \limsup_{n\to \infty} \frac{f(g(x_n))- f(g(\gamma))}{x_n-\gamma} \le (\bar D f(g(\gamma))+\epsilon)g'(\gamma).$$

Since $\epsilon$ is arbitrary, the $\limsup $ on the left is $\le \bar D f(g(\gamma))g'(\gamma).$ Now $x_n$ was an arbitrary sequence approaching $\gamma,$ so we have shown

$$\bar D(f\circ g)(\gamma) \le \bar D f(g(\gamma))g'(\gamma).$$

The argument above would be reversible if we knew $g$ was injective on $I.$ But it need not be. However, shrinking $I$ if necessary, we can say this: $g(I)$ is an interval whose interior contains $g(\gamma)$ and furthermore, if $y_n \to g(\gamma)$ within $g(I),$ then there is a sequence $x_n$ in $I$ converging to $\gamma$ such that $g(x_n)=y_n.$ I'll omit the proof of this for now; perhaps you would like to have a go at it. And maybe the first thing to check is to see if the reverse inequality follows from this result.
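As a numeric spot check of the claimed identity for a nonlinear $g$ (my own illustrative example and helper, approximating the $\sup$ over a finite grid, so a sketch rather than a proof): $f(y) = |y-2|$ gives $\bar D f(2) = 1$, and $g(x) = x^2+1$ has $g(1) = 2$ and $g'(1) = 2 > 0$, so the identity predicts $\bar D(f\circ g)(1) = 2$.

```python
import numpy as np

def upper_derivative(f, x, h=1e-4, n=4000):
    # Grid approximation of D-bar f(x); accurate only up to O(h) for these examples.
    ts = np.linspace(-h, h, n)
    ts = ts[ts != 0.0]          # exclude t = 0
    return float(np.max((f(x + ts) - f(x)) / ts))

f = lambda y: np.abs(y - 2.0)   # D-bar f(2) = 1
g = lambda x: x**2 + 1.0        # g(1) = 2, g'(1) = 2
lhs = upper_derivative(lambda x: f(g(x)), 1.0)  # approximates D-bar (f o g)(1)
rhs = upper_derivative(f, 2.0) * 2.0            # D-bar f(2) * g'(1)
```

Both sides agree (up to grid error) at the predicted value $2$.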