I'm having trouble with the following exercise, taken from Chapter 6 of Royden and Fitzpatrick's Real Analysis:
Let $f$ be defined on $[a,b]$ and $g$ a continuous function on $[\alpha, \beta]$ that is differentiable at $\gamma \in (\alpha, \beta)$ with $g(\gamma) = c \in (a,b)$. Show that if $g'(\gamma) > 0$, then $\bar D(f \circ g)(\gamma) = \bar D f(c) \cdot g'(\gamma)$ where $$\bar Df(x) = \lim_{h \to 0} \sup_{0 < |t| \leq h} \frac{f(x + t) - f(x)}{t}$$
If $g$ were linear, the problem would be straightforward: \begin{multline} \bar D(f \circ g)(\gamma) = \lim_{h \to 0}\sup_{0 < |t| \leq h}\frac{f(c + g'(\gamma)\cdot t) - f(c)}{t} = \\\lim_{h \to 0}\sup_{0 < |s| \leq g'(\gamma)\cdot h}\frac{f(c + s) - f(c)}{s}\cdot g'(\gamma) = \bar D f(c) \cdot g'(\gamma) \end{multline} I'm having trouble in the general case, in part because the problem places so little structure on $f$. The error from linearly approximating $g(\cdot)$ at $\gamma$ can be made arbitrarily small, but without a continuity condition on $f$ I'm unsure how to make the approximation error "go away". Thank you!
To just throw out an idea, have you tried the usual chain-rule proof method of rewriting
$$\frac{f(g(\gamma + t)) - f(g(\gamma))}{t}$$
as
$$\frac{f(g(\gamma + t)) - f(g(\gamma))}{g(\gamma+t) - g(\gamma)} \cdot \frac{g(\gamma + t) - g(\gamma)}{t}?$$ (Below I'll refer to these two fractions as "the first term" and "the second term".)
Because $g'$ exists at $\gamma$ you can pick a small interval around $\gamma$ and bound the second term to be arbitrarily close to $g'(\gamma)$.
Because $g'(\gamma) > 0$ there is an interval around $\gamma$ where the denominator of the first term is non-zero, which means you don't have to fuss about possible division by zero. (You might have to think about that point for a bit).
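To spell that point out (using nothing beyond the definition of the derivative): since
$$\lim_{t \to 0} \frac{g(\gamma + t) - g(\gamma)}{t} = g'(\gamma) > 0,$$
there is a $\delta > 0$ so that $0 < |t| < \delta$ implies
$$\frac{g(\gamma + t) - g(\gamma)}{t} > \frac{g'(\gamma)}{2} > 0,$$
and in particular $g(\gamma + t) \neq g(\gamma)$ for those $t$, so the first term's denominator is safe on that interval.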
Because $\bar D f(c)$ exists, you can pick an $h_1$ small enough that $\sup_{0 < |t| \leq h_1} \frac{f(c + t) - f(c)}{t}$ is arbitrarily close to $\bar D f(c)$. Because $g$ is continuous at $\gamma$, you can find an $h_2$ so that $|\tau| < h_2$ implies $|g(\gamma+\tau) - g(\gamma)| < h_1$, and so the first term ranges over a subset of the difference quotients we took the $\sup$ of when finding $\bar D f(c)$, which bounds its $\sup$ from above by the one defining $\bar D f(c)$. (The matching lower bound is where some ugly details hide: continuity of $g$ together with the intermediate value theorem is what guarantees the values $g(\gamma + \tau)$ actually fill out a neighborhood of $c$.)
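Assembling the bounds above (just a sketch, with $\delta$, $h_1$, $h_2$ as chosen there): for $0 < |t| < \min(\delta, h_2)$, write $\tau_t = g(\gamma + t) - g(\gamma)$, so that $0 < |\tau_t| < h_1$ and
$$\frac{f(g(\gamma + t)) - f(g(\gamma))}{t} = \frac{f(c + \tau_t) - f(c)}{\tau_t} \cdot \frac{g(\gamma + t) - g(\gamma)}{t},$$
where the first factor is one of the difference quotients appearing in the $\sup$ defining $\bar D f(c)$, and the second factor lies within any prescribed $\varepsilon$ of $g'(\gamma)$.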
My guess is that writing it up this way would be an inglorious slog through many epsilons and deltas, with intervals chosen to depend on other intervals, etc., but it could get the job done. Even if you don't use most of these ideas, I'll point out that I think it's the continuity of $g$ at $\gamma$ that lets you translate "how $f(g(t))$ reacts to small changes in $t$" into "how $f(t)$ reacts to small changes in $t$". The $\sup$ is there in the definition of $\bar D$, in distinction to a regular derivative, precisely to handle arbitrarily badly-behaved functions; we shouldn't expect much help from $f$ behaving nicely.