My textbook gives the following proof of the single-variable version of Taylor's theorem:
As promised, we begin with the Fundamental Theorem of Calculus, written in the form
$$f(x_0 + h) = f(x_0) + \int_{x_0}^{x_0 + h} f'(\tau) \ d \tau$$
Next, we write $d \tau = - d(x_0 + h - \tau)$ and integrate by parts to give
$$f(x_0 + h) = f(x_0) + f'(x_0) h + \int_{x_0}^{x_0 + h} f''(\tau)(x_0 + h - \tau) \ d \tau,$$
which is the first-order Taylor formula. Integrating by parts again, we get
$$\int_{x_0}^{x_0 + h} f''(\tau)(x_0 + h - \tau) \ d\tau = - \dfrac{1}{2} \int_{x_0}^{x_0 + h} f''(\tau) d(x_0 + h - \tau)^2$$
$$= \dfrac{1}{2} f''(x_0) h^2 + \dfrac{1}{2} \int_{x_0}^{x_0 + h} f'''(\tau)(x_0 + h - \tau)^2 \ d \tau,$$
which, when substituted into the preceding formula, gives the second-order Taylor formula:
$$f(x_0 + h) = f(x_0) + f'(x_0) h + \dfrac{1}{2}f''(x_0) h^2 + \dfrac{1}{2} \int_{x_0}^{x_0 + h} f'''(\tau)(x_0 + h - \tau)^2 \ d\tau.$$
Taylor's theorem for general $k$ proceeds by repeated integration by parts. The statement (2) that $\dfrac{R_k(x_0, h)}{h^k} \to 0$ as $h \to 0$ is seen as follows. For $\tau$ in the interval $[x_0, x_0 + h]$, we have $|x_0 + h - \tau| \le |h|$, and $f^{k + 1} (\tau)$, being continuous, is bounded; say, $\left| f^{k + 1}(\tau) \right| \le M$. Then
$$| R_k(x_0, h) | = \left| \int_{x_0}^{x_0 + h} \dfrac{(x_0 + h - \tau)^k}{k!} f^{k + 1}(\tau) \ d \tau \right| \le \dfrac{|h|^{k + 1}}{k!} M$$
and, in particular, $\left| \dfrac{R_k(x_0, h)}{h^k} \right| \le \dfrac{|h| M}{k!} \to 0$ as $h \to 0$.
I have two points of confusion:
1. How is $| R_k(x_0, h) | = \left| \displaystyle\int_{x_0}^{x_0 + h} \dfrac{(x_0 + h - \tau)^k}{k!} f^{k + 1}(\tau) \ d \tau \right| \le \dfrac{|h|^{k + 1}}{k!} M$ true? It is stated that $|x_0 + h - \tau| \le |h|$ and $\left| f^{k + 1}(\tau) \right| \le M$; fine. But I still don't see how these two bounds make the above inequality true.
2. By extension, how is $\left| \dfrac{R_k(x_0, h)}{h^k} \right| \le \dfrac{|h| M}{k!} \to 0$ as $h \to 0$ true? This is also not clear to me.
I would greatly appreciate it if people could please take the time to explain this.
If $a \lt b$ and a measurable function $g$ defined on $(a,b)$ satisfies $|g(x)| \le M$ for every $x \in (a,b)$, then it follows that $$ \bigg|\int_a^b g(x) dx \bigg| \le (b - a)M $$
In your example the interval of integration has length $b - a = |h|$, and on it $|x_0 + h - \tau|^k \le |h|^k$. Together with $\left| f^{k + 1}(\tau) \right| \le M$, this bounds the integrand by $\dfrac{|h|^k M}{k!}$, which gives the required bound $|h| \cdot \dfrac{|h|^k M}{k!} = \dfrac{|h|^{k + 1}}{k!} M$ on the integral.
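Written out as one chain, the estimate might look as follows. This is a sketch under one assumption that the quoted proof leaves implicit: $M$ is taken to be a bound for $\left| f^{k + 1} \right|$ on a fixed interval $[x_0 - \delta, x_0 + \delta]$ (the $\delta > 0$ is introduced here only for illustration), and $|h| \le \delta$, so that $M$ does not depend on $h$. The outer absolute values around the integrals cover the case $h < 0$, where the limits are in reverse order.

$$| R_k(x_0, h) | \le \left| \int_{x_0}^{x_0 + h} \dfrac{|x_0 + h - \tau|^k}{k!} \left| f^{k + 1}(\tau) \right| \ d \tau \right| \le \left| \int_{x_0}^{x_0 + h} \dfrac{|h|^k}{k!} M \ d \tau \right| = \dfrac{|h|^k M}{k!} \cdot |h| = \dfrac{|h|^{k + 1}}{k!} M$$

For the second point, divide both sides by $|h|^k$ (with $h \ne 0$):

$$\left| \dfrac{R_k(x_0, h)}{h^k} \right| \le \dfrac{1}{|h|^k} \cdot \dfrac{|h|^{k + 1}}{k!} M = \dfrac{|h| M}{k!}$$

Since $M$ and $k!$ are fixed, the right-hand side is a constant times $|h|$, which tends to $0$ as $h \to 0$; the left-hand side is squeezed between $0$ and it.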