My textbook presents this theorem without any sort of introduction. It does cover the Newton and Lagrange forms of the interpolation polynomial, so I've got that. Anyway, here's the theorem:
Theorem on Polynomial Interpolation Error
Let $f$ be differentiable $n + 1$ times on the interval $[a, b]$, and let $p$ be the polynomial of degree at most $n$ that interpolates the function $f$ at $n + 1$ distinct points $x_0, x_1, \ldots, x_n$ in the interval $[a, b]$. To each $x$ in $[a, b]$ there corresponds a point $\xi_x$ in $(a, b)$ such that $$f(x) - p(x) = \frac{1}{(n + 1)!} f^{(n + 1)}(\xi_x)\prod_{i = 0}^n(x - x_i).$$
I'm hoping that someone could briefly explain where this equation comes from, or at least where I might find more information.
The formula originates from the natural desire to have an estimate for the error of polynomial interpolation. Since it involves $f^{(n+1)}$ evaluated at an unknown point in a certain interval, using the formula in practice requires an upper bound for $|f^{(n+1)}|$ on that interval. If $M$ is such a bound, then $$ |f(x) - p(x)| \le \frac{M}{(n + 1)!} \prod_{i = 0}^n |x - x_i|. $$ Looking at this estimate, we realize the importance of the placement of the nodes $x_i$: trying to place them optimally, so that $\prod_{i = 0}^n |x - x_i|$ is as small as possible over the interval of interpolation, leads us to the Chebyshev nodes.
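To make the estimate concrete, here is a small Python sketch (the choice of function, nodes, and names is mine, not from any textbook): it interpolates $f(x) = e^x$ at five equally spaced nodes on $[0, 1]$ using the Newton form, then checks that the actual error at a sample point stays within the bound above. Since every derivative of $e^x$ is $e^x$, we may take $M = e$ on $[0, 1]$.

```python
import math

def divided_differences(xs, ys):
    """Newton divided-difference coefficients of the interpolating polynomial."""
    coef = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef

def newton_eval(coef, xs, x):
    """Evaluate the Newton-form polynomial at x with a Horner-like scheme."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result

# Interpolate f(x) = exp(x) on [0, 1] at n + 1 = 5 equally spaced nodes.
f = math.exp
n = 4
nodes = [i / n for i in range(n + 1)]
coef = divided_differences(nodes, [f(t) for t in nodes])

M = math.e  # bounds |f^(n+1)| on [0, 1], since every derivative of exp is exp

x = 0.37  # an arbitrary evaluation point in [0, 1]
actual_error = abs(f(x) - newton_eval(coef, nodes, x))
bound = M / math.factorial(n + 1) * abs(math.prod(x - xi for xi in nodes))

# The theorem guarantees the actual error never exceeds the bound.
assert actual_error <= bound
```

Note that the bound is evaluated pointwise here; to bound the error over the whole interval you would maximize $\prod_{i=0}^n |x - x_i|$ over $[a, b]$, which is exactly the quantity the Chebyshev nodes minimize.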
The proof of the error formula is a clever application of Rolle's theorem; you can read it on ProofWiki.
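For reference, the key construction in that Rolle's-theorem argument can be sketched as follows (a summary, not the full proof). Fix $x \in [a, b]$ with $x \ne x_i$ for all $i$ (if $x$ is a node, the formula holds trivially), and define $$\phi(t) = f(t) - p(t) - \lambda\, w(t), \qquad w(t) = \prod_{i = 0}^n (t - x_i),$$ where $\lambda$ is chosen so that $\phi(x) = 0$, i.e. $\lambda = \dfrac{f(x) - p(x)}{w(x)}$. Then $\phi$ vanishes at the $n + 2$ points $x, x_0, \ldots, x_n$, so by repeated application of Rolle's theorem $\phi'$ has at least $n + 1$ zeros, $\phi''$ at least $n$, and so on, until $\phi^{(n+1)}$ has a zero $\xi_x \in (a, b)$. Since $p^{(n+1)} \equiv 0$ and $w^{(n+1)}(t) = (n+1)!$, this gives $$0 = f^{(n+1)}(\xi_x) - \lambda\, (n+1)!,$$ and substituting the value of $\lambda$ and solving for $f(x) - p(x)$ yields the error formula.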