I've been trying to understand Newton Interpolation and I've found a proof that I am able to wrap my head around: http://www.math.umd.edu/~mariakc/teaching-2/interpolation.pdf , page 3. But there's one thing stated which I am not sure about, namely that
Suppose the polynomials $P[x_0, . . . , x_{n−1}](x)$ and $P[x_1, . . . , x_n](x)$ of the form of Eq. (2) $$ P_n(x) = f[x_0] + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_2](x − x_0)(x − x_1) + . . . + f[x_0, x_1, . . . , x_n](x − x_0)(x − x_1). . .(x − x_{n−1}). $$ interpolate $f$ at the points $x_0, . . . , x_{n−1}$ and $x_1, . . . , x_n$ respectively. They differ by a polynomial of degree $n − 1$ that has zeros at $x_1, ..., x_{n−1}$.
I understand that there must be $n-1$ zeros / roots. I have written out the difference $$P[x_1, . . . , x_n](x)-P[x_0, . . . , x_{n−1}](x) = $$ $$f[x_1]-f[x_0]+f[x_1,x_2](x-x_1)-f[x_0,x_1](x-x_0)...+...-...+f[x_1, ..., x_{n-1}(x-x_1)...(x-x_{n-1})-f[x_0, ..., x_{n-2}(x-x_0)...(x-x_{n-2})$$
and verified for up to some n. But I would like a general proof, since it's my only missing piece to understand the Newton Interpolation proof in general.
Say the first interpolant is $p$ and the second interpolant is $q$, now $p$ and $q$ have at most degree $n-1$ by definition, so $p-q$ does as well. Additionally, $p$ and $q$ agree at $x_1,x_2,\dots,x_{n-1}$ by hypothesis, so $p-q$ has zeros there. Therefore, without needing to know anything about $f$, you know what $p-q$ is, except for a multiplicative constant, because of the fundamental theorem of algebra. That's really all this step is for.
What's less obvious is why you care about $p-q$ at all. And honestly, this way of setting things up obfuscates that a little bit, I think.
The idea behind this entire setup is, if you have an interpolant $p$ on $x_0,\dots,x_{n-1}$ and an interpolant $q$ on $x_1,\dots,x_n$, you can "glue them together" by looking at $r(x)=p(x)+c(x)(q(x)-p(x))$, provided that $c(x_0)=0$ (so that $r(x_0)=p(x_0)$) and $c(x_n)=1$ (so that $r(x_n)=q(x_n)$). It doesn't matter what $c$ does at the other nodes because $r(x)=p(x)$ at the other nodes automatically. So now you just choose $c(x)=\frac{x-x_0}{x_n-x_0}$ and you're set to go.