I'm trying to show that the limit of Newton's interpolation formula as $x_i\to x_0$ ($i=1,\ldots,n$) gives the Taylor's formula (knowing that $f\in \mathcal{C}^{n+1}[x_0-\delta,x_0+\delta]$) $$T(x)=f(x_0)+f'(x_0)(x-x_0)+\ldots+\frac{f^{(n)}(x_0)}{n!}(x-x_0)^n+\frac{f^{(n+1)}(\xi)}{(n+1)!}(x-x_0)^{n+1}$$ The Newton's interpolation formula is $$P_n(x)=f(x_0)+f(x_0,x_1)(x-x_0)+\ldots+f(x_0,\ldots,x_n)(x-x_0)\ldots (x-x_{n-1})$$ So basically I must show that $$\lim_{(x_1,\ldots,x_n)\to(x_0,\ldots,x_0)}P_n(x)=T(x)$$
$f(x_0,\ldots,x_k)$ is defined as $$f(x_0,\ldots,x_k)=\frac{f(x_1,\ldots,x_k)-f(x_0,\ldots,x_{k-1})}{x_k-x_0}$$ From that definition it's easy to show that $$\lim_{(x_1,\ldots,x_n)\to(x_0,\ldots,x_0)}f(x_0,x_1)=\lim_{(x_1,\ldots,x_n)\to(x_0,\ldots,x_0)}\frac{f(x_1)-f(x_0)}{x_1-x_0}\overset{def}{=}f'(x_0)$$ But I get stuck when I have to find the limit of $f(x_0,x_1,x_2)$. $$\lim_{(x_1,\ldots,x_n)\to(x_0,\ldots,x_0)}f(x_0,x_1,x_2)=\lim_{(x_1,\ldots,x_n)\to(x_0,\ldots,x_0)}\frac{f(x_1,x_2)-f(x_0,x_1)}{x_2-x_0}=???=\frac{1}{2}f''(x_0)$$ So my question is what should I do at $(???)$ ? I'm guessing that if I get this step figured out, finding the limit of $f(x_0,\ldots,x_{n-1})$ is analogous?
Write $$ f\left( {x_1 } \right) = f\left( {x_0 } \right) + f'\left( {x_0 } \right)\left( {x_1 - x_0 } \right) + {1 \over 2}f''\left( {x_0 } \right)\left( {x_1 - x_0 } \right)^{\,2} + O\left( {\left( {x_1 - x_0 } \right)^{\,3} } \right) $$ and to keep the following passages more clear let's use the concise form $$ f\left( {x_1 } \right) = f\left( {x_0 } \right) + f'\left( {x_0 } \right)\Delta x_0 + {1 \over 2}f''\left( {x_0 } \right)\Delta x_0 ^{\,2} + O\left( {\Delta x_0 ^{\,3} } \right) $$
Then write $f(x_2)$ in this double version $$ \eqalign{ & f\left( {x_2 } \right) = \cr & = f\left( {x_1 } \right) + f'\left( {x_1 } \right)\Delta x_1 + {1 \over 2}f''\left( {x_1 } \right)\Delta x_1 ^{\,2} + O\left( {\Delta x_1 ^{\,3} } \right) = \cr & = \left( {f\left( {x_0 } \right) + f'\left( {x_0 } \right)\Delta x_0 + {1 \over 2}f''\left( {x_0 } \right)\Delta x_0 ^{\,2} + O\left( {\Delta x_0 ^{\,3} } \right)} \right) + \cr & + \left( {f'\left( {x_0 } \right) + f''\left( {x_0 } \right)\Delta x_0 + O\left( {\Delta x_0 ^{\,2} } \right)} \right)\Delta x_1 + \cr & + {1 \over 2}\left( {f''\left( {x_0 } \right) + O\left( {\Delta x_0 } \right)} \right)\Delta x_1 ^{\,2} + O\left( {\Delta x_1 ^{\,3} } \right) = \cr & = f\left( {x_0 } \right) + f'\left( {x_0 } \right)\left( {\Delta x_0 + \Delta x_1 } \right) + {1 \over 2}f''\left( {x_0 } \right)\Delta x_0 \left( {\Delta x_0 + \Delta x_1 } \right) + \cr & + O\left( {\Delta x_0 ^{\,3} } \right) + O\left( {\Delta x_0 ^{\,2} \Delta x_1 } \right) + O\left( {\Delta x_0 \Delta x_1 ^{\,2} } \right) + O\left( {\Delta x_1 ^{\,3} } \right) \cr} $$
Therefore $$ \eqalign{ & f\left[ {x_0 ,x_1 } \right] = {{\left( {f\left( {x_1 } \right) - f\left( {x_0 } \right)} \right)} \over {\Delta x_0 }} = f'\left( {x_0 } \right) + {1 \over 2}f''\left( {x_0 } \right)\Delta x_0 + O\left( {\Delta x_0 ^{\,2} } \right) \cr & f\left[ {x_1 ,x_2 } \right] = {{\left( {f\left( {x_2 } \right) - f\left( {x_1 } \right)} \right)} \over {\Delta x_1 }} = f'\left( {x_1 } \right) + {1 \over 2}f''\left( {x_1 } \right)\Delta x_1 + O\left( {\Delta x_1 ^{\,2} } \right) \cr & f\left[ {x_0 ,x_1 ,x_2 } \right] = {{f\left[ {x_1 ,x_2 } \right] - f\left[ {x_0 ,x_1 } \right]} \over {\left( {x_2 - x_0 } \right)}} = \cr & = {1 \over {\left( {\Delta x_0 + \Delta x_1 } \right)}}\left( {f'\left( {x_1 } \right) - f'\left( {x_0 } \right) + {1 \over 2}f''\left( {x_1 } \right)\Delta x_1 - {1 \over 2}f''\left( {x_0 } \right)\Delta x_0 } \right) + O\left( {\Delta x} \right) = \cr & = {1 \over {\left( {\Delta x_0 + \Delta x_1 } \right)}}\left( {f''\left( {x_0 } \right)\Delta x_0 + {1 \over 2}f''\left( {x_0 } \right)\Delta x_1 + O\left( {\Delta x_0 \Delta x_1 } \right) - {1 \over 2}f''\left( {x_0 } \right)\Delta x_0 } \right) + O\left( {\Delta x} \right) = \cr & = {1 \over 2}f''\left( {x_0 } \right) + O\left( {\Delta x} \right) \cr} $$
The "general principle" of the method is clear:
since you want the relation with the MacLaurin series at $x=x_0$ then we shall tend towards expressing $f^{(n)}(x_k)$ as a series around $x_0$.
I wrote "tend to" because we must do that in cascade, as it is apparent above.
In fact, $f\left[ {x_m ,x_{m+1}, \cdots, x_n } \right]$ implies the division by $(x_n-x_m)$, so we shall develop first as series around the lowest point $x_m$, to simplify the divisor, and then carry on developing around the lowest point that appears in the denominator.
But that's not a practical way to go, either theoretical and computional-wise.
The theoretical demonstration can be achieved by the Mean Value Theorem for divide differences, which states $$ f\left[ {x_0 , \cdots ,x_n } \right] = {1 \over {n!}}f^{\left( n \right)} \left( \xi \right)\quad \left| {\;\min \left( {x_0 , \cdots ,x_n } \right) < \xi < \max \left( {x_0 , \cdots ,x_n } \right)} \right. $$ and which, to our purposes, we can write as $$ f\left[ {x_0 , \cdots ,x_n } \right] = {1 \over {n!}}f^{\left( n \right)} \left( \xi \right) = {1 \over {n!}}f^{\left( n \right)} \left( {x_0 } \right) + O\left( w \right) $$ where $w$ is the length of the interval containing $x_0, \cdots x_n$.