Suppose there is a function $f: \mathbb R\to \mathbb R$ and that we only know $f(0)$, $f(h)$, $f'(h)$, $f(2h)$ for some $h>0$; we cannot know the value of $f$ with $100\%$ accuracy at any other point.
What is the optimal way of approximating $f''(0)$ with the given data?
I'd say that $f''(0)=\frac{f'(h)-f'(0)}{h}+O(h)$ and $f'(0)=\frac{f(h)-f(0)}{h}-\frac{h}{2}f''(0)+O(h^2)$ (the leading error of the forward difference matters here, because we divide by $h$ again), therefore we get
$$f''(0)=\frac{2}{h}\left(f'(h)-\frac{f(h)-f(0)}{h}\right)+O(h).$$
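A quick numerical sketch of the one-sided estimate $f''(0)\approx\frac{2}{h}\left(f'(h)-\frac{f(h)-f(0)}{h}\right)$ (the factor $2$ compensates for the $-\frac{h}{2}f''(0)$ error of the forward difference for $f'(0)$), using $f=\exp$ as an illustrative test function so that the exact value is $f''(0)=1$:

```python
import math

def fpp_first_order(f, fp, h):
    # One-sided estimate of f''(0) from f(0), f(h), f'(h).
    # The factor 2 compensates for the -h/2 * f''(0) error made by
    # replacing f'(0) with the forward difference (f(h) - f(0)) / h.
    return 2 * (fp(h) - (f(h) - f(0)) / h) / h

# With f = exp the exact value is f''(0) = 1; the error shrinks like O(h),
# so halving h should roughly halve the error.
for h in [0.1, 0.05, 0.025]:
    print(h, abs(fpp_first_order(math.exp, math.exp, h) - 1.0))
```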
But that can't be the optimal way, since we know $f(2h)$ and I didn't use it at all.
Could someone shed some light?
Let's say that $$y_k=\{f(0),f(h),f'(h),f(2h)\}$$ are given and $$x_k=\{f(0),f'(0),f''(0),f'''(0)\}$$ are unknown. Taylor's theorem gives a way of writing each $y_k$ as a linear combination of $x_j$'s, dropping all the terms starting with $f^{(4)}(0)$. For example: $$f(2h)=f(0)+f'(0)(2h)+\frac12f''(0)(2h)^2+\frac16f'''(0)(2h)^3. $$
Conversely, let $y=Ax$ be the linear equations relating $y$'s and $x$'s. These are four linear equations in four unknowns. To find the solution $x_2=f''(0)$ in terms of $y$'s just solve the system of linear equations for $x$. To find how accurate the approximation is, compute the Taylor expansion of $y$'s to a higher order and substitute into the expression for $x_2$. The answer will always have the form $$ f''(0) = \alpha_1 y_1+\cdots+\alpha_4 y_4 + \Theta(h^m), $$ where $m$ is the order of the approximation.
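Here is a sketch of this recipe in Python (the cubic test function and the value $h=1/10$ are illustrative choices; exact rational arithmetic keeps the solve transparent). Since the Taylor truncation is exact for cubics, solving the $4\times 4$ system recovers $f''(0)$ exactly:

```python
from fractions import Fraction as F

def solve_linear(A, b):
    # Gauss-Jordan elimination in exact rational arithmetic.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                fac = M[r][col]
                M[r] = [a - fac * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]

h = F(1, 10)
# Rows of A: Taylor coefficients of y = (f(0), f(h), f'(h), f(2h))
# in terms of x = (f(0), f'(0), f''(0), f'''(0)).
A = [
    [F(1), F(0),  F(0),       F(0)],
    [F(1), h,     h**2 / 2,   h**3 / 6],
    [F(0), F(1),  h,          h**2 / 2],
    [F(1), 2 * h, 2 * h**2,   F(4, 3) * h**3],
]

# Data from the cubic f(t) = 1 + 2t + 3t^2 + 4t^3, for which f''(0) = 6.
y = [F(1),
     1 + 2 * h + 3 * h**2 + 4 * h**3,
     2 + 6 * h + 12 * h**2,
     1 + 4 * h + 12 * h**2 + 32 * h**3]

x = solve_linear(A, y)
print(x[2])  # exactly 6: the truncation drops no terms for a cubic
```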
This approximation is "best" in the sense that it is the highest-order approximation possible (in this case $x_2 = f''(0) + O(h^2)$) with the given data. This is also the way all the standard finite-difference formulas are derived.
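For a concrete check of the order: eliminating the other unknowns from the system by hand gives $f''(0)\approx\frac{2}{h^2}\bigl(2f(0)-f(h)+3h\,f'(h)-f(2h)\bigr)$ (worth re-deriving yourself). With $f=\exp$, for which $f''(0)=1$, halving $h$ should cut the error by roughly a factor of $4$:

```python
import math

def fpp(f, fp, h):
    # f''(0) from f(0), f(h), f'(h), f(2h); the coefficients come from
    # solving the 4x4 Taylor system by hand.
    return 2 * (2 * f(0) - f(h) + 3 * h * fp(h) - f(2 * h)) / h**2

# f = exp, so the exact value is f''(0) = 1; the error is O(h^2),
# so halving h should divide it by roughly 4.
for h in [0.1, 0.05, 0.025]:
    print(h, abs(fpp(math.exp, math.exp, h) - 1.0))
```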