Introduction
Suppose I have $N$ data points $f_i \equiv f(x_i,y_i)$, for $i=0...N-1$. The points $(x_i,y_i)$ are scattered in the $xy$ plane. The function $f(x,y)$ is unknown, but assumed to be well-behaved and continuously differentiable.
Suppose we wish to approximate a directional derivative of $f(x,y)$ (that is, $\left[\vec{\nabla}f(x,y)\right] \cdot \hat{a}$, where $\hat{a}$ is a unit vector in the $xy$ plane, as described here) at some point $(\bar{x},\bar{y})$ within the convex hull of $(x_i,y_i)$. How might this be done?
Bonus points if there's a way to do it using a specified number of points $M$ rather than the three nearest points (e.g. in a Delaunay triangulation) or all $N$ points.
Essentially, if we represent $\epsilon = \left[\vec{\nabla}f(x,y)\right] \cdot \hat{a} - \sum_{i=0}^{i=N-1} c_i f_i$, I'm looking for a way to estimate the $c_i$ coefficients such that $\epsilon$ is minimized.
I imagine this has lots of uses for things like finite element/volume PDE solvers.
Examples
Such a method should greatly reduce for really simple cases.
1D Sample
Suppose there were three sample points of the form $x_i = i \Delta x$ and $y_i=0$, and $(\bar{x},\bar{y}) = (x_1,0)$.
If $\hat{a}=\hat{x}$, then we might expect something like the following result (2nd order):
$$\frac{\partial f}{\partial x} \approx \frac{f_2 - f_0}{2 \Delta x}$$
On the other hand, if $\hat{a}=\hat{y}$, the method would fail because the function is sampled at only one value of $y$, meaning that it would be impossible to calculate $\frac{\partial f}{\partial y}$.
2D Sample
Suppose we have nine points of the form $x_i = \mod[i,3] \Delta x$ and $y_i = \lfloor i/3 \rfloor \Delta y$. If we wish to estimate $\frac{\partial f}{\partial x}$ at the central point $(\bar{x},\bar{y}) = (x_4,y_4)$, there might be two ways:
First way (using subset of points):
$$\frac{\partial f}{\partial x} \approx \frac{f_5 - f_3}{x_5 - x_3}$$
Second way (using all points):
$$\begin{align}\frac{\partial f}{\partial x} \;\approx&\; \lambda \frac{\partial f}{\partial x}\bigg|_{y=\Delta y} + (1-\lambda) \frac{1}{2}\left( \frac{\partial f}{\partial x}\bigg|_{y=0} + \frac{\partial f}{\partial x}\bigg|_{y=2\Delta y} \right) \\ \approx&\; \lambda \frac{f_5 - f_3}{x_5 - x_3} + (1-\lambda) \frac{1}{2}\left( \frac{f_2 - f_0}{x_2 - x_0} + \frac{f_8 - f_6}{x_8 - x_6} \right) \end{align}$$
Here, there'd have to be a scaling factor $\lambda$ to represent how much the derivatives at different values of $y$ would contribute to the estimation of the derivative at $y=\Delta y$. Presumably, $\lambda$ would decrease as $\Delta y$ increased.