The Nyquist–Shannon sampling theorem is one of the most famous results in signal processing. It says that with regularly spaced point-wise sampling we must sample a sinusoidal signal at least twice per period to avoid frequency aliasing.
But what happens if, instead of sampling just the function value at instants in time, we sample, say, a local Taylor approximation?
$$f_k(x) = \sum_{l=0}^Nc_{kl}(x-k\Delta_x)^l$$
For $N=0$ this reduces to ordinary sampling, $f_k(x) = c_{k0}$: we measure only the function values and none of the derivatives.
Below is an illustration of the Nyquist phenomenon on a typical chirp signal. We see the catastrophe that occurs when the local frequency rises above the sampling rate prescribed by Nyquist. But what would happen if we could also measure the slope at the green points, or even the second derivatives, etc.? Could we push the bound upwards?
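To make the aliasing concrete, here is a small numerical sketch (the rates and frequencies are illustrative choices of mine, not taken from the text): a sinusoid above $f_s/2$ produces exactly the same samples as one folded back below $f_s/2$.

```python
import numpy as np

# Aliasing demo: a sinusoid above fs/2 is indistinguishable, at the sample
# instants, from one folded back below fs/2.  (All numbers here are
# illustrative choices, not from the original text.)
fs = 10.0              # sampling rate in Hz
f_true = 7.0           # true frequency, above the Nyquist limit fs/2 = 5 Hz
f_alias = f_true - fs  # -3 Hz: the frequency the samples appear to have

t = np.arange(20) / fs  # 20 sample instants
x_true = np.cos(2 * np.pi * f_true * t)
x_alias = np.cos(2 * np.pi * f_alias * t)

# The two signals agree exactly at every sample instant.
print(np.allclose(x_true, x_alias))  # True
```

Point-wise samples alone simply cannot tell these two sinusoids apart, which is the question in a nutshell.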

I don't claim this is an answer; I am more thinking out loud here, and it is too much for a comment.
The simplest proof of the sampling theorem is to show that any signal with frequency content below some cut-off frequency, $\vert f \vert < f_c$, can be reconstructed by applying an ideal low-pass filter (defined by the same cut-off frequency) to the sampled version of the signal. We can represent sampling at a frequency of $f_s=1/T_s$ in the time domain by multiplying the input signal $x(t)$ by
$$ \textrm{comb}_{T_s}(t) = \sum_{k}{\delta(t-kT_s) }. $$
In the frequency domain this becomes (suppressing a constant scale factor of $f_s$ throughout)
$$ x(t) \cdot \textrm{comb}_{T_s}(t) \leftrightarrow X(f) \ast \textrm{comb}_{f_s}(f). $$
Since the replicas will be separated by $f_s$ Hertz, if $X(f)$ has zero magnitude for $\vert f\vert > f_s/2$ and we let $f_c = f_s/2$, then
$$ \textrm{rect}\left( \frac{f}{2f_c} \right) \cdot \left( X(f) \ast \textrm{comb}_{f_s}(f) \right) = X(f). $$
Thus, if $x(t)$ has no spectral content at frequencies below $-f_s/2$ or above $+f_s/2$, then the sampling frequency $f_s$ is sufficient to reconstruct it.
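In the time domain, that ideal low-pass filtering of the sampled signal is sinc interpolation of the samples, $x(t) = \sum_k x(kT_s)\,\textrm{sinc}\!\left((t-kT_s)/T_s\right)$. Here is a minimal numerical sketch of that reconstruction (the rate, frequency, and sample count are my own illustrative choices; a finite sum only approximates the infinite one):

```python
import numpy as np

# Ideal low-pass reconstruction = sinc interpolation of the samples:
#   x(t) = sum_k x(k Ts) * sinc((t - k Ts) / Ts)
# Sketch with illustrative numbers (not from the original text).
fs = 10.0
Ts = 1.0 / fs
f0 = 1.5                    # well below fs/2, so reconstruction should work

def x(t):
    return np.cos(2 * np.pi * f0 * t)

k = np.arange(-2000, 2001)  # finitely many samples -> small truncation error
samples = x(k * Ts)

def reconstruct(t):
    # np.sinc is the normalized sinc, sin(pi u) / (pi u)
    return np.sum(samples * np.sinc(t / Ts - k))

t_test = 0.123              # an off-grid instant
err = abs(reconstruct(t_test) - x(t_test))
print(err < 1e-2)           # True: only a tiny truncation error remains
```

With infinitely many samples the interpolation would be exact; the finite window leaves a small truncation error that shrinks as more samples are included.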
If we instead sampled the first derivative of $x(t)$, then we would have
$$ \left( \frac{d}{dt} x(t) \right) \cdot \textrm{comb}_{T_s}(t) \leftrightarrow \left( j2\pi f \cdot X(f) \right) \ast \textrm{comb}_{f_s}(f). $$
(I drop the constant factor $j2\pi$ from here on, since it can always be divided out.)
Suppose that $\vert f_1 \vert < f_s/2$, $\vert f_2 \vert < f_s$, and $\vert f_1 - f_2 \vert = f_s$, so that $f_2$ aliases exactly onto $f_1$ when sampled at $f_s$. If $x(t) = a \cdot e^{j2\pi f_1 t} + b \cdot e^{j2\pi f_2 t}$ with $a,b\in\mathbb{C}$, and we sample both the value and the first derivative at a rate of $f_s$, then the spectra we see after applying the low-pass filters are
$$ X_{f_s}(f) = \left( a + b \right)\cdot\delta(f-f_1) \\ fX_{f_s}(f) = \left( a\cdot f_1 + b \cdot f_2 \right)\cdot\delta(f-f_1) , $$
which, if I am not mistaken, is enough information to reconstruct $x(t)$ given the restrictions on $f_1$ and $f_2$. So having the sample value as well as the first derivative lets us double the frequency below which we can accurately reconstruct the time domain signal.
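Here is a quick numerical sanity check of that claim, a sketch with illustrative numbers of my own choosing: the two exponentials alias onto the same frequency when sampled, but the value samples and derivative samples together give a solvable $2\times 2$ linear system for $a$ and $b$ (assuming $f_1$ and $f_2$ themselves are known).

```python
import numpy as np

# Numerical sanity check: two complex exponentials whose frequencies differ
# by exactly fs alias onto the same frequency when sampled at fs, but value
# samples plus derivative samples still determine both amplitudes.
# All concrete numbers are illustrative assumptions, not from the text.
fs = 1.0
Ts = 1.0 / fs
f1, f2 = 0.3, -0.7          # |f1| < fs/2, |f2| < fs, |f1 - f2| = fs
a, b = 2.0 - 1.0j, 0.5 + 3.0j

k = np.arange(8)            # sample indices
carrier = np.exp(2j * np.pi * f1 * k * Ts)

# Value samples: e^{j 2 pi f2 k Ts} = e^{j 2 pi f1 k Ts} for integer k,
# so both components collapse onto the aliased carrier at f1.
x = a * np.exp(2j * np.pi * f1 * k * Ts) + b * np.exp(2j * np.pi * f2 * k * Ts)
print(np.allclose(x, (a + b) * carrier))    # True: a pure alias

# Derivative samples carry the extra factors f1 and f2.
xd = 2j * np.pi * (f1 * a * np.exp(2j * np.pi * f1 * k * Ts)
                   + f2 * b * np.exp(2j * np.pi * f2 * k * Ts))

# The two observed coefficients (dividing out j 2 pi, reading off at k = 0
# where the carrier is 1) give a 2x2 linear system for a and b.
s0 = x[0]                   # = a + b
s1 = xd[0] / (2j * np.pi)   # = a*f1 + b*f2
A = np.array([[1.0, 1.0], [f1, f2]])
a_hat, b_hat = np.linalg.solve(A, np.array([s0, s1]))
print(np.allclose([a_hat, b_hat], [a, b]))  # True
```

The system is solvable because its determinant, $f_2 - f_1$, is nonzero whenever the two frequencies differ, which matches the intuition that the derivative samples contribute genuinely new information.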
Far from a proof, I know. But I thought the question was interesting and wanted to think about it.