Short version: if we know the $k$th derivative of $f$ at $a$ for each $k$ in some finite set $F$, there is a natural "Taylor polynomial analogue" suggested by this data which is only an actual Taylor polynomial if $F$ is downwards-closed. In case $F$ is not downwards-closed, how horribly wrong can things go?
For $f:\mathbb{R}\rightarrow\mathbb{R}$ a smooth function, $a\in\mathbb{R}$, and $F$ a finite, not necessarily downwards-closed subset of $\mathbb{N}$, consider the "gappy Taylor polynomial" defined by $$G^F_{f,a}(x)=\sum_{k\in F}{(x-a)^k\cdot f^{(k)}(a)\over k!}$$ (to keep things precise and always-defined, I'm using the conventions that $0\in\mathbb{N}$ and $0^0=1$ here).
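As a small illustration of the definition (an example chosen only for concreteness): take $f(x)=e^x$ and $a=0$, so that $f^{(k)}(0)=1$ for every $k$. Then $$G^{\{0,1,2\}}_{f,0}(x)=1+x+\frac{x^2}{2},\qquad G^{\{0,2\}}_{f,0}(x)=1+\frac{x^2}{2}.$$ The first is the usual degree-$2$ Taylor polynomial. The second drops the linear term because $1\notin\{0,2\}$, and it is not a Taylor polynomial of $f$ at $0$.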
If we restrict attention to downwards-closed sets in the above definition we get "optimal" approximations to the original function $f$. Precisely, we have:
(1) If $F$ is downwards-closed with $\max(F)=d$ and $p$ is any degree-$d$ polynomial, then there is an $\epsilon>0$ such that for all $x\in (a-\epsilon,a+\epsilon)$ we have $$\vert G_{f,a}^F(x)-f(x) \vert\le \vert p(x)-f(x)\vert.$$
(2) If $F_1,F_2$ are downwards-closed and $F_1\subseteq F_2$, then there is an $\epsilon>0$ such that for all $x\in(a-\epsilon,a+\epsilon)$ we have $$\vert G_{f,a}^{F_2}(x)-f(x) \vert\le \vert G_{f,a}^{F_1}(x)-f(x) \vert.$$
(Of course (1) implies (2) trivially.)
My question is whether the appropriate analogues of (1) and (2) above continue to hold when we allow non-downwards-closed sets. Specifically, consider (3) and (4) below:
(3) If $F$ is a finite set of natural numbers and $p$ is a polynomial whose $x^k$-coefficient is nonzero only if $k\in F$, then there is an $\epsilon>0$ such that for all $x\in (a-\epsilon,a+\epsilon)$ we have $$\vert G_{f,a}^F(x)-f(x) \vert\le \vert p(x)-f(x)\vert.$$
(4) If $F_1\subseteq F_2$, then there is an $\epsilon>0$ such that for all $x\in(a-\epsilon,a+\epsilon)$ we have $$\vert G_{f,a}^{F_2}(x)-f(x) \vert\le \vert G_{f,a}^{F_1}(x)-f(x) \vert.$$
Question: Which of (3) and (4) are true?
Clearly (3) implies (4), analogously to how (1) implies (2) above, but all three possibilities consistent with this (both hold, only (4) holds, or neither holds) seem plausible to me. In particular, all remainder analyses I've seen have a fundamentally "inductive" structure, which breaks down once we look at sets that aren't downwards-closed; on the other hand, I can't seem to cook up a counterexample to (3), let alone to (4).
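Let $f(x)=1-x^2$, $a=0$, and $F=\{2\}$. Then $G_{f,0}^F(x)=-x^2$, while $p(x)=0$ is a polynomial whose only (possibly) nonzero coefficients have powers in $F$. We have $\vert G_{f,0}^F(x)-f(x)\vert=1$ for all $x$, but $\vert p(x)-f(x)\vert=\vert 1-x^2\vert<1$ whenever $0<\vert x\vert<\sqrt{2}$. So $p$ is a strictly better approximation of $f$ than $G_{f,0}^F$ at every nonzero point of any sufficiently small $\epsilon$-neighborhood of $0$, and (3) fails.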
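Letting $F_1=\emptyset$ and $F_2=\{2\}$, the same example is a counterexample to (4) as well as (3): $G_{f,0}^{F_1}$ is the empty sum, i.e. the zero polynomial, which is exactly the $p$ above, and $G_{f,0}^{F_2}(x)=-x^2$.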
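As a quick numerical sanity check of this example, here is a minimal Python sketch. It assumes $f$ is a polynomial given by its coefficient list, which is enough for $f(x)=1-x^2$ but is not a general implementation for smooth functions. The helper names `poly_eval`, `poly_derivative` and `gappy_taylor` are made up for illustration.

```python
from math import factorial


def poly_eval(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def poly_derivative(coeffs, order):
    """Coefficient list of the order-th derivative of the polynomial."""
    for _ in range(order):
        coeffs = [k * c for k, c in enumerate(coeffs)][1:]
    return coeffs


def gappy_taylor(coeffs, a, F, x):
    """G^F_{f,a}(x) for the polynomial f with the given coefficients.

    Python evaluates 0.0 ** 0 as 1.0, matching the convention 0^0 = 1.
    """
    return sum(
        (x - a) ** k * poly_eval(poly_derivative(coeffs, k), a) / factorial(k)
        for k in F
    )


if __name__ == "__main__":
    f = [1.0, 0.0, -1.0]  # f(x) = 1 - x^2
    for x in (0.5, 0.1, 0.01):
        err_gappy = abs(gappy_taylor(f, 0.0, {2}, x) - poly_eval(f, x))
        err_zero = abs(0.0 - poly_eval(f, x))  # p(x) = 0, i.e. F_1 = {}
        print(x, err_gappy, err_zero)
```

At each sampled point the error of the zero polynomial should come out smaller than the error of $G^{\{2\}}_{f,0}$, in line with the computation above.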
The fundamental issue seems to me to be that even powers of $x$ are somehow non-independent over $\mathbb{R}$. I suspect that something similar to your conjecture is probably true over $\mathbb{C}$, with a Fourier-shaped proof.