Let $(V,\langle \cdot,\cdot\rangle)$ be a Lorentzian $n+1-$dimensional vector space. I want to prove that, given a nonzero linear functional $f\in V'$, we have that for any two future timelike vectors $X$ and $Y$ we have
$$A_f(X,Y):=f(X)f(Y)-\frac12 \langle f,f\rangle\langle X,Y\rangle>0,$$
where $\langle f,h\rangle$ on $V'$ is defined via $f_i h_j g^{ij}$ (and $g_{ij}$ is the lorenz inner product matrix). Since this problem is very easily stated in a coordinate-free manner, as above, I feel like there should be a slick proof of the result that does not involve getting one's hands dirty. Unfortunately, I did not manage to find one, and working in coordinates I only managed to prove the result for either $X$ or $Y$ in a particular form (see below). Any suggestion is thus greatly appreciated.
Working in "orthonormal" coordinates, I managed to prove the result whenever either $X$ or $Y$ are of the form $(1,0,\dots,0)^t$: indeed suppose $Y=(1,\dots,0)^t$, $X=(X^0,\dots,X^n)$. Then
$$A_f(X,Y)=f_0^2X^0+\sum_{i=1}^n f_0f_i X^i+\frac12\left(-f_0^2+\sum_{i=1}^n f_i^2\right)X^0\\ =\frac12f_0^2X^0+\sum f_0f_iX^i+\frac12f_i^2X^0.$$ Let us write $\|f\|^2:=\sum_{i=0}^nf_i^2$ and $|f|:=\sqrt{\sum_{i>0}f_i^2}$ (and similarly for $X$). Then applying Cauchy-Schwarz we get $$A_f(X,Y)=\frac12\|f\|^2X^0+f_0\sum f_iX^i\ge \frac12 \|f\|^2X^0-|f_0||f||X|.$$ Since $X$ is future timelike, $X^0>$ and $(X^0)^2>|X|^2$ and the result follows, since it is always the case that $\frac12\|f\|^2\ge |f_0||f|$.
Since space is homogeneous, i.e. looks "the same" from different perspectives, an inner product expression ought to only care about how vectors are situated relative to each other, rather than relative to an arbitrary coordinate system. (Conversely, how the vectors are situated relative to each other ought to be necessary information for the inner product expression.) This means we can choose a coordinate system which is based on the vectors themselves (the ones that appear in an identity or expression). Indeed, coordinate-based arguments can be just as natural as coordinate-free ones so long as coordinates are chosen this way.
For simplicity, coordinates are chosen so that all vectors have the simplest possible coordinates and the resulting calculation involves the smallest possible fixed number of coordinates instead of an arbitrary number of them. Equivalently, the chosen basis is constructed from the vectors that appear in the identity by using Gram-Schmidt. This is also equivalent to writing vectors as sums of parallel/perpendicular parts with respect to each other, using the bilinearity of $\langle\cdot,\cdot\rangle$, then evaluating inner products at vectors that are exclusively parallel/perpendicular. If you've done physics problems, you've chosen coordinates using directions parallel or perpendicular to various forces in a free body diagram, for example.
Apply Gram-Schmidt to $\{f,X,Y\}$ from right to left to get coordinates for which
$$ f=(f_0,f_1,f_2,0,\cdots) \\ X=(X_0,X_1,0,\cdots) \\ Y = (1,0,0,\cdots) $$
Equivalently, we could have argued $f,X,Y$ have these forms "WLOG" by invoking symmetry, i.e. how we can rescale $f,X,Y$ and also the fact that $\mathrm{SO}(1,n)$ acts transitively on the future-timelike hyperboloid sheet defined by $\langle X,X\rangle=-1$ and then $\mathrm{SO}(n)$ acts transitively on $S^{n-1}\subset\mathbb{R}^n$. The transitivity properties of this symmetry group in fact follow from the aforementioned orthogonal-basis construction process, though. In any case, note that 3 coordinates is the minimum number of generic, fixed coordinates because we have an identity involving 3 vectors.
Now this yields $A=\frac{1}{2}(f_0^2+f_1^2+f_2^2)X_0-f_0f_1X_1$ which is positive since $\frac{1}{2}(f_0^2+f_1^2)\ge |f_0f_1|$ (from Cauchy-Schwarz) and $X_0>|X_1|$ (from $X_0>0$ and $-X_0^2+X_1^2<0$) and $f,X\ne0$.