Background
We are all very familiar with the concept of a linear approximation. For a smooth real-valued function $f$ of two variables, the first-order Taylor expansion is $f(x,y)\approx f(x_{0},y_{0})+(x-x_{0})f_{x}(x_{0},y_{0})+(y-y_{0})f_{y}(x_{0},y_{0})$. Loosely speaking, this formula solves the problem of finding a "locally optimal" affine approximation
$L(x,y) = f(x_{0},y_{0}) + A\langle x-x_{0}, y-y_{0}\rangle ^{T}$, where $A = (f_{x}(x_{0},y_{0}),\, f_{y}(x_{0},y_{0}))$ is a $1\times 2$ matrix (the gradient). Within the broader topic of generalized linear models there is the notion of an "additive model" (as in generalized additive models). In that case, we look for an approximation of the form $f(x,y)\approx g(x)+h(y)$ for smooth 1D functions $g$ and $h$.
Smoothness is not strictly required for additive approximations in general, but it is a natural condition to add in this question. Obviously the additive functions form a much larger class than the linear ones, but they preserve some useful properties of linearity.
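To make the setup concrete, here is a minimal numerical sketch; the test function $f(x,y)=e^{x}\sin y$ and the expansion point $(x_{0},y_{0})=(0.5, 1)$ are illustrative choices of mine, not part of the question. It also shows that the tangent plane is itself already additive:

```python
import numpy as np

# Illustrative choices (not from the question): a test function,
# its first partials (worked out by hand), and an expansion point.
f  = lambda x, y: np.exp(x) * np.sin(y)
fx = lambda x, y: np.exp(x) * np.sin(y)   # partial df/dx
fy = lambda x, y: np.exp(x) * np.cos(y)   # partial df/dy
x0, y0 = 0.5, 1.0

def L(x, y):
    """First-order Taylor (tangent-plane) approximation at (x0, y0)."""
    return f(x0, y0) + (x - x0) * fx(x0, y0) + (y - y0) * fy(x0, y0)

# Note the tangent plane is itself additive: L(x, y) = g(x) + h(y) with
# g(x) = f(x0, y0) + (x - x0) * fx(x0, y0) and h(y) = (y - y0) * fy(x0, y0).
print(f(0.6, 1.1), L(0.6, 1.1))   # exact value vs. linear approximation
```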
Question
I think the most obvious method for generating $g$ and $h$ would be to build them from 1D Taylor expansions. The whole additive approximation is then a (potentially high-order) Taylor expansion of $f$ minus the cross terms. In some cases this performs very well, but in general we can only guarantee an error of $O\left((x-x_{0})(y-y_{0})\right)$, since the leading dropped term is $f_{xy}(x_{0},y_{0})(x-x_{0})(y-y_{0})$. Can this be improved upon?
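As a quick sanity check on that error claim, here is a hedged sketch (same illustrative $f$ as above) that builds $g$ and $h$ from second-order 1D Taylor expansions and compares the residual against the dropped cross term:

```python
import numpy as np

# Same illustrative f as above; second partials worked out by hand.
f   = lambda x, y: np.exp(x) * np.sin(y)
fx  = lambda x, y: np.exp(x) * np.sin(y)
fy  = lambda x, y: np.exp(x) * np.cos(y)
fxx = lambda x, y: np.exp(x) * np.sin(y)
fyy = lambda x, y: -np.exp(x) * np.sin(y)
fxy = lambda x, y: np.exp(x) * np.cos(y)
x0, y0 = 0.5, 1.0

def g(x):
    """Second-order 1D Taylor expansion in x alone (keeps the constant)."""
    return f(x0, y0) + (x - x0) * fx(x0, y0) + 0.5 * (x - x0)**2 * fxx(x0, y0)

def h(y):
    """Second-order 1D Taylor expansion in y alone."""
    return (y - y0) * fy(x0, y0) + 0.5 * (y - y0)**2 * fyy(x0, y0)

for eps in (1e-1, 1e-2, 1e-3):
    x, y = x0 + eps, y0 + eps
    residual = f(x, y) - (g(x) + h(y))
    dropped  = fxy(x0, y0) * (x - x0) * (y - y0)   # omitted cross term
    print(f"eps={eps:.0e}  residual={residual:+.3e}  cross term={dropped:+.3e}")
```

As `eps` shrinks, the residual is dominated by the cross term, consistent with the $O\left((x-x_{0})(y-y_{0})\right)$ guarantee.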
Context
I have worked on some "engineering math" problems where the coefficients in a linear approximation are treated as proxies for how much impact each variable in a system has on some objective function of interest (see: sensitivity coefficients, especially derivative-based local methods). In some cases, it seems like additive models (already used in some data-science applications because of their explainability) could do this with higher accuracy. There's probably some issue with doing things this way or I would have heard about it already, but who knows; maybe there's a non-trivial application here.
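For concreteness, here is a small sketch of the derivative-based local sensitivity coefficients I have in mind, estimated with central finite differences; the objective `F` and the nominal point are made-up placeholders:

```python
import numpy as np

# Hedged sketch of derivative-based local sensitivity coefficients:
# S_i ~ dF/dp_i at a nominal point, via central finite differences.
# The objective F and the nominal point p0 are made-up placeholders.
def F(p):
    x, y = p
    return np.exp(x) * np.sin(y)

def local_sensitivities(F, p0, rel_step=1e-6):
    p0 = np.asarray(p0, dtype=float)
    S = np.empty_like(p0)
    for i in range(p0.size):
        step = rel_step * max(abs(p0[i]), 1.0)
        p_plus, p_minus = p0.copy(), p0.copy()
        p_plus[i]  += step
        p_minus[i] -= step
        S[i] = (F(p_plus) - F(p_minus)) / (2.0 * step)
    return S

# Each coefficient plays the role of "how much impact" that variable has.
print(local_sensitivities(F, [0.5, 1.0]))   # ~ [f_x, f_y] at (0.5, 1.0)
```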