This website shows an implementation in R that fits a Padé approximant:
$$ R(x)= \frac{\sum_{j=0}^m a_j x^j}{1+\sum_{k=1}^n b_k x^k}=\frac{a_0+a_1x+a_2x^2+\cdots+a_mx^m}{1+b_1 x+b_2x^2+\cdots+b_nx^n} $$ to some data.
They write the function like this: $$ y = \frac{a_0 + a_1 x + a_2 x^2}{1 + b_1 x + b_2 x^2} $$
A bit of rearranging:
$$ y (1 + b_1 x + b_2 x^2) = a_0 + a_1 x + a_2 x^2 $$ $$ y = a_0 + a_1 x + a_2 x^2 - b_1 x y - b_2 x^2 y $$
Now, they retrieve the coefficients using R's lm()
function (linear model).
Now we have a form that lm can work with. We just need to specify a set of inputs that are powers of x (as in a traditional polynomial fit), and a set of inputs that are y times powers of x. This may seem like a strange thing to do, because we are making a model where we would need to know the value of y in order to predict y. But the trick here is that we will not try to use the fitted model to predict anything; we will just take the coefficients out and rearrange them in a function. The fit_pade function below takes a dataframe with x and y values, fits an lm model, and returns a function of x that uses the coefficents from the model to predict y:
fit_pade <- function(point_data){
fit <- lm(y ~ x + I(x^2) + I(y*x) + I(y*x^2), point_data)
lm_coef <- as.list(coef(fit))
names(lm_coef) <- c("a0", paste0(rep(c('a','b'), each=2), 1:2))
with(lm_coef, function(x)(a0 + a1*x + a2*x^2)/(1 - b1*x - b2*x^2))
}
Some examples are available in the link.
Is there a name for this method of fitting? Is there a paper, article, book, that describes this, either as is, or in more general terms? Preferably something citable?
Bonus question - can this be safely expanded to higher order polynomials?
In my humble opinion, this is not the way of building a Padé approximant but rather to fit a function over a given range by the ratio of two polynomials of given degrees. Remember that, as Taylor series, a Padé approximant is built around a given point.
Let us show it with $e^x$, the $[2,2]$ Padé approximant of which (built around $x=0$) being $$\frac{1+\frac{1}{2}x+\frac{1}{12}x^2}{1-\frac{1}{2}x+\frac{1}{12}x^2}$$ and now use $$y = a_0 + a_1 x + a_2 x^2 - b_1 x y - b_2 x^2 y$$ with $20$ equally spaced data points $x_i$ between $-1$ and $+1$ with $y_i=e^{x_i}$.
The linear regression would give the following results $$\begin{array}{clclclclc} \text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\ a_0 & +1.00003 & 0.00003 & \{+0.99997,+1.00010\} \\ a_1 & +0.50444 & 0.00235 & \{+0.49943,+0.50946\} \\ a_2 & +0.08385 & 0.00114 & \{+0.08142,+0.08628\} \\ b_1 & -0.49522 & 0.00231 & \{-0.50015,-0.49029\} \\ b_2 & +0.07957 & 0.00104 & \{+0.07736,+0.08178\} \\ \end{array}$$
Moreover, if you speak about least square fit, you must continue with nonlinear regression since what is "measured" is $y$ and not any of its possible transform. This linear regression gives you good estimates to start with; so, now, let us fit the true model $$y = \frac{a_0 + a_1 x + a_2 x^2}{1 + b_1 x + b_2 x^2}$$ This would give $$\begin{array}{clclclclc} \text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\ a_0 & +1.00006 & 0.00003 & \{+1.00000,+1.00012\} \\ a_1 & +0.50900 & 0.00234 & \{+0.50401,+0.51399\} \\ a_2 & +0.08609 & 0.00118 & \{+0.08356,+0.08862\} \\ b_1 & -0.49077 & 0.00228 & \{-0.49562,-0.48591\} \\ b_2 & +0.07761 & 0.00099 & \{+0.07550,+0.07971\} \\ \end{array}$$
For sure, since this is a least square fit, over the range it will be better than the strict Padé approximant. Considering the norms $$\Phi_1=\int_{-1}^{+1} \left(e^x-\frac{1+\frac{1}{2}x+\frac{1}{12}x^2}{1-\frac{1}{2}x+\frac{1}{12}x^2} \right)^2\,dx=1.26 \times 10^{-6}$$
$$\Phi_2=\int_{-1}^{+1} \left(e^x-\frac{a_0 + a_1 x + a_2 x^2}{1 + b_1 x + b_2 x^2} \right)^2\,dx=6.81 \times 10^{-9}$$