So I was asked to use Lagrange interpolation to find the an approximation for the population at a given year.
def InterpolateLagrange(x, y, xval):
#usage: InterpolateLagrange(x, y)
#x: array of x-values
#y: array of y-values
#xval: x-value, for which the y-value should be obtained
#output:
#yval: y-value
#establish the inital y-value (yval)
yval = 0.0
#assign the length of x as n
n= len(x)
#for loop which will go through each integer of n
for i in range(n):
#Establish the initial term value
term = y[i]
for j in range(n):
if j != i:
#Implement the Lagrange method any time i does not equal j
term = term * (xval - x[j]) / (x[i] - x[j])
#The the be the collectve sum of all the term values
yval += term
#Returns the yval which correlates to xval
return yval
I used the method above for my data, however the yval value I received was incorrect.
year = np.array([1841., 1851., 1861., 1871., 1881., 1891., 1901., 1911., 1926.,
1936., 1946., 1951., 1956., 1961., 1966., 1971., 1979., 1981.,
1986., 1991., 1996., 2002., 2006., 2011.])
pop = np.array([6528799., 5111557., 4402111., 4053187., 3870020., 3468694.,
3221823., 3139688., 2971992., 2968420., 2955107., 2960593.,
2898264., 2818341., 2884002., 2978248., 3368217., 3443405.,
3540643., 3525719., 3626087., 3917203., 4239848., 4588252.])
x = year
y = pop
xval = 1868
InterpolateLagrange(x, y, xval)
#Output -5805741994.539
But that's okay, this was expected and I found a better approximation using cubic spline interpolation.
But my question is why didn't Lagrange interpolation work with the given data?
Is it because the interpolation of the points (in this case) are growing too quickly with n? Any information on the topic would be greatly appreciated.
Let's say you have the data points $(x_0,y_0),...,(x_n,y_n)$, where $x_0<...<x_n$.
These data points are samples (without error) of a function $f$.
We'll call $P\left(f\middle| x_0,\ldots,x_k\right)\left(x\right)$ the interpolation polynomial we get when interpolating through the points $(x_0,y_0),...,(x_k,y_k)$.
Given that $f$ is at least $n+1$ times continuously differentiable, the following estimation holds:
$$|f\left(x\right)-P\left(f\middle| x_0,\ldots,x_n\right)\left(x\right) |\le \left|\max_{\xi\in [l,r]} \frac{f^{\left(n+1\right)}\left(\xi\right)}{\left(n+1\right)!}\prod_{i=0}^{n}{x-x_i}\right|$$
Here, the LHS is your estimation error.
Let's make the additional assumption that all derivatives of $f$ are bounded by a constant $\pm c$ .
Then, adding a new point to the polynomial $P(f|x_0,...,x_k)(x)$ adds a factor $$\ge \left| \frac{x-x_i}{n+1}\right|$$
You have $n=24$, and the data points are at least $5$ years apart each.
So, given our assumptions so far, every data point more than $24$ years away from the point you want to estimate will at most worsen the result.