Why doesn't Lagrange Interpolation work well in this case?


So I was asked to use Lagrange interpolation to find an approximation for the population in a given year.

def InterpolateLagrange(x, y, xval):

    #usage: InterpolateLagrange(x, y, xval)
        #x: array of x-values
        #y: array of y-values
        #xval: x-value for which the y-value should be obtained

    #output:
        #yval: interpolated y-value at xval

    #establish the initial y-value (yval)
    yval = 0.0

    #assign the length of x to n
    n = len(x)

    #loop over each node i
    for i in range(n):
        #establish the initial term value
        term = y[i]
        for j in range(n):
            if j != i:
                #multiply in the Lagrange basis factor whenever j does not equal i
                term = term * (xval - x[j]) / (x[i] - x[j])

        #accumulate the sum of all the term values
        yval += term

    #return the yval that corresponds to xval
    return yval

I used the method above on my data; however, the yval I received was clearly wrong.

year = np.array([1841., 1851., 1861., 1871., 1881., 1891., 1901., 1911., 1926.,
      1936., 1946., 1951., 1956., 1961., 1966., 1971., 1979., 1981.,
      1986., 1991., 1996., 2002., 2006., 2011.])
pop = np.array([6528799., 5111557., 4402111., 4053187., 3870020., 3468694.,
      3221823., 3139688., 2971992., 2968420., 2955107., 2960593.,
      2898264., 2818341., 2884002., 2978248., 3368217., 3443405.,
      3540643., 3525719., 3626087., 3917203., 4239848., 4588252.])

x = year
y = pop
xval = 1868

InterpolateLagrange(x, y, xval)


#Output: -5805741994.539

But that's okay; this was expected, and I found a better approximation using cubic spline interpolation.

But my question is: why didn't Lagrange interpolation work with the given data?

Is it because the error of the interpolating polynomial (in this case) grows too quickly with n? Any information on the topic would be greatly appreciated.
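For comparison, here is a minimal sketch (plain Python lists instead of NumPy arrays; the choice of the four nodes 1851-1881 nearest to 1868 is mine, for illustration) that evaluates the same scheme once on the full data set and once on just the four nearest nodes:

```python
def lagrange(x, y, xval):
    # Same Lagrange evaluation scheme as in the question
    yval = 0.0
    n = len(x)
    for i in range(n):
        term = y[i]
        for j in range(n):
            if j != i:
                term = term * (xval - x[j]) / (x[i] - x[j])
        yval += term
    return yval

year = [1841, 1851, 1861, 1871, 1881, 1891, 1901, 1911, 1926,
        1936, 1946, 1951, 1956, 1961, 1966, 1971, 1979, 1981,
        1986, 1991, 1996, 2002, 2006, 2011]
pop = [6528799, 5111557, 4402111, 4053187, 3870020, 3468694,
       3221823, 3139688, 2971992, 2968420, 2955107, 2960593,
       2898264, 2818341, 2884002, 2978248, 3368217, 3443405,
       3540643, 3525719, 3626087, 3917203, 4239848, 4588252]

full = lagrange(year, pop, 1868)             # degree 23: wildly wrong
local = lagrange(year[1:5], pop[1:5], 1868)  # cubic through 1851..1881 only
print(full, local)
```

The degree-23 polynomial swings to a huge negative value, while the low-degree cubic through nearby nodes gives a plausible population of about 4.13 million.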

Accepted answer:

Let's say you have the data points $(x_0,y_0),...,(x_n,y_n)$, where $x_0<...<x_n$.

These data points are samples (without error) of a function $f$.

We'll call $P\left(f\middle| x_0,\ldots,x_k\right)\left(x\right)$ the interpolation polynomial we get when interpolating through the points $(x_0,y_0),...,(x_k,y_k)$.

Assume that $f$ is at least $n+1$ times continuously differentiable on an interval $[l,r]$ that contains all the nodes $x_i$ as well as the evaluation point $x$. Then the following error estimate holds:

$$\left|f(x)-P\left(f\middle| x_0,\ldots,x_n\right)(x)\right|\le \max_{\xi\in [l,r]} \left|\frac{f^{(n+1)}(\xi)}{(n+1)!}\right|\,\prod_{i=0}^{n}\left|x-x_i\right|$$

Here, the LHS is your estimation error.

Let's make the additional assumption that all derivatives of $f$ are bounded in absolute value by a constant $c$.
Then extending the interpolation polynomial $P(f|x_0,\ldots,x_k)(x)$ by a new node $x_{k+1}$ multiplies the error bound by a factor of $$\left|\frac{x-x_{k+1}}{k+2}\right|.$$

You have $24$ data points, which are at least $5$ years apart from each other.
So, given our assumptions so far, every data point more than about $24$ years away from the point you want to estimate can only worsen the bound.
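Plugging the question's nodes into this bound shows how bad things get: the data-dependent factor $\frac{1}{(n+1)!}\prod_{i=0}^{n}|x-x_i|$ at $x=1868$ is already astronomical, so only an absurdly small derivative bound could rescue the estimate. A quick sketch with exact integer arithmetic (node values taken from the question):

```python
import math

year = [1841, 1851, 1861, 1871, 1881, 1891, 1901, 1911, 1926,
        1936, 1946, 1951, 1956, 1961, 1966, 1971, 1979, 1981,
        1986, 1991, 1996, 2002, 2006, 2011]
x = 1868

# |prod_i (x - x_i)| / (n+1)!  with 24 nodes, i.e. n = 23
node_product = math.prod(abs(x - xi) for xi in year)
factor = node_product / math.factorial(len(year))
print(f"{factor:.3e}")  # on the order of 1e18
```

Any constant derivative bound $c$ therefore gets multiplied by a factor around $10^{18}$, which is why the degree-23 polynomial is useless here.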

Another answer:

Your program using Lagrange interpolation is correct, but the result of this interpolation has problems. I ran the equivalent program with Mathematica using rational arithmetic but using only some of the data points. That is, using only $15$ to $21$ of the data points, the results for year $1868$ are $$ 4130172, 4214080, 4342304, 4007068, -164106, -23688203, -126349191. $$ By $n=19$ the result is negative and gets worse from then on. If you look at the trend, the results look reasonable up to $n=13$. Probably the best you can hope for is with $n=5$ with result $4134547$. In this case, as in others, adding more data points away from the target year $1868$ does not give better results.
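The same conclusion can be reproduced in Python with exact rational arithmetic via `fractions.Fraction` (a sketch, reusing the question's data): the huge negative value at $1868$ is a property of the degree-$23$ interpolating polynomial itself, not of floating-point roundoff.

```python
from fractions import Fraction

year = [1841, 1851, 1861, 1871, 1881, 1891, 1901, 1911, 1926,
        1936, 1946, 1951, 1956, 1961, 1966, 1971, 1979, 1981,
        1986, 1991, 1996, 2002, 2006, 2011]
pop = [6528799, 5111557, 4402111, 4053187, 3870020, 3468694,
       3221823, 3139688, 2971992, 2968420, 2955107, 2960593,
       2898264, 2818341, 2884002, 2978248, 3368217, 3443405,
       3540643, 3525719, 3626087, 3917203, 4239848, 4588252]

def lagrange_exact(x, y, xval):
    # Exact Lagrange evaluation over the rationals
    total = Fraction(0)
    n = len(x)
    for i in range(n):
        term = Fraction(y[i])
        for j in range(n):
            if j != i:
                term *= Fraction(xval - x[j], x[i] - x[j])
        total += term
    return total

exact = lagrange_exact(year, pop, 1868)
print(float(exact))  # about -5.8e9, matching the float computation
```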

Using other interpolation methods can give much better results. For example, Mathematica, using only the first $5$ data points, returns $4131597$, and adding more data points does not change the result because it effectively ignores the extra data points.
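In Python, an analogous piecewise approach is a cubic spline, e.g. SciPy's `CubicSpline` (a sketch; assumes SciPy is available), which follows the neighboring data points instead of oscillating like the global polynomial:

```python
from scipy.interpolate import CubicSpline

year = [1841, 1851, 1861, 1871, 1881, 1891, 1901, 1911, 1926,
        1936, 1946, 1951, 1956, 1961, 1966, 1971, 1979, 1981,
        1986, 1991, 1996, 2002, 2006, 2011]
pop = [6528799, 5111557, 4402111, 4053187, 3870020, 3468694,
       3221823, 3139688, 2971992, 2968420, 2955107, 2960593,
       2898264, 2818341, 2884002, 2978248, 3368217, 3443405,
       3540643, 3525719, 3626087, 3917203, 4239848, 4588252]

# Piecewise cubic interpolant, not one global degree-23 polynomial
cs = CubicSpline(year, pop)
estimate = float(cs(1868))
print(estimate)
```

The estimate lands between the 1861 and 1871 populations, roughly 4.1-4.2 million, which is the kind of plausible value the question's cubic-spline approach produced.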