Error on Simpson Sum and Trapezoidal sum

115 Views Asked by At

I was coding an algorithm to approximate the integral $\int_{0}^{1} f(x) dx\int_{0}^{1} cos(x)+1/10cos(10x)+1/50cos(50x) dx$ as i encountered some interessting graphs. When approximating for $N=2^k$ for $k\in \mathbb{N}$ intervals with an size of $h=\frac{b-a}{N}=\frac{1}{2^k}$ i realized that the error between the excact integration and the approximations (here: $|T(h)-I(f)| $ and $|S(h)-I(f)|$ with T being the Trapezodial sum and S being the Simpson sum) went down as assumed until a certain point number of intervals with the error in Simpson decreasing fast than the Trapezoid one (also excpected).

But at a certain interval size $h=\frac{1}{2^{15}}$ (therefor $2^{15}$ intervals) i realized that the error rate is certainly increasing. This goes so far, that even the boundary set for the maximum error $E_{Simpson}=-h^4(b-a)\frac{f^4(\xi)}{2880}$ and $E_{Trapezoid}=-h^2(b-a)\frac{f^2(\xi)}{12}$ for $a=0$ and $b=1$ got broken. As i am assuming my algorithm is completly correct i was interessted in how this boundary could be broken. An aspect could be computation errors due to very low numbers but i was thinking that there might be a mathematical reason behind it.

I am also sharing a plot with you (axis are written in german but Y-Axis meaning the absolute error and X-Axis meaning k with interval amount $2^k$, both axis scaled with Logarithm to base 2. Graph

I am interessted wether some of you have an explanation to this phenomen or if it is really just an data error due to the limits of computers.

Greetings, Florian

1

There are 1 best solutions below

1
On BEST ANSWER

It looks like rounding error.

I ran trapz on octave and got:

exactI       =    8.359258237575212e-01
qtrapz15     =    8.359258237547984e-01
errortrapz15 = qtrapz15 - exactI
errortrapz15 =   -2.722821967893196e-12
qtrapz30     =    8.359258237608431e-01
errortrapz30 = qtrapz30 - exactI
errortrapz30 =    3.321898311980931e-12

The absolute error for $t30$ is greater than $t15$. Similar to your results.

I then ran a C++ program using $128$ binary bit floating point numbers and got:

exact    = 0.835925823757521237004089087553020177787458754748900059337972
trap15   = 0.835925823754799058014553389923348889088758837882225475274439
trap30   = 0.835925823757521237001553860011910824446002551425735037263132
errort15 = -2.72217898953569762967128869869991686667458406353306678167762e-12
errort30 = -2.53522754110935334145620332316502207483983770730852571768988e-21

The trapezoidal errors are now less than the estimated errors.

The C++ code uses gmp GNU Multiple Precision Arithmetic Library and mpreal a C++ interface to it.

traptest128bits.cpp:

#include <iostream>
#include <vector>
#include <stdlib.h>
#include "mpreal.h"
// ------------------------------------------------------------------
mpfr::mpreal f(mpfr::mpreal x)
{
    return cos(x) + cos(10.0*x)/10.0 + cos(50.0*x)/50.0;
}
// ------------------------------------------------------------------
// Integral
mpfr::mpreal If(mpfr::mpreal x)
{
    return sin(x) + sin(10.0*x)/100.0 + sin(50.0*x)/2500.0;
}
// ------------------------------------------------------------------
int main(int argc, char* argv[])
{
    using mpfr::mpreal;    
    using std::cout;
    using std::endl;
    using std::cerr;
   
    // Required precision of computations in bits
    const int numbits = 128;

    // Setup default precision for all subsequent computations
    // MPFR accepts precision in bits 
    mpreal::set_default_prec(numbits);

    cout.precision(100);
    
    mpreal exact = If(1.0) - If(0.0);
    cout << "exact = " << exact << endl;
    
    mpreal dx15 = pow(2,-15);
    cout << "dx15 = " << dx15 << endl;

    mpreal sum15(0.0);
    unsigned long kmax15 = pow(2,15);
    mpreal x(0.0);
    for(unsigned long k = 0; k <= kmax15; k++)
    {
        if ((k % 1000) == 0) 
        {
            printf("\r%lu",k); 
            fflush(stdout);
        }
            
        sum15 += f(x);
        x += dx15;
    }
    cout << endl;
    
    sum15 = sum15 - (f(0.0) + f(1.0))/2.0;
    
    mpreal trap15 = sum15*dx15;
    mpreal errort15 = trap15 - exact;
    
    cout << "trap15 = " << trap15 << endl;
    cout << "errort15 = " << errort15 << endl;
    
    
    // -------------------------------------------------------------------------
    
    mpreal dx30 = pow(2,-30);
    cout << "dx30 = " << dx30 << endl;

    mpreal sum30(0.0);
    unsigned long kmax30 = pow(2,30);
    x = 0.0;
    for(unsigned long k = 0; k <= kmax30; k++)
    {
        if ((k % 1000) == 0) 
        {
            printf("\r%lu",k); 
            fflush(stdout);
        }
            
        sum30 += f(x);
        x += dx30;
    }
    cout << endl;
    
    sum30 = sum30 - (f(0.0) + f(1.0))/2.0;
    
    mpreal trap30 = sum30*dx30;
    mpreal errort30 = trap30 - exact;
    
    cout << "trap30 = " << trap30 << endl;
    cout << "errort30 = " << errort30 << endl;
    
    // -------------------------------------------------------------------------
    
    cout << "---------------------------------------------" << endl;
    cout << "exact    = " << exact << endl;
    cout << "trap15   = " << trap15 << endl;
    cout << "trap30   = " << trap30 << endl;
    cout << "errort15 = " << errort15 << endl;
    cout << "errort30 = " << errort30 << endl;
    

    return 0;
}

The mpreal.h file should be in the same directory.

The build instruction on cygwin Linux is:

g++ -Wall -D_GNU_SOURCE -D__LINUX__ -std=c++11 -O3 traptest128bits.cpp -lmpfr -lgmp -o traptest128bits