I understand why Simpson's Rule is better than the trapezoidal rule for 3 datapoints (because under the assumption that the function is smooth, a parabolic approximation is going to be better than a piecewise linear approximation).
But suppose I had a given number N of measurements of f(x) at equally spaced intervals h where N is fairly large compared to the 3 datapoints for Simpson's Rule. Say N=61, for example.
Why is
S = (h/3) [f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + 2f(x_4) + ... + 2f(x_56) + 4f(x_57) + 2f(x_58) + 4f(x_59) + f(x_60)]
a better approximation than
S = h [f(x_0)/2 + f(x_1) + f(x_2) + ... + f(x_58) + f(x_59) + f(x_60)/2]
? I don't understand why you would weight one function datapoint with greater importance than any other.
Is there either an intuitive or an analytical approach to understand this?
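One concrete way to see the difference is to evaluate both sums on a smooth function and compare against the exact integral. Below is a small sketch using f(x) = e^x on [0, 1] (an arbitrary choice; the exact value is e - 1) with N = 61 samples, as in the question. It computes the composite Simpson sum, the trapezoidal sum, and the equal-weight sum:

```python
import numpy as np

# N = 61 equally spaced samples of f(x) = exp(x) on [0, 1].
# Exact integral: e - 1. The integrand and interval are just examples.
N = 61
x = np.linspace(0.0, 1.0, N)
h = x[1] - x[0]
f = np.exp(x)
exact = np.e - 1.0

# Composite Simpson: weights 1, 4, 2, 4, ..., 2, 4, 1, scaled by h/3.
w = np.ones(N)
w[1:-1:2] = 4.0          # odd interior indices get weight 4
w[2:-1:2] = 2.0          # even interior indices get weight 2
simpson = (h / 3.0) * np.dot(w, f)

# Trapezoidal rule: interior points weight 1, endpoints weight 1/2.
trapezoid = h * (f.sum() - 0.5 * (f[0] + f[-1]))

# Equal weights everywhere, as written in the question.
uniform = h * f.sum()

print(abs(simpson - exact))    # smallest error
print(abs(trapezoid - exact))  # larger error
print(abs(uniform - exact))    # largest error
```

On smooth integrands like this one, Simpson's error shrinks like h^4 while the trapezoidal error shrinks like h^2, and the equal-weight sum is off by roughly h*(f(x_0) + f(x_60))/2 because it overcounts the endpoints.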
Your concern about the oscillating weights of Simpson's Rule (and other higher-order Newton-Cotes formulae) has merit. I am not aware of any deeper significance attached to the individual weights. In fact, the weights can have a detrimental influence by giving certain sample points more significance than others, which is a real problem if some of those sample points are noisy.
I like Simpson's rule and have used it often, but if at all possible I stick with a simple trapezoidal scheme. In fact, an even better scheme is Romberg integration, which computes a sequence of trapezoidal approximations with successively halved step sizes and applies a convergence accelerator (Richardson extrapolation) to whittle down the error.
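To make the Romberg idea concrete, here is a minimal sketch (function name and level count are my own choices, not from any particular library): each row starts from a trapezoidal estimate at half the previous step size, and each Richardson extrapolation step along the row cancels the next even power of h in the error.

```python
import numpy as np

def romberg(f, a, b, levels=5):
    """Romberg integration: trapezoid estimates at halved step sizes,
    refined column-by-column with Richardson extrapolation."""
    R = np.zeros((levels, levels))
    R[0, 0] = 0.5 * (b - a) * (f(a) + f(b))       # coarsest trapezoid
    for i in range(1, levels):
        n = 2 ** i                                 # subintervals at level i
        h = (b - a) / n
        # Refine the trapezoid sum by adding only the new midpoints.
        mids = a + h * np.arange(1, n, 2)
        R[i, 0] = 0.5 * R[i - 1, 0] + h * f(mids).sum()
        # Each extrapolation cancels the next h^(2j) error term.
        for j in range(1, i + 1):
            R[i, j] = R[i, j - 1] + (R[i, j - 1] - R[i - 1, j - 1]) / (4 ** j - 1)
    return R[levels - 1, levels - 1]

# Integral of sin over [0, pi]; the exact value is 2.
print(romberg(np.sin, 0.0, np.pi))
```

Note that the first extrapolated column (j = 1) reproduces exactly the composite Simpson estimate, so Romberg can be viewed as taking the trapezoid-to-Simpson improvement and repeating it indefinitely.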