I am curious about the value of Simpson's rule (also called the parabolic rule or the 3-point rule) for approximating integrals. The calculus text I am now teaching from uses this rule any time an approximation is needed for an integral. For example, it may give a messy arclength integral and ask for the Simpson's rule approximation using 4 intervals (and thus 5 sample points):
$$ \int_a^{a+4h} f(x) dx \simeq \frac{h}{3}\left(f(a)+ 4f(a+h) + 2f(a+2h) + 4f(a+3h)+ f(a+4h)\right).$$
I understand the idea of Simpson's Rule. If you just sampled three evenly spaced points on a quadratic function, you could compute the integral on that interval with the weighting pattern $(1,4,1)$; this happens to give the right answer for degree 3 polynomials as well. The $(1,4,2,4,2, \ldots, 4,2,4,1)$ pattern comes from repeating this pattern over every pair of intervals.
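To make the weighting pattern concrete, here is a minimal Python sketch of the composite rule. The function name `simpson` and the test polynomials are just illustrative choices, not anything from the textbook.

```python
def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n equal subintervals (n even)."""
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd index) and 2 (even index).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return h * total / 3


def cubic(x):
    return x**3 - 2 * x + 1   # exact integral over [0, 2] is 2


def quartic(x):
    return x**4               # exact integral over [0, 2] is 32/5


print(simpson(cubic, 0.0, 2.0, 2))    # a single (1, 4, 1) panel already integrates the cubic exactly
print(simpson(quartic, 0.0, 2.0, 2))  # but not the quartic
```

With a single pair of intervals the cubic is integrated exactly (up to rounding), while the quartic is not, which is the "degree 3 for free" behavior mentioned above.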
But I'm not convinced we should always apply this rule any time we cut into $2n$ intervals. Why not just throw out the uneven weighting and use a few more sample points? If the weighting is so helpful, why not use a more complicated weighting (like the various $n$-point rules (Newton-Cotes formulas) described here)?
The Newton-Cotes formulas and their error terms form a beautiful theory, but are probably too much for undergrad calculus! I understand showing Simpson's rule and going no further.
So I have two main questions: Is Simpson's rule so useful that calculus students should always use it for approximations? And are the other Newton-Cotes formulas (or Gaussian quadrature) always the best way to do numerical integration, or only when the values $f(x_i)$ are sufficiently expensive to compute?
The problem with Newton-Cotes methods of high order is that they inherit the same sort of problems you see with high-order interpolating polynomials. Remember that the Newton-Cotes quadrature rules are obtained by integrating the polynomial that interpolates your function at equally spaced points.
In particular, there is the Runge phenomenon: high-order interpolating polynomials on equally spaced points are in general quite oscillatory. This oscillation manifests itself in the weights of the Newton-Cotes rules: the weights of the Newton-Cotes quadrature rules for 2 to 8 points and for 10 points (Simpson's is the three-point rule) are all positive, but in all the other cases there are negative weights present. The reason for insisting on weights of the same sign for a quadrature rule is the phenomenon of subtractive cancellation, where two nearly equal quantities are subtracted, giving a result that has fewer significant digits. By ensuring that all the weights have the same sign, any cancellation that may occur in the computation is due to the function being integrated (e.g. the function has a simple zero within the integration interval) and not due to the quadrature rule.
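The sign pattern of the weights can be checked directly. The following is a rough Python sketch, assuming the closed Newton-Cotes rules on nodes $0, 1, \ldots, n$ (so $h = 1$); the helper name `newton_cotes_weights` is my own, and the weights are obtained by integrating each Lagrange basis polynomial exactly with rational arithmetic.

```python
from fractions import Fraction


def newton_cotes_weights(num_points):
    """Exact closed Newton-Cotes weights on the nodes 0, 1, ..., num_points - 1."""
    n = num_points - 1
    weights = []
    for i in range(num_points):
        # Coefficients (lowest degree first) of the Lagrange basis polynomial L_i.
        coeffs = [Fraction(1)]
        for j in range(num_points):
            if j == i:
                continue
            denom = Fraction(i - j)
            # Multiply the current polynomial by (x - j) / (i - j).
            new = [Fraction(0)] * (len(coeffs) + 1)
            for k, c in enumerate(coeffs):
                new[k] += c * (-j) / denom
                new[k + 1] += c / denom
            coeffs = new
        # Integrate L_i term by term over [0, n].
        weights.append(sum(c * Fraction(n) ** (k + 1) / (k + 1)
                           for k, c in enumerate(coeffs)))
    return weights


for m in range(2, 13):
    w = newton_cotes_weights(m)
    label = "all positive" if all(x > 0 for x in w) else "has negative weights"
    print(m, "points:", label)
```

For three points this reproduces Simpson's weights $\frac13, \frac43, \frac13$, and the loop lets you see for yourself which point counts give weights of mixed sign.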
The approach of breaking up the integration interval into smaller subintervals and applying a low-order quadrature rule like Simpson's on each is effectively the integration of a piecewise polynomial approximation. Since piecewise polynomials are known to have better approximation properties than a single high-order interpolating polynomial over equally spaced points, this good behavior is inherited by the quadrature method.
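As an illustration (a sketch only, with a test function of my own choosing), one can feed the same nine equally spaced samples of Runge's function $1/(1+25x^2)$ on $[-1,1]$ to the single 9-point Newton-Cotes rule and to composite Simpson. The 9-point weights below are the standard tabulated ones; the variable names are hypothetical.

```python
import math


def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)


# Nine equally spaced samples on [-1, 1], spacing h = 1/4.
h = 0.25
xs = [-1.0 + i * h for i in range(9)]
ys = [runge(x) for x in xs]

# Closed 9-point Newton-Cotes weights (note the negative entries),
# to be scaled by 4h/14175.
nc9 = [989, 5888, -928, 10496, -4540, 10496, -928, 5888, 989]
single_rule = 4 * h / 14175 * sum(w * y for w, y in zip(nc9, ys))

# Composite Simpson on the same samples: pattern 1, 4, 2, 4, 2, 4, 2, 4, 1.
simpson_w = [1, 4, 2, 4, 2, 4, 2, 4, 1]
composite = h / 3 * sum(w * y for w, y in zip(simpson_w, ys))

exact = 0.4 * math.atan(5.0)  # closed form: (2/5) * arctan(5)
print("9-point Newton-Cotes error:", abs(single_rule - exact))
print("composite Simpson error:   ", abs(composite - exact))
```

On this particular function the composite rule ends up noticeably closer to the true value, even though both rules use exactly the same function values.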
On the other hand, one can still salvage the interpolating polynomial approach if one no longer insists on having equally-spaced sample points. This gives rise to e.g. Gaussian and Clenshaw-Curtis quadrature rules, where the sample points are taken to be the roots of Legendre polynomials in the former, and roots (or extrema in some implementations) of Chebyshev polynomials in the latter. (Discussing these would make this answer too long, so I shall say no more about them, except that these quadrature rules tend to be more accurate than the corresponding Newton-Cotes rule for the same number of function evaluations.)
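For a small taste of the difference, here is a hedged Python sketch comparing the three-point Simpson rule with the three-point Gauss-Legendre rule on $[-1,1]$ (nodes $0, \pm\sqrt{3/5}$, weights $\frac89, \frac59$). The function names and the choice of $\cos x$ as a test integrand are mine.

```python
import math


def simpson3(f):
    """Three-point Simpson rule on [-1, 1] (h = 1)."""
    return (f(-1.0) + 4.0 * f(0.0) + f(1.0)) / 3.0


def gauss_legendre3(f):
    """Three-point Gauss-Legendre rule on [-1, 1]."""
    r = math.sqrt(3.0 / 5.0)  # nonzero roots of the Legendre polynomial P_3
    return (5.0 * f(-r) + 8.0 * f(0.0) + 5.0 * f(r)) / 9.0


exact = 2.0 * math.sin(1.0)  # integral of cos over [-1, 1]
print("Simpson error:       ", abs(simpson3(math.cos) - exact))
print("Gauss-Legendre error:", abs(gauss_legendre3(math.cos) - exact))
```

Both rules use three function evaluations, but the Gaussian rule is exact for polynomials up to degree 5 rather than degree 3, and that shows up in the error for smooth integrands like this one.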
As with any tool, blind use can lead you to a heap of trouble. In particular, we know that a polynomial can never have horizontal asymptotes or vertical tangents. It stands to reason that a polynomial will be a poor approximation to a function with these features, and thus a quadrature rule based on interpolating polynomials will also behave poorly. The piecewise approach helps a bit, but not much. One should always consider a (clever?) change of variables to eliminate such features before applying a quadrature rule.
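A simple example of such a change of variables (my own choice of integrand, so treat it as a sketch): $\int_0^1 \sqrt{x}\,dx = \frac23$ has a vertical tangent at $x=0$, but substituting $x = t^2$ turns it into $\int_0^1 2t^2\,dt$, which Simpson's rule handles exactly.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n equal subintervals (n even)."""
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return h * total / 3


exact = 2.0 / 3.0

# Direct application: the integrand has a vertical tangent at x = 0.
direct = simpson(math.sqrt, 0.0, 1.0, 4)

# After x = t^2, dx = 2t dt, the integrand becomes the polynomial 2 t^2.
substituted = simpson(lambda t: 2.0 * t * t, 0.0, 1.0, 4)

print("direct error:     ", abs(direct - exact))
print("substituted error:", abs(substituted - exact))
```

With the same four intervals, the direct approximation is visibly off, while the transformed integral is recovered to rounding error, since the new integrand is a quadratic.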