Consider a piecewise linear function joining the following discrete data points of the format (x,y): (1,3),(2,5),(3,1),(4,4),(5,9)
Note that the points are evenly spaced along the x-axis.
If we want the mean value of these data points we can use the formula: $$\overline{X} = \frac1n\sum_{i=1}^n f(x_i) = \frac{3+5+1+4+9}5 = 4.4$$
The continuous representation of the mean is: $$\overline{X} = \frac1{b-a}\int_{a}^{b}f(x)dx$$
Using the continuous representation, if we then take the height at the midpoint of each line segment (the average of its two end values) and multiply by the width (i.e. 1) to find the area under the curve, we get: $$\frac1{5-1}\left(\frac{3+5}2\times1 + \frac{5+1}2\times1 + \frac{1+4}2\times1 + \frac{4+9}2\times1\right) = \frac{16}4=4$$
Can anyone explain why these two methods of calculating the mean produce different results (4.4 vs 4)?
When you calculate the mean as an integral, you actually compute:
$$\overline{X}=\frac1{5-1}\left(\frac{3+5}2+\frac{5+1}2+\frac{1+4}2+\frac{4+9}2\right)=\frac{3+5+5+1+1+4+4+9}8$$
That is, you take the "middle" points twice, and the "edge" points only once.
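As a rough check of the weighting described above, here is a minimal Python sketch. The names `ys`, `weights`, `discrete_mean` and `integral_mean` are made up for illustration, and the code assumes unit spacing between the points, as in the question.

```python
from fractions import Fraction

ys = [3, 5, 1, 4, 9]  # values at x = 1..5, unit spacing assumed

# Discrete mean: every point gets the same weight 1/n.
discrete_mean = Fraction(sum(ys), len(ys))

# Integral (trapezoid) mean: edge points get weight 1/2, middle points
# get weight 1, and the total is divided by b - a = n - 1.
weights = [Fraction(1, 2)] + [Fraction(1)] * (len(ys) - 2) + [Fraction(1, 2)]
integral_mean = sum(w * y for w, y in zip(weights, ys)) / (len(ys) - 1)

print(discrete_mean)  # 22/5, i.e. 4.4
print(integral_mean)  # 4
```

The only difference between the two computations is the weight vector, which is where the gap between 4.4 and 4 comes from.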
The two means differ because they measure different things:
In the first case, $4.4$ is the value for which $\sum_{n=1}^{5}(f(n)-\overline{X})=0$, i.e. $4.4$ sits nicely in the middle of the 5 points.
In the second case, $4$ is the value for which the area under the horizontal line $y=4$ equals the area under the graph of $f$ on $[1,5]$, i.e. $4$ sits nicely in the middle of all the points on the graph of $f$. A "middle" point of the data has points of the graph with approximately the same value both to its "left" and to its "right", while an "edge" point has them on one side only. So the middle points contribute twice as much as the edge points.
Another way to think of it is as follows:
Take your graph. Now for each point $n<5$, draw a rectangle of width $\frac12$ and height $f(n)$ to the right of it, and for each point $n>1$, draw a rectangle of width $\frac12$ and height $f(n)$ to the left of it.
You now see that the total area of all rectangles is equal to the area under the curve: each middle point covers a total width of $1$, while each edge point covers only a width of $\frac12$.
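The rectangle picture can also be checked numerically. The following is only a sketch under the same assumptions as the question (points at $x=1,\dots,5$ with unit spacing); the variable names are hypothetical.

```python
from fractions import Fraction

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 1, 4, 9]
half = Fraction(1, 2)

# Sum the half-width rectangles attached to each data point.
rect_area = Fraction(0)
for x, y in zip(xs, ys):
    if x < xs[-1]:   # rectangle of width 1/2 to the right of the point
        rect_area += half * y
    if x > xs[0]:    # rectangle of width 1/2 to the left of the point
        rect_area += half * y

# Area under the piecewise linear graph, one trapezoid per segment.
trap_area = sum(
    Fraction(ys[i] + ys[i + 1], 2) * (xs[i + 1] - xs[i])
    for i in range(len(xs) - 1)
)

print(rect_area, trap_area)  # 16 16
```

Dividing either area by $5-1=4$ gives the integral mean of $4$.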