The mathematical expression describing the rank of the median of a distribution of N observations is:
- for a list of raw data values, (N+1)/2;
- for an ungrouped frequency distribution, typically written as a table, (N+1)/2 according to some of the sources I have come across and N/2 according to others;
- for a grouped frequency distribution, N/2, according to all the sources I have looked at ... and in the case of a cumulative frequency graph, the median value on the horizontal axis is accordingly read off at a cumulative frequency of N/2, exactly halfway up the vertical axis.
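To make the (N+1)/2 rule for raw and ungrouped data concrete, here is a minimal Python sketch. The function names are hypothetical, and it assumes the usual convention that a fractional rank such as 2.5 means the average of the 2nd and 3rd sorted values.

```python
def median_by_rank(values):
    """Median of raw data, located at 1-based rank (N+1)/2 of the sorted values."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    rank = (n + 1) / 2   # e.g. 2.0 for N = 3, 2.5 for N = 4
    k = int(rank)        # whole part of the rank (rank is positive, so this floors)
    if rank == k:
        # Odd N: the rank is a whole number, so the median is that sorted value.
        return s[k - 1]
    # Even N: rank k + 0.5 is taken to mean halfway between the k-th and (k+1)-th values.
    return (s[k - 1] + s[k]) / 2


def median_from_frequency_table(table):
    """Median of an ungrouped frequency table given as {value: frequency}.

    The table is expanded back into raw data and the (N+1)/2 rank is applied.
    """
    expanded = [value for value in sorted(table) for _ in range(table[value])]
    return median_by_rank(expanded)
```

For example, `median_by_rank([7, 1, 3])` returns 3 (rank 2) and `median_by_rank([4, 1, 3, 2])` returns 2.5 (rank 2.5). Expanding the frequency table into a raw list keeps the sketch simple, at the cost of memory when the counts are very large.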
These sources indicate that N/2 is the number of observations BELOW the median (which makes sense for an even number of observations, and kind of makes sense to me for an odd number).
So the formula that I have always seen quoted for calculating the value of the median is $\displaystyle L_m + \left [ \frac { \frac{N}{2} - F_{m-1} }{f_m} \right ] \cdot c$
(where $F_{m-1}$ is the cumulative frequency of all classes below the median class, $f_{m}$ is the frequency of the median class, $L_{m}$ is the lower limit of the median class, and $c$ is the width of the median class).
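The quoted formula can be sketched in Python as follows. The input layout, a list of hypothetical (lower_limit, upper_limit, frequency) tuples with contiguous class boundaries, is an assumption, as is the rule of taking the median class to be the first non-empty class whose cumulative frequency reaches N/2.

```python
from itertools import accumulate


def grouped_median(classes):
    """Median of a grouped frequency distribution:
    L_m + ((N/2 - F_{m-1}) / f_m) * c.

    classes: list of (lower_limit, upper_limit, frequency) tuples, sorted by
    lower_limit with contiguous boundaries (an assumed input layout).
    """
    freqs = [f for _, _, f in classes]
    n = sum(freqs)
    if n == 0:
        raise ValueError("grouped_median of an empty distribution")
    half = n / 2
    cumulative = list(accumulate(freqs))
    for i, (lower, upper, f_m) in enumerate(classes):
        # The median class is the first non-empty class whose cumulative
        # frequency reaches N/2 (the last non-empty class always does).
        if f_m > 0 and cumulative[i] >= half:
            F_prev = cumulative[i - 1] if i > 0 else 0  # F_{m-1}
            c = upper - lower                          # width of the median class
            return lower + (half - F_prev) / f_m * c   # L_m + [...] * c
```

With classes [0, 10), [10, 20) and [20, 30) holding 5, 8 and 7 observations (N = 20), the median class is [10, 20), and the sketch returns 10 + (10 - 5)/8 · 10 = 16.25.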
But surely this only gives the value of the data point immediately below the median (or, for odd N, the point halfway down from the median to the nearest data point)? A more accurate formula, for both ungrouped and grouped distributions, would replace N/2 with (N+1)/2, giving $\displaystyle L_m + \left [ \frac { \frac{N+1}{2} - F_{m-1} }{f_m} \right ] \cdot c$. The difference is negligible for large data sets, but significant for very small ones.
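The size of the gap between the two versions can be checked with a small Python sketch (function names hypothetical; same assumed (lower_limit, upper_limit, frequency) layout for the classes). When N/2 and (N+1)/2 fall in the same class, subtracting the two formulas leaves $\frac{1/2}{f_m} \cdot c = \frac{c}{2 f_m}$, which shrinks as the median class fills up. The sketch only measures that gap; it does not decide between the conventions.

```python
from itertools import accumulate


def grouped_value_at(classes, position):
    """Return L_m + ((position - F_{m-1}) / f_m) * c for a grouped distribution.

    classes: list of (lower_limit, upper_limit, frequency) tuples, sorted by
    lower_limit with contiguous boundaries (an assumed input layout).
    position: cumulative frequency to locate, e.g. N/2 or (N+1)/2.
    The median class is taken to be the first non-empty class whose
    cumulative frequency reaches `position`.
    """
    freqs = [f for _, _, f in classes]
    cumulative = list(accumulate(freqs))
    for i, (lower, upper, f_m) in enumerate(classes):
        if f_m > 0 and cumulative[i] >= position:
            F_prev = cumulative[i - 1] if i > 0 else 0  # F_{m-1}
            return lower + (position - F_prev) / f_m * (upper - lower)
    raise ValueError("position lies beyond the last class")


def compare_conventions(classes):
    """Return (N, value using N/2, value using (N+1)/2, difference)."""
    n = sum(f for _, _, f in classes)
    usual = grouped_value_at(classes, n / 2)
    proposed = grouped_value_at(classes, (n + 1) / 2)
    return n, usual, proposed, proposed - usual


if __name__ == "__main__":
    small = [(0, 10, 1), (10, 20, 2), (20, 30, 1)]
    large = [(lo, hi, 100 * f) for lo, hi, f in small]  # same shape, 100x the data
    print(compare_conventions(small))
    print(compare_conventions(large))
```

For classes [0, 10), [10, 20) and [20, 30) with frequencies 1, 2 and 1 (N = 4), the two positions give 15 and 17.5, a gap of 2.5 = 10/(2 · 2); multiplying every frequency by 100 (N = 400) leaves the N/2 value at 15 but shrinks the gap to about 0.025.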