In a grouped discrete frequency distribution, why should I subtract 0.5 from L in the formula for the estimated median?


I have found that different sources calculate the estimated median of a grouped discrete frequency distribution differently. Some subtract $0.5$ from $L$ in the formula

$L+\dfrac{\frac N2 -F}{f}C$

and some do not.

$L$ is the lower boundary of the median class.

$N$ is the sum of the frequencies.

$F$ is the cumulative frequency before the median class, i.e. the total frequency of all classes below the median class.

$f$ is the frequency of the median class.

$C$ is the size (width) of the median class.

Written out in words, the formula is:

lower boundary of the median class + ((sum of frequencies / 2 - cumulative frequency before the median class) / frequency of the median class) × size of the median class


Suppose you saw count data as follows, and tried to estimate the median:

 0 <= x < 20:    10
20 <= x < 30:    15
30 <= x < 80:    10

If the numbers being grouped in this way were real numbers from a continuous distribution, you might intuitively guess that the median was about $25$, while if they were integers (you say "discrete") then you might guess the median was about $24.5$ (using the symmetry to go halfway between $20$ and $29$).

Your formula $L+\dfrac{\frac N2 -F}{f}C$ would suggest $20+\dfrac{\frac{35}{2}-10}{15}\times (30-20) =25$
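A minimal Python sketch of that formula, assuming a hypothetical function `grouped_median` that takes a list of `(lower_boundary, class_size, frequency)` tuples sorted by boundary (this input layout is an illustrative choice, not something from either source). It picks the median class as the first class whose cumulative frequency reaches $N/2$ and applies $L+\dfrac{\frac N2 -F}{f}C$ with no $0.5$ adjustment:

```python
def grouped_median(classes):
    """Estimate the median of grouped data as L + ((N/2 - F) / f) * C.

    `classes` is a list of (lower_boundary, class_size, frequency) tuples,
    sorted by lower boundary. This layout is a hypothetical choice.
    """
    n = sum(freq for _, _, freq in classes)  # N: sum of frequencies
    if n == 0:
        raise ValueError("need at least one observation")
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, size, freq in classes:
        # The median class is the first class whose cumulative frequency reaches N/2.
        if cumulative + freq >= half:
            return lower + (half - cumulative) / freq * size
        cumulative += freq


# Example data from above: boundaries 0, 20, 30 with sizes 20, 10, 50.
print(grouped_median([(0, 20, 10), (20, 10, 15), (30, 50, 10)]))  # 25.0
```

Any $0.5$ adjustment is left to the caller: passing boundaries already shifted to $-0.5$, $19.5$, $29.5$ gives $24.5$ instead.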

The subtraction of $0.5$ is an attempt to handle the ranges when the numbers are integers and the bottom end of the median's range is inclusive while the top end is exclusive; if it had been the other way round then you might instead add $0.5$.
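Applying the subtraction to the example above, with $N=35$, $F=10$, $f=15$ and $C=10$ unchanged, moves the estimate to the integer-based guess of $24.5$:

$$
(L-0.5)+\dfrac{\frac N2 -F}{f}C = (20-0.5)+\dfrac{\frac{35}{2}-10}{15}\times 10 = 19.5+5 = 24.5
$$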

If the integer data had been presented differently, for example in either of the two following ways, there would be less need for this kind of adjustment as your formula would automatically give a lower figure for the estimate of the median, in the former case because $C$ is smaller and in the latter case because $L$ is lower. The suggested subtraction of $0.5$ is equivalent to trying to translate the original data into the last case.

 0 <= x <= 19:    10
20 <= x <= 29:    15
30 <= x <= 79:    10

or

-0.5 < x < 19.5:    10
19.5 < x < 29.5:    15
29.5 < x < 79.5:    10
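The three presentations can be compared by plugging the same counts ($N=35$, $F=10$, $f=15$) into the formula with different $L$ and $C$. This is a hedged sketch: `median_estimate` is a hypothetical helper, and reading the size of the closed class $20 \le x \le 29$ as $C = 29-20 = 9$ is an assumption about what "$C$ is smaller" means.

```python
def median_estimate(L, N, F, f, C):
    """L + ((N/2 - F) / f) * C, with the symbols defined in the question."""
    return L + (N / 2 - F) / f * C


N, F, f = 35, 10, 15  # example counts: 10 + 15 + 10 observations

# Half-open classes, 20 <= x < 30: L = 20, C = 10
print(median_estimate(20, N, F, f, 10))    # 25.0
# Closed integer classes, 20 <= x <= 29, assuming C is read as 29 - 20 = 9
print(median_estimate(20, N, F, f, 9))     # 24.5
# Half-integer boundaries, 19.5 < x < 29.5: L = 19.5, C = 10
print(median_estimate(19.5, N, F, f, 10))  # 24.5
```

Both integer-aware presentations land on $24.5$, which is the same figure the $0.5$ subtraction produces from the original half-open classes.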