Consider the following population: [1, 2, 2, 3, 3, 3].
Under the usual definition, the median is 2.5, the average of the two middle values, 2 and 3.
I am curious why the convention picks 2.5 specifically. To me, if you were to choose a value of, say, 2.2, it is still the case that half the population is less than this value and half is more than it.
The median of a continuous distribution is easier for me to understand: if you integrate the density up to the median, half of the mass lies below it and the other half lies above. In the discrete setting, however, any value strictly between 2 and 3 satisfies this condition.
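As a quick sanity check of that claim, here is a small Python sketch. The helper name `splits_in_half` is made up for this illustration; it simply counts how many values lie strictly below and strictly above a candidate.

```python
population = [1, 2, 2, 3, 3, 3]

def splits_in_half(values, candidate):
    """True if exactly half the values lie strictly below and half strictly above."""
    below = sum(v < candidate for v in values)
    above = sum(v > candidate for v in values)
    return below == above == len(values) // 2

# Try a few candidates, including the endpoints 2 and 3 themselves.
for candidate in (2.0, 2.2, 2.5, 2.9, 3.0):
    print(candidate, splits_in_half(population, candidate))
```

The candidates 2.2, 2.5 and 2.9 all pass, while the endpoints 2 and 3 do not, because some observations are equal to them.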
I believe there is a deeper reason than "it's just a convention we adopted." If you can tell me the exact reason, that would be nice.
Thanks in advance.
Your question (+1) just deals with the tip of an iceberg:
There are many different rules for determining quantiles of data--including the sample median, which is quantile 0.5. The rules differ depending on whether the data are integer-valued or continuous. They may also depend on the purpose for finding the median: describing the data or predicting population quantiles.
Different statistical software programs use different methods. R statistical software has a 'default' method, but allows the user to choose among several methods for compatibility with other software. This means that if you use your favorite software to check an example in a textbook, you may or may not get the same result as in the book.
Here is output from R illustrating different medians for various methods:
Randomly sample 50 observations from a normal population with mean 100, median 100. Round to two places; sort in order:
Middle two observations and their midpoint:
Showing different results from a few different methods:
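The same idea can be sketched with Python's standard library. This is an analogue, not the R session: the seed and the standard deviation of 15 are assumptions made for the illustration, since only the mean of 100 is specified.

```python
import random
import statistics

random.seed(2024)  # arbitrary seed (an assumption), chosen only for reproducibility

# Assumed standard deviation of 15; only the mean (100) is given in the text.
sample = sorted(round(random.gauss(100, 15), 2) for _ in range(50))

# With n = 50, the two middle order statistics are the 25th and 26th values.
lower_mid, upper_mid = sample[24], sample[25]
midpoint = (lower_mid + upper_mid) / 2

print("middle two :", lower_mid, upper_mid)
print("midpoint   :", midpoint)

# Three median conventions available in the standard library.
print("median     :", statistics.median(sample))       # average of the middle two
print("median_low :", statistics.median_low(sample))   # smaller of the middle two
print("median_high:", statistics.median_high(sample))  # larger of the middle two
```

All three are defensible "medians" of the same 50 values. They coincide only when the two middle observations happen to be equal.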
Depending on the data, there may be many different numerical results. The variety of results is greater for quantiles other than the median. Here are several for the lower quartile (quantile 0.25):
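As a small Python illustration on the six values from the question, the sketch below compares the two interpolation methods offered by `statistics.quantiles` with a step-function rule. The helper `inverse_ecdf_quantile` is hypothetical, written only for this example, and takes the smallest observation whose cumulative proportion reaches p.

```python
import math
import statistics

data = [1, 2, 2, 3, 3, 3]

def inverse_ecdf_quantile(values, p):
    """Smallest observed value whose cumulative proportion is at least p (no interpolation)."""
    ordered = sorted(values)
    k = math.ceil(p * len(ordered))  # 1-based rank of the chosen order statistic
    return ordered[max(k, 1) - 1]

# statistics.quantiles returns the three quartile cut points; index 0 is the lower quartile.
q1_exclusive = statistics.quantiles(data, n=4, method="exclusive")[0]
q1_inclusive = statistics.quantiles(data, n=4, method="inclusive")[0]
q1_step = inverse_ecdf_quantile(data, 0.25)

print(q1_exclusive, q1_inclusive, q1_step)  # 1.75 2.0 2
```

Even on six observations, the lower quartile comes out as 1.75 under one rule and 2 under the others.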