Median based on number of entries instead of values

I’m writing a computer program that provides some useful statistical information about files. Calculating the mean is trivial, and the mode at least has a simple definition, but the median is proving tricky. I remember its general definition from school, but there is some ambiguity.

Median: the “middle number”.

But what does that mean? Is it the middle entry or the middle value?

The Wikipedia page for mean uses the sample data set 1, 2, 2, 6, 7, 8 and gives the median as 4 because the mean of the two middle entries (2 and 6) is 4.

But what about 6? The number 6 has as many distinct values above it (7, 8) as below it (1, 2).

This is a pretty useful statistic as well. Is there a name for this?

There is 1 best solution below

The notion you are looking for is sample median (as opposed to population median).

Sort the sample values, respecting multiplicity. So if we got $7.8$ three times, we write it down $3$ times. One can use non-decreasing or non-increasing order; it doesn't matter.

If the number of sample values is odd, say $2k+1$, then the sample median is the "middle" value, that is, the $(k+1)$-th value counting from the bottom (or top) of the sorted list.

If the number of sample values is even, say $2k$, then the sample median is the ordinary average of the two "middle" values, the $k$-th and the $(k+1)$-th.
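
The rule above can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the function name `sample_median` is made up for this example, and it assumes the sample is a non-empty collection of plain numbers.

```python
def sample_median(values):
    """Return the sample median of a non-empty collection of numbers.

    Hypothetical helper written for illustration only.
    """
    ordered = sorted(values)  # duplicates are kept, so multiplicity is respected
    n = len(ordered)
    if n == 0:
        raise ValueError("sample_median() requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        # n = 2k + 1: the (k+1)-th value, which is index k when counting from 0
        return ordered[mid]
    # n = 2k: average of the k-th and (k+1)-th values (indices k-1 and k)
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [1, 2, 2, 6, 7, 8]
print(sample_median(data))       # 4.0, the average of the middle entries 2 and 6
print(sample_median(set(data)))  # 6, the middle of the distinct values 1, 2, 6, 7, 8
```

The second call drops the duplicates first, which reproduces the value 6 from the question: it is the same procedure applied to the distinct values rather than to all entries. Python's standard library also provides `statistics.median`, which follows the same odd/even rule.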