Let's say I have a set of test scores from 20 applicants:
$5, 15, 16, 20, 21, 25, 26, 27, 30, 30, 31, 32, 32, 34, 35, 38, 38, 41, 43, 66$
The median of this set is easy to find: the two middle values are $30$ and $31$, so I would just average them to get $30.5$. That's not what I'm looking for, though.
What I'm really looking for is the lower and upper quartiles.
I have found two ways of doing this, and they give different answers.
1st way: To find the lower quartile, I take the values from $5$ to $30$ and find their median, which comes out to $21$. This is based on a video that I watched.
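To see how much the choice of "lower half" matters, here is a minimal Python sketch using the standard-library `statistics.median`. Taking all ten values below the overall median (positions 1 to 10) gives a lower-half median of $23$, because an even-sized half has two middle values to average. The second reading, which also drops the 10th value and keeps only positions 1 to 9, is purely a hypothetical guess at how $21$ could arise; the video's actual rule is not stated.

```python
from statistics import median

scores = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
          31, 32, 32, 34, 35, 38, 38, 41, 43, 66]

xs = sorted(scores)
n = len(xs)  # 20

# Overall median: average of the 10th and 11th values.
print(median(xs))             # 30.5

# Reading 1: the lower half is all n/2 = 10 values below the median.
lower_half = xs[: n // 2]     # 5, 15, ..., 30, 30
print(median(lower_half))     # 23.0 (average of 21 and 25)

# Reading 2 (hypothetical): also drop the 10th value, keeping 9 values.
lower_nine = xs[: n // 2 - 1] # 5, 15, ..., 27, 30
print(median(lower_nine))     # 21 (the single middle value)
```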
2nd way: Same problem, but instead of just finding the median, I use a formula for the position $j$ of the $k$th quartile. For the lower quartile, $k = 1$:
$j = \frac{k}{4}\,n = \frac{1}{4}(20) = 5$
This means the lower quartile is at the 5th position, which is $21$. Sounds good, except that there is an extra step:
average the values at positions $j$ and $j+1$, that is, $x_j$ and $x_{j+1}$.
This means I have to average the 5th and 6th values:
$Q_1=\frac{x_5+x_6}{2}=\frac{21+25}{2}=23$
This differs from the $21$ I got the first way. This method came from a textbook.
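The textbook rule, as described, can be sketched in Python as follows. `textbook_quartile` is a hypothetical helper name, and the sketch deliberately covers only the case where $\frac{k}{4}n$ is a whole number (true for $n = 20$); what the textbook does otherwise is not described, so the sketch raises an error in that case.

```python
scores = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
          31, 32, 32, 34, 35, 38, 38, 41, 43, 66]


def textbook_quartile(data, k):
    """Hypothetical helper: k-th quartile (k = 1, 2, 3) by the textbook rule.

    Position j = (k/4) * n; the quartile is the average of x_j and x_{j+1}
    (1-based positions). Only the case where (k/4) * n is a whole number
    is handled, since that is the only case described.
    """
    xs = sorted(data)
    j, remainder = divmod(k * len(xs), 4)
    if remainder != 0:
        raise ValueError("(k/4) * n is not a whole number; rule not covered here")
    # xs is 0-based, so x_j is xs[j - 1] and x_{j+1} is xs[j].
    return (xs[j - 1] + xs[j]) / 2


print(textbook_quartile(scores, 1))  # 23.0 = (21 + 25) / 2
print(textbook_quartile(scores, 2))  # 30.5, the ordinary median
print(textbook_quartile(scores, 3))  # 36.5 = (35 + 38) / 2
```

With $k = 2$ the same rule reproduces the ordinary median, which is a quick consistency check on the formula.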
Which way is correct?
Different textbooks and software packages use various methods of finding quartiles (and other quantiles).
If you have 40 distinct observations, sorted in order, then it is clear that the lower quartile should be between the 10th and 11th values. But if the sample size is not evenly divisible by 4, or if there are ties, then compromises must be made, and there is no general agreement on exactly how to do that. The different methods may make a noticeable difference in small samples, but usually not in large ones.
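One way to see the compromise concretely is to compare where two of R's interpolation rules place the lower quartile. Per R's `quantile` documentation, type 7 (R's default) uses the 1-based position $h = (n-1)p + 1$ and type 5 uses $h = np + \tfrac{1}{2}$, interpolating linearly between neighboring order statistics. The Python below is a small sketch of just those position formulas; the function names are hypothetical.

```python
def position_type7(n, p):
    """1-based position of the p-quantile under R's type 7 (the default)."""
    return (n - 1) * p + 1


def position_type5(n, p):
    """1-based position of the p-quantile under R's type 5."""
    return n * p + 0.5


for n in (40, 41, 1000):
    h7 = position_type7(n, 0.25)
    h5 = position_type5(n, 0.25)
    print(n, h7, h5)
# 40 10.75 10.5     -> both between the 10th and 11th values
# 41 11.0 10.75     -> type 7 lands on the 11th value, type 5 does not
# 1000 250.75 250.5
```

For $p = \tfrac14$ the two positions always differ by $(n-1)p + 1 - (np + \tfrac12) = \tfrac12 - p = \tfrac14$ of an index, whatever $n$ is. In a large sample neighboring values are usually close together, so the two estimates nearly coincide; in a small sample a quarter-step between widely spaced values can matter.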
The criteria for 'identifying' outliers are based on the quartiles, so moving from one software package to another, you may see differences in which observations are so identified.
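As an illustration, assume the common rule that flags values more than $1.5 \times \text{IQR}$ beyond the quartiles (the specific rule is an assumption here, not stated above). The Python sketch below applies it to the 20 scores with quartiles from R's type 7 and type 5 position formulas; the helper names are hypothetical. With type 7 quartiles ($24$ and $35.75$) both $5$ and $66$ are flagged, while with type 5 quartiles ($23$ and $36.5$) only $66$ is.

```python
import math

scores = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
          31, 32, 32, 34, 35, 38, 38, 41, 43, 66]


def interpolate(xs, h):
    """Linear interpolation at 1-based position h of sorted xs, clamped to the ends."""
    n = len(xs)
    j = math.floor(h)
    g = h - j
    lower = xs[min(max(j, 1), n) - 1]
    upper = xs[min(max(j + 1, 1), n) - 1]
    return (1 - g) * lower + g * upper


def type7(n, p):
    return (n - 1) * p + 1  # R's default position rule


def type5(n, p):
    return n * p + 0.5


def flag_outliers(data, position, coef=1.5):
    """Values outside [Q1 - coef*IQR, Q3 + coef*IQR] (assumed 1.5*IQR rule)."""
    xs = sorted(data)
    n = len(xs)
    q1 = interpolate(xs, position(n, 0.25))
    q3 = interpolate(xs, position(n, 0.75))
    iqr = q3 - q1
    low, high = q1 - coef * iqr, q3 + coef * iqr
    return [x for x in xs if x < low or x > high]


print(flag_outliers(scores, type7))  # [5, 66]
print(flag_outliers(scores, type5))  # [66]
```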
Quantiles for your data, of types 7, 3, and 5 from R, are shown below. The last item shows Tukey's 'five-number summary', which agrees with type 5 for your data. (Your data have no ties in the vicinity of the quartiles, so some major disagreements among types are not illustrated.)
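For readers without R, here is a minimal Python sketch of these three definitions and of Tukey's five-number summary, modeled on the definitions behind R's `quantile` and `fivenum` rather than calling R. The Python functions `quantile` and `fivenum` are hypothetical helpers that only mimic the R functions of the same names, only types 3, 5, and 7 are implemented, and R's own guard against floating-point rounding is omitted. Type 3 (the "nearest even order statistic" rule, attributed to SAS in R's documentation) never interpolates, so it always returns actual data values.

```python
import math

scores = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
          31, 32, 32, 34, 35, 38, 38, 41, 43, 66]
PROBS = [0.0, 0.25, 0.5, 0.75, 1.0]


def order_stat(xs, i):
    """i-th smallest value (1-based) of sorted xs, clamped to [1, n]."""
    return xs[min(max(i, 1), len(xs)) - 1]


def quantile(data, p, qtype=7):
    """Sketch of R-style sample quantiles for types 3, 5 and 7 only."""
    xs = sorted(data)
    n = len(xs)
    if qtype in (5, 7):
        # Continuous types: interpolate linearly at 1-based position h.
        h = (n - 1) * p + 1 if qtype == 7 else n * p + 0.5
        j = math.floor(h)
        g = h - j
        return (1 - g) * order_stat(xs, j) + g * order_stat(xs, j + 1)
    if qtype == 3:
        # Nearest even order statistic: pick x_j or x_{j+1}, no interpolation.
        j = math.floor(n * p - 0.5)
        g = n * p - 0.5 - j
        gamma = 0 if (g == 0 and j % 2 == 0) else 1
        return order_stat(xs, j + gamma)
    raise ValueError("only types 3, 5 and 7 are sketched here")


def fivenum(data):
    """Tukey's five-number summary: min, lower hinge, median, upper hinge, max."""
    xs = sorted(data)
    n = len(xs)
    n4 = math.floor((n + 3) / 2) / 2  # depth of the hinges
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    return [(xs[math.floor(d) - 1] + xs[math.ceil(d) - 1]) / 2 for d in depths]


for t in (7, 3, 5):
    print(f"type {t}:", [quantile(scores, p, t) for p in PROBS])
print("fivenum:", fivenum(scores))
# type 7: [5.0, 24.0, 30.5, 35.75, 66.0]
# type 3: [5, 21, 30, 35, 66]
# type 5: [5.0, 23.0, 30.5, 36.5, 66.0]
# fivenum: [5.0, 23.0, 30.5, 36.5, 66.0]
```

Type 3 lands on actual scores (its lower quartile, $21$, coincides with the first way's answer), while the interpolating types 5 and 7 fall between neighboring scores; the hinges of the five-number summary match type 5 here.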