Should density plots and medians match?


I have a large dataset. I split the dataset into two parts (red and blue) and charted their density plots and medians (dotted lines) using R. Even though the density plot of the blue variable is farther to the right, its median is to the left. I removed outliers (defined as any observation that is more than 3 standard deviations away from the mean).

This does not make sense to me. Is this possible or have I made a mistake somewhere?

[Figure: density plots of the red and blue groups, with medians shown as dotted lines]

1 Answer

Comment continued: Here are two moderately large samples from the same normal distribution. In the first sample the mean is larger than the median; in the second the median is larger than the mean. Because the distribution is symmetric, each sample's mean is equally likely to land above or below its median, so two independent samples disagree in this way about half the time.

set.seed(123)
x1 = rnorm(1000, 50, 7)
mean(x1); median(x1)
[1] 50.1129
[1] 50.06447
x2 = rnorm(1000, 50, 7)
mean(x2); median(x2)
[1] 50.29726
[1] 50.38397
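The 50:50 claim can be checked with a small simulation. This is only a sketch: the seed, the number of replications, and the names `n_rep`, `mean_above`, `prop_above`, and `prop_disagree` are my own choices for illustration, not part of the original comment.

```r
# Sketch: how often does the sample mean exceed the sample median
# in normal samples of size 1000? (All settings here are illustrative.)
set.seed(2024)
n_rep <- 2000

# TRUE when the mean of a fresh sample is above its median
mean_above <- replicate(n_rep, {
  x <- rnorm(1000, 50, 7)
  mean(x) > median(x)
})

# Proportion of samples with mean > median (expected to be near 0.5)
prop_above <- mean(mean_above)

# Pair up consecutive samples and see how often the two disagree
odd  <- seq(1, n_rep, by = 2)
even <- seq(2, n_rep, by = 2)
prop_disagree <- mean(mean_above[odd] != mean_above[even])

c(prop_above = prop_above, prop_disagree = prop_disagree)
```

Both proportions should come out near one half, which is why a mean and median that "swap sides" between two groups is not, by itself, evidence of a mistake.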

As is characteristic of moderately large normal samples, both have some boxplot outliers in both tails. Deleting those might make the difference between mean and median larger (mostly by changing the sample means).

boxplot(x1, x2, col="skyblue2", pch=20, names=c("x1", "x2"))

[Figure: side-by-side boxplots of x1 and x2]
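To see what deleting boxplot outliers does to the gap between mean and median, something like the following could be tried. It is a sketch that assumes "boxplot outliers" means the points flagged by the default 1.5 IQR rule in `boxplot.stats`; the names `x1_trim`, `gap_before`, and `gap_after` are made up for this illustration.

```r
# Sketch: compare mean - median before and after dropping boxplot outliers.
# Assumes the default 1.5 * IQR rule used by boxplot.stats().
set.seed(123)
x1 <- rnorm(1000, 50, 7)

# Points that boxplot() would draw individually beyond the whiskers
out <- boxplot.stats(x1)$out

# Sample with those points removed
x1_trim <- x1[!x1 %in% out]

gap_before <- mean(x1) - median(x1)
gap_after  <- mean(x1_trim) - median(x1_trim)

c(n_removed = length(out), gap_before = gap_before, gap_after = gap_after)
```

The median usually moves very little when a few extreme points are dropped, so most of any change in the gap comes from the mean. The same comparison could be run with a 3 standard deviation rule, as in the question, by replacing the definition of `out`.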

Note: With long-tailed symmetrical distributions such as Student's t, Cauchy, and Laplace, the mismatch between sample means and medians might be relatively even larger. For example:

set.seed(1234)
x1 = rt(1000, 5)
mean(x1);  median(x1)
[1] 0.00247168
[1] 0.001483836
x2 = rt(1000, 5)
mean(x2);  median(x2)
[1] 0.01859106
[1] 0.06574679

boxplot(x1, x2, col="skyblue2", pch=20, names=c("x1", "x2"))

[Figure: side-by-side boxplots of the two t-distributed samples]