Median Based Probability

Most of probability theory is formulated in what I would call the "expectation framework". In general, we are interested in quantities involving $\mathbb{E}\left[X\right]$ where $X$ is some random variable of interest. This of course is reasonable - with the use of expectation, we can reframe probability theory into measure theory/Lebesgue theory on a space of measure one. Expectation is convenient - first and foremost it is linear, and its ability to generate norms allows us to invoke things like $\mathcal{L}^p$ and $\mathcal{L}^2$ theory, and sometimes even general Banach space theory.

However, elementary statistics courses (leaning away from probability theory for a bit) often criticize the expectation because it can be misleading as a measure of central tendency. Simply by definition, it is sensitive to outliers and large observations. It fails to exist for certain distributions (heavy-tailed laws), and as a result many useful convergence theorems do not apply to these laws. Of course, I am sure expectation has further downsides (and upsides) beyond those I have mentioned here.

Question: Is it possible to formulate a coherent notion of probability theory where all results that involve expectation are replaced with median? Has this ever been attempted? Or would such a theory be equivalent to the current standard formulation of probability theory (say, through the use of various concentration inequalities), and will I feel silly minutes after asking this?

There are 1 best solutions below

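There are some distributions for which the sample median is a better estimate of the center than the sample mean. One of them is the Laplace distribution, for which the sample median is the maximum likelihood estimate of the location. Another is the Cauchy distribution, for which the population mean does not exist.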
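As a rough illustration of the Cauchy case, the sketch below compares the spread of sample means and sample medians across repeated standard Cauchy samples. The sample size, number of replications, and seed are arbitrary choices made for this example, not values from the discussion above.

```r
set.seed(1)                               # arbitrary seed, for reproducibility
n    <- 100                               # size of each sample (assumed)
reps <- 2000                              # number of repeated samples (assumed)

# Each column of 'samples' is one standard Cauchy sample of size n
samples <- replicate(reps, rcauchy(n))

means   <- colMeans(samples)              # sample mean of each column
medians <- apply(samples, 2, median)      # sample median of each column

# Interquartile ranges: the medians cluster near 0, the means do not
IQR(means)
IQR(medians)
```

The interquartile range is used instead of the standard deviation because the sample mean of Cauchy data is itself Cauchy distributed, so its variance does not exist.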

There is a 'Central Limit Theorem' for sample medians. Provided that the density function $f(x)$ of the population is positive at the population median $\eta$ (that is, $f(\eta)>0$), the distribution of the sample median $\tilde X$ is approximately normal for large sample sizes. (See simulation below.)
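For reference, the usual textbook form of this result, stated here under the additional assumption that $f$ is continuous at $\eta$, is

$$\sqrt{n}\,\bigl(\tilde X_n - \eta\bigr) \;\xrightarrow{d}\; \mathsf{N}\!\left(0,\ \frac{1}{4 f(\eta)^2}\right),$$

where $\tilde X_n$ denotes the sample median of $n$ independent observations (the subscript $n$ is notation introduced for this display). In other words, for large $n$ the sample median is approximately normal with mean $\eta$ and standard deviation $1/\bigl(2 f(\eta)\sqrt{n}\bigr)$.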

So the median is not ignored in traditional statistics. Moreover, many parts of statistics, including robust and nonparametric methods, make use of medians. However, there are theoretical and practical difficulties with medians. Depending on the sample size (even or odd) or configuration (as in @JohnWhite's comment), the sample median may not be uniquely defined. Also, for paired samples the mean of the differences is the same as the difference of the means, but the same is not true of medians.

x = c(1,2,3,10,11); y = c(15,12,2,3,1)
mean(x) - mean(y);  mean(x-y)
[1] -1.2
[1] -1.2
median(x)-median(y); median(x-y)
[1] 0
[1] 1
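The non-uniqueness for even sample sizes can be sketched in the same way. With four observations, every value between the two middle observations minimizes the sum of absolute deviations, and R's `median` reports the midpoint by convention. The data vector and the helper `sad` are made up for this illustration.

```r
x <- c(1, 2, 3, 4)                        # even sample size (illustrative data)

# Hypothetical helper: sum of absolute deviations of x from a center m
sad <- function(m) sum(abs(x - m))

# Any m in [2, 3] gives the same minimal value, so each is "a" median
sapply(c(2, 2.5, 3), sad)

# R's convention: average the two middle order statistics
median(x)
```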

The following simulation of a million samples of size $n=500$ from a (highly skewed) exponential population shows that a histogram of the million sample medians is nearly normal.

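set.seed(2020)
# one million sample medians, each from an Exp(1) sample of size 500
h = replicate(10^6, median(rexp(500)))
# density-scale histogram with the fitted normal curve overlaid in red
hist(h, prob=TRUE, breaks=100, col="skyblue2",
     main="Sample Medians")
curve(dnorm(x, mean(h), sd(h)), add=TRUE, col="red", lwd=2)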

[Figure: histogram of the sample medians with the fitted normal density curve in red]
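As a numerical companion to the histogram, the sketch below compares the simulated medians with what the large-sample normal approximation suggests. For an Exp(1) population the median is $\eta = \ln 2$ and $f(\eta) = 1/2$, so the approximate standard deviation $1/(2 f(\eta)\sqrt{n})$ reduces to $1/\sqrt{n}$. The sample size of 500 matches the `rexp(500)` call above; the smaller number of replications is an assumption made to keep the run short.

```r
set.seed(2020)
n    <- 500                               # sample size, as in rexp(500)
reps <- 10^4                              # fewer replications than above (assumed)

meds <- replicate(reps, median(rexp(n)))  # simulated sample medians

eta       <- log(2)                       # population median of Exp(1)
f_eta     <- dexp(eta)                    # density at the median, equal to 0.5
theory_sd <- 1 / (2 * f_eta * sqrt(n))    # approximate sd of the sample median

# Simulated versus approximate theoretical center and spread
c(simulated = mean(meds), theory = eta)
c(simulated = sd(meds),   theory = theory_sd)
```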