Probability that mean is larger than median

Let $X_1,\ldots,X_n$ be i.i.d. random variables taking values in $\mathbb{R}$. Suppose that $n$ is odd and the $X_i$ follow a continuous distribution. I am interested in the probability that the mean of these random variables is larger than their median, i.e. $$ \mathbb{P}\left[ \frac{1}{n}\sum_{i=1}^nX_i>X_{\left(\frac{n+1}{2}\right)} \right] $$ My guess is that, in general, there is no nice formula for this probability. But I wonder: are there (easy or known) special cases where there is a (nice) formula?

There are 1 best solutions below

The probability can be calculated whenever the distribution of $X_1,\dots,X_n$ is absolutely continuous (i.e., it has a PDF).

Note that we can rewrite the probability as $$ \mathbb{P}\left[\frac{1}{n}\sum_{i=1}^nX_i>X_{\left(\frac{n+1}{2}\right)}\right]= \mathbb{P}\left[Z_n:=\sum_{i=1}^nX_{(i)}-nX_{\left(\frac{n+1}{2}\right)}>0\right]. $$ $Z_n$ is a linear combination of order statistics. (For even $n$ the sample median is defined differently, so the weights would differ from the odd case considered here.) For an absolutely continuous distribution, the joint distribution of all order statistics is explicitly known, so the distribution of $Z_n$ can be determined and the probability can be computed by integration.
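
Before attempting the integration, the probability can be estimated by simulation. The following is a minimal Monte Carlo sketch, not part of the original argument; the function name `prob_mean_exceeds_median`, the trial count, and the two example distributions (Exp(1) and uniform on $[0,1]$) are illustrative assumptions.

```python
import random
import statistics


def prob_mean_exceeds_median(sampler, n, trials=20_000, seed=0):
    """Monte Carlo estimate of P(sample mean > sample median) for odd n.

    `sampler` is a hypothetical callback: it takes a random.Random
    instance and returns one draw from the distribution of interest.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so the median is X_((n+1)/2)")
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [sampler(rng) for _ in range(n)]
        # Equivalent to Z_n > 0 with Z_n = sum(xs) - n * median(xs).
        if statistics.fmean(xs) > statistics.median(xs):
            hits += 1
    return hits / trials


# Illustrative choices: a right-skewed and a symmetric distribution.
p_exp = prob_mean_exceeds_median(lambda rng: rng.expovariate(1.0), n=5)
p_unif = prob_mean_exceeds_median(lambda rng: rng.random(), n=5)
print(p_exp, p_unif)
```

The estimate carries sampling error of order $1/\sqrt{\text{trials}}$, so it is only a sanity check for the exact results discussed below.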

Symmetric distributions

For the uniform distribution on $[0,1]$, a closed-form formula for the CDF of an arbitrary linear combination of order statistics is provided in [1]. For odd $n$, we have $$-Z_n=-X_{(1)}-X_{(2)}-\dots+(n-1) X_{\left(\frac{n+1}{2}\right)}-\dots-X_{(n)}, $$ and using (2.2) in [1], the probability is

$$\mathbb{P}\left[-Z_n\le 0\right]=1-\sum_{j=1 }^{\frac{n+1}{2}} \frac{c_j^{n-1}}{ \prod_{i \neq j}(c_j-c_i) } $$

with

$$ c_j= j-1 \text{ for } j\in \left\{1, \dots, \frac{n+1}{2} \right\}$$

$$ c_j= j-(n+1)\text{ for }j\in \left\{\frac{n+1}{2} +1, \dots, n\right\}.$$
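
The formula above can be evaluated exactly with rational arithmetic. The sketch below is an illustration, not code from the answer; the function names `coefficients` and `uniform_probability` are hypothetical, and it assumes odd $n \ge 3$ so that all $c_j$ are distinct.

```python
from fractions import Fraction


def coefficients(n):
    """Return [c_1, ..., c_n] as defined above, for odd n >= 3."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be odd and at least 3")
    m = (n + 1) // 2
    return [j - 1 if j <= m else j - (n + 1) for j in range(1, n + 1)]


def uniform_probability(n):
    """Evaluate 1 - sum_{j<=(n+1)/2} c_j^(n-1) / prod_{i!=j}(c_j - c_i)."""
    c = coefficients(n)
    m = (n + 1) // 2
    total = Fraction(0)
    for j in range(m):  # 0-based index for j = 1, ..., (n+1)/2
        denom = 1
        for i in range(n):
            if i != j:
                denom *= c[j] - c[i]
        total += Fraction(c[j] ** (n - 1), denom)
    return 1 - total


for n in (3, 5, 7):
    print(n, coefficients(n), uniform_probability(n))
```

Using `Fraction` avoids floating-point cancellation between the alternating terms of the sum.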

I computed the above for $n=3$, $5$, and $7$ and found that it was always $\frac{1}{2}$. For the uniform distribution I can prove that $-Z_n$ has a symmetric distribution. In [1], a representation of $-Z_n$ is given by

$$-Z_n \sim c_1X_1+c_2X_2+\dots+c_nX_n$$

with $c_1,\dots,c_n$ defined above. As $X_1,\dots,X_n$ are independent and each is symmetric about $\frac{1}{2}$, the right-hand side, and hence $-Z_n$, is symmetric about $\frac{1}{2}\sum_{j=1}^{n}c_j$. Since $\sum_{j=1}^{n}c_j=0$, the center of symmetry (and the mean) of $-Z_n$ is zero, and hence the probability is always $\frac{1}{2}$.

I am not sure whether this symmetry argument can be generalized to all symmetric densities. It is easy to see that the sample mean and the sample median (for odd $n$) both have symmetric distributions, but this does not imply that their difference $-Z_n$ also has a symmetric distribution, because they are dependent.

For any symmetric distribution, the method presented by @Henry in a comment above shows that the probability is $\frac{1}{2}$, without using the distributional symmetry of $-Z_n$.

Large samples

Moreover, under some conditions, $Z_n$ is asymptotically normal. In fact, for a distribution with finite variance whose density is continuous and positive at the median, the sample mean and a sample quantile are jointly asymptotically bivariate normal (see [2] for more details).

Hence, for large $n$, the probability is approximately

$$ \Phi \left(\frac{\mu-m}{\sqrt{\frac{1}{n} \left(\sigma^2+\frac{1}{4f(m)^2}-\frac{\mathbb{E} \left(|X-m| \right)}{f(m)} \right)}} \right) $$

where $\Phi$ is the CDF of the standard normal distribution and $f$, $\mu$, $m$, and $\sigma^2$ are the density, mean, median, and variance of the distribution of $X$, respectively.
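
As a worked illustration of the large-sample formula, the sketch below evaluates it for a given set of distribution summaries. The function name `normal_approx_probability` and its parameter names are hypothetical. The Exp(1) inputs are an example chosen here, not taken from the answer: for that distribution $\mu=1$, $m=\ln 2$, $\sigma^2=1$, $f(m)=\frac{1}{2}$, and $\mathbb{E}|X-m|=\ln 2$.

```python
import math
from statistics import NormalDist


def normal_approx_probability(n, mu, m, sigma2, f_m, mean_abs_dev):
    """Large-n approximation of P(sample mean > sample median).

    mu, m, sigma2: mean, median, and variance of X
    f_m:           density of X evaluated at the median
    mean_abs_dev:  E|X - m|
    """
    # Asymptotic variance of (sample mean - sample median).
    var = (sigma2 + 1.0 / (4.0 * f_m ** 2) - mean_abs_dev / f_m) / n
    return NormalDist().cdf((mu - m) / math.sqrt(var))


# Assumed example: Exp(1) with n = 25.
ln2 = math.log(2)
p = normal_approx_probability(
    n=25, mu=1.0, m=ln2, sigma2=1.0, f_m=0.5, mean_abs_dev=ln2
)
print(p)
```

When $\mu=m$, as for any symmetric distribution with finite mean, the argument of $\Phi$ is zero and the approximation returns $\frac{1}{2}$, in agreement with the symmetric case above.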