Probability distribution of the median


Suppose $X_1, \dots, X_n$ are $n = 2m+1$ iid discrete random variables. I am wondering how I can calculate the probability distribution of the random variable $$\text{median}(X_1, \dots, X_n)$$ My attempt involved the following reasoning. Suppose that the median turns out to be the value $k$. Then we have to separate the $n$ samples into three different groups: $m$ samples must be $\leq k$, $m$ samples must be $\geq k$, and $1$ sample must equal $k$. Thus, we would obtain the following probability distribution: $$\mathbb{P}(\text{median}(X_1, \dots, X_n) = k) = {n \choose m \; m \; 1} \cdot \mathbb{P}(X_1 \leq k)^m \cdot \mathbb{P}(X_1=k) \cdot \mathbb{P}(X_1 \geq k)^m$$ However, numerical analysis shows that this is wrong; it does not even sum to $1$. Where is the mistake in my reasoning? And is there any way to successfully describe the probability distribution of the random variable $\text{median}(X_1, \dots, X_n)$?


There are 2 best solutions below


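You are definitely on the right track. The problem is ties: those annoying situations where more than one of the $X_i$'s takes the median value. Your three groups overlap (a sample equal to $k$ satisfies $\le k$, $= k$, and $\ge k$ all at once), so the multinomial coefficient counts the same outcome several times.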
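To see the overcounting concretely, here is a small worked case (an illustrative choice, not taken from the question): let $n = 3$, so $m = 1$, and let each $X_i$ be a fair coin on $\{0, 1\}$. The formula from the question gives $$ {3 \choose 1 \; 1 \; 1} \cdot P(X \le 0) \cdot P(X = 0) \cdot P(X \ge 0) = 6 \cdot \tfrac12 \cdot \tfrac12 \cdot 1 = \tfrac32, $$ which is not even a probability. The true value is the chance that at least two of the three samples are $0$: $$ P(\operatorname{Med}(X_i) = 0) = {3 \choose 2}\left(\tfrac12\right)^3 + \left(\tfrac12\right)^3 = \tfrac12. $$ The outcome $(0,0,0)$ is counted $6$ times by the formula, once for each way of assigning the roles "$\le k$", "$= k$", "$\ge k$" to the three samples, and each outcome with two zeros and one $1$ is counted twice.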

It is probably easiest to work with the CDF. The median is $\le k$ exactly when at least $m+1$ of the $X$'s are $\le k.$ The combinatorics of this are more straightforward: $$ P(\operatorname{Med}(X_i) \le k ) =\sum_{i=m+1}^{n} {n\choose i} P(X\le k)^{i}P(X>k)^{n-i}.$$ Then, for integer-valued $X$, you get $P(\operatorname{Med}(X_i)=k)$ by taking $P(\operatorname{Med}(X_i)\le k) - P(\operatorname{Med}(X_i)\le k-1).$ If the support is not the integers, subtract $P(\operatorname{Med}(X_i) < k)$ instead, which is the same sum with $P(X<k)$ and $P(X\ge k)$ in place of $P(X\le k)$ and $P(X>k).$
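The CDF approach can be checked numerically. The following is a minimal Python sketch, not a definitive implementation: the function names `median_pmf` and `median_pmf_brute` and the example distribution are made up for illustration. It differences the median's CDF over consecutive support points, which reduces to subtracting the value at $k-1$ when the support is a run of consecutive integers, and compares the result with brute-force enumeration.

```python
from itertools import product
from math import comb
from statistics import median


def median_pmf(pmf, n):
    """Return {value: P(median = value)} for n iid draws (n odd) from `pmf`.

    `pmf` maps each support point to its probability.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    m = (n - 1) // 2
    result = {}
    cdf_x = 0.0     # running P(X <= k)
    prev_med = 0.0  # P(median <= previous support point)
    for k in sorted(pmf):
        cdf_x = min(cdf_x + pmf[k], 1.0)  # guard against float drift above 1
        # P(median <= k) = P(at least m+1 of the n samples are <= k)
        cdf_med = sum(
            comb(n, i) * cdf_x**i * (1.0 - cdf_x) ** (n - i)
            for i in range(m + 1, n + 1)
        )
        result[k] = cdf_med - prev_med
        prev_med = cdf_med
    return result


def median_pmf_brute(pmf, n):
    """Same distribution by enumerating every n-tuple (tiny cases only)."""
    result = {k: 0.0 for k in pmf}
    for sample in product(pmf, repeat=n):
        p = 1.0
        for x in sample:
            p *= pmf[x]
        result[median(sample)] += p
    return result


# Hypothetical example distribution, chosen only for illustration.
example = {0: 0.2, 1: 0.5, 3: 0.3}
fast = median_pmf(example, 5)
slow = median_pmf_brute(example, 5)
print(fast)
print(sum(fast.values()))
```

The brute-force version grows as (support size)$^n$, so it is only useful as a sanity check on small cases; the CDF version costs one binomial sum per support point.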


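\begin{align*} F_{X_{(k)}}(z) &= P(X_{(k)} \le z)\\ &= P(\text{at least } k \text{ of the } X_i \text{ are} \le z)\\ &= \sum_{j=k}^n P(\text{exactly } j \text{ of the } X_i \text{ are} \le z)\\ &= \sum_{j=k}^n {n\choose j} F(z)^j(1-F(z))^{n-j} \end{align*}

Here $X_{(k)}$ is the $k$-th order statistic and $F$ is the common CDF of the $X_i$; the median of $n = 2m+1$ samples is the case $k = m+1$.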