Probability of a median being $x_i$

101 Views Asked by At

I am trying to derive a probability function and here are the assumptions.

  1. Let $S = \{x_{(1)},x_{(2)},...,x_{(7)}\}$ be a set of distinct values that are ordered.
  2. Let $S*=\{x_1^*,x_2^*,...,x_7^*\}$ be a replication of $S$, as in, each elements $x_i^*$ are randomly chosen from $S$ with equal probability.

ex), $S^*$ could be $\{x_1,...,x_1\}$, or it could be an exact replica of $S$, etc.

  1. The sampling process are mutually independent.

My goal is to find the probability that the median of $S*$, say, $M(S*)$, is equal to $x_{(i)}$.

I would like to use the notation

$$B\left[j;n,p\right] ={n \choose j}p^j(1-p)^{n-j}$$

associated to the binomial distribution.

Here is my thought process.

To find $p(i)=Pr[M(S*)=x_{(1)}]$ we need at least 4 elements from $S*$ to be $x_{(1)}$ so

$$p(1) = \sum_{j=4}^7 B\left[j;7,\frac{1}{7}\right]$$

in other words,

$$\therefore = \sum_{j=0}^3 B\left[j;7,\frac{6}{7}\right]$$.

To find $p(2)$ we need at least the least 4 elements to be less than or equal to $x_{(2)}$, but take away the probability $p(1)$, so with similar argument I can get

$$p(2) = \sum_{j=0}^3 B\left[j;7,\frac{5}{7}\right]-p(1)$$

I am hoping that the general case would look something like

$$p(i) = \sum_{j=0}^3 \left(B\left[j;7,\frac{i-1}{7}\right]-B\left[j;7,\frac{i}{7}\right]\right)$$

but I am not too confident...

May I have some assistance please?

A different result is also more than welcome as long as it is accurate and makes sense.

2

There are 2 best solutions below

5
On BEST ANSWER

There are $7^7=823543$ equally likely possibilities for the resampling with replacement.

This is a tractable number and it is possible just to count how many give which order statistic as the median:

 x_(1)  x_(2)  x_(3)  x_(4)  x_(5)  x_(6)  x_(7) 
  8359  80809 196519 252169 196519  80809   8359   

and so probabilities

     x_(1)      x_(2)      x_(3)      x_(4)      x_(5)      x_(6)      x_(7) 
0.01015005 0.09812360 0.23862628 0.30620016 0.23862628 0.09812360 0.01015005

If you wanted this analytically, you want the probability that at least half of the sample is less than or equal to $x_{(n)}$ but are not less than or equal to $x_{(n-1)}$, i.e. $$\sum\limits_{j=4}^7 {7 \choose j}\frac{n^j (7-n)^{7-j}}{7^7}- \sum\limits_{j=4}^7 {7 \choose j}\frac{(n-1)^j (8-n)^{7-j}}{7^7} \\= \frac1{7^7}\sum\limits_{j=4}^7 {7 \choose j}\left({n^j (7-n)^{7-j}}- {(n-1)^j (8-n)^{7-j}}\right)$$

or using R

n <- 1:7
(1-pbinom(3,7,n/7)) - (1-pbinom(3,7,(n-1)/7))
# 0.01015005 0.09812360 0.23862628 0.30620016 0.23862628 0.09812360 0.01015005

as already found

0
On

Let $F(k,n,p)$ denote the binomial CDF.

To keep things general, let's use $n$ as the (odd) number of elements instead of $7$. Then, $$ p(i)=F((n - 1) / 2, n, 1 - i / n) - \sum_{j < i} p(j). $$ In plain English, the probability of the median being $i$ is the probability of resampling a number greater than $x_{(i)}$ at most $(n - 1) / 2$ times minus the probability of the median being any one the previous elements.

The script (below) also gives the same answer as that of @Henry. Note that $p(i) = p(n - i + 1)$ whenever $i < (n - 1) / 2$ and so you only have to compute the first $(n + 1) / 2$ elements of $p$.

from scipy.stats import binom
import numpy as np


def probs(n):
    assert n % 2 == 1  # Make sure n is odd
    p = np.empty([n + 1])
    p[0] = 0.
    mid = (n + 1) // 2
    cumsum = 0.
    for i in range(1, mid + 1):
        p[i] = binom.cdf(mid - 1, n, 1. - float(i) / n) - cumsum
        cumsum += p[i]
    p[mid + 1:] = p[mid - 1:0:-1]
    return p


n = 7
p = probs(n)
for i in range(1, n + 1):
    print('p({}) = {}'.format(i, p[i]))

# p(1) = 0.010150046809941905
# p(2) = 0.0981235952463927
# p(3) = 0.23862627695214456
# p(4) = 0.3062001619830417
# p(5) = 0.23862627695214456
# p(6) = 0.09812359524639269
# p(7) = 0.010150046809941915