Median of a multinomial variable

377 Views Asked by At

Let $k\in\mathbb N^+$ be a positive integer.

Consider a set of i.i.d. random variables $X_1,X_2,\ldots, X_n$, each of which is distributed uniformly over $\{1,2,\ldots,2k+1\}$.

For $i\in \{1,2,\ldots,2k+1\}$, let $Y_i\triangleq|\{j\mid X_j=i\}|$ denote the number of variables with the value $i$. Notice that each $Y_i$ is distributed $Bin\left(n,\frac{1}{2k+1}\right)$.

Finally, let $Z\triangleq \text{Median}(\{Y_1,Y_2,\ldots,Y_{2k+1}\})$ be the median of the $Y_i$ variables.

Does $\mathbb E(Z) = \frac{n}{2k+1}$? How could we prove it?

and more importantly:

What is the variance of $Z$? is it $o(np(1-p))$, where $p\triangleq\frac{1}{2k+1}$ is the probability that a single $X_j$ takes a specific value $i$? i.e., is the variance of the median asymptotically smaller than $Var(Y_i)=np(1-p)$? by how much?


The special case of $k=1$, where we have only 3 $Y_i$ variables is also interesting for me.

1

There are 1 best solutions below

4
On BEST ANSWER

No, we don't generally have $\mathbb E(Z) = \frac{n}{2k+1}$. For instance, for $n\le k$ we have $Z=0$ with probability $1$.

To find the expected median for large $n$ for $k=1$, with $3$ bins, we can expand the multinomial coefficients around $m=\frac n3$ using Stirling's approximation:

\begin{align} &\binom n{m-x,m-y,m+x+y}\\ ={}&\frac{n!}{(m-x)!(m-y)!(m+x+y)!} \\\propto{}&\exp(-(m-x)\log(m-x)-(m-y)\log(m-y)-(m+x+y)\log(m+x+y) \\ \propto{}& \exp\left(-\frac1{2m}\left(x^2+y^2+(x+y)^2\right)-\frac1{6m^2}\left(x^3+y^3-(x+y)^3\right)\right)\;, \end{align}

where terms independent of $x$ and $y$ and terms of order $\frac{x^2}{m^2}$, $\frac{y^2}{m^2}$, $\frac{x^4}{m^3}$ and $\frac{y^4}{m^3}$ have been dropped. For $n\to\infty$, we can regard this as a continuous density $f(x,y)$. The probability that two $Y_i$ are equal goes to zero, so we don't have to include symmetry factors, and the expected median is

$$ \mathbb E(m-y)=m-\frac{\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,y\,f(x,y)}{\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,f(x,y)}\;. $$

The $\frac1m$-fold denominator is

\begin{align} &\frac1m\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,\exp\left(-\frac1{2m}\left(x^2+y^2+(x+y)^2\right)-\frac1{6m^2}\left(x^3+y^3-(x+y)^3\right)\right)\\ ={}&\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,\exp\left(-\frac12\left(x^2+y^2+(x+y)^2\right)-\frac1{6m^{\frac12}}\left(x^3+y^3-(x+y)^3\right)\right)\\ ={}&\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,\exp\left(-\frac12\left(x^2+y^2+(x+y)^2\right)\right)+O\left(n^{-\frac12}\right)\\ ={}&\int_{\arctan\left(-\frac12\right)}^\frac\pi4\mathrm d\phi\int_0^\infty r\mathrm dr\,\exp\left(-r^2\left(1+\sin\phi\cos\phi\right)\right)+O\left(n^{-\frac12}\right)\\ ={}&\int_{\arctan\left(-\frac12\right)}^\frac\pi4\mathrm d\phi\,\frac12\frac1{1+\sin\phi\cos\phi}+O\left(n^{-\frac12}\right)\\ ={}&\frac\pi{3\sqrt3}+O\left(n^{-\frac12}\right) \end{align}

(Wolfram|Alpha computation for the angular integral).

The $\frac1m$-fold numerator is

\begin{align} &\frac1m\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,y\exp\left(-\frac1{2m}\left(x^2+y^2+(x+y)^2\right)-\frac1{6m^2}\left(x^3+y^3-(x+y)^3\right)\right)\\ ={}&m^\frac12\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,y\exp\left(-\frac12\left(x^2+y^2+(x+y)^2\right)-\frac1{6m^{\frac12}}\left(x^3+y^3-(x+y)^3\right)\right)\\ ={}&\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,y\left(m^\frac12-\frac16\left(x^3+y^3-(x+y)^3\right)\right)\exp\left(-\frac12\left(x^2+y^2+(x+y)^2\right)\right)+O\left(n^{-\frac12}\right)\;.\\ \end{align}

The first term, proportional to $n^\frac12$, vanishes:

\begin{align} &\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,y\exp\left(-\frac12\left(x^2+y^2+(x+y)^2\right)\right)\\ ={}&\int_{\arctan\left(-\frac12\right)}^\frac\pi4\mathrm d\phi\int_0^\infty r\mathrm dr\,r\sin\phi\exp\left(-r^2\left(1+\sin\phi\cos\phi\right)\right) \\ ={}&\int_{\arctan\left(-\frac12\right)}^\frac\pi4\mathrm d\phi\sin\phi\frac{\sqrt\pi}{4(1+\sin\phi\cos\phi)^\frac32}\\ ={}&0 \end{align}

(Wolfram|Alpha computation for the angular integral).

The second, constant term is finite:

\begin{align} &-\frac16\int_0^\infty\mathrm dx\int_{-x/2}^x\mathrm dy\,y\left(x^3+y^3-(x+y)^3\right)\exp\left(-\frac12\left(x^2+y^2+(x+y)^2\right)\right)\\ ={}&-\frac16\int_{\arctan\left(-\frac12\right)}^\frac\pi4\mathrm d\phi\int_0^\infty r\mathrm dr\,r^4\sin\phi\left(\cos^3\phi+\sin^3\phi-(\cos\phi+\sin\phi)^3\right)\quad\exp\left(-r^2\left(1+\sin\phi\cos\phi\right)\right) \\ ={}&-\frac16\int_{\arctan\left(-\frac12\right)}^\frac\pi4\mathrm d\phi\frac{\sin\phi\left(\cos^3\phi+\sin^3\phi-(\cos\phi+\sin\phi)^3\right)}{\left(1+\sin\phi\cos\phi\right)^3} \\ ={}&\frac1{18} \end{align}

(Wolfram|Alpha computation for the angular integral).

You can check that the contribution from the terms dropped in the initial approximation is also $O\left(n^{-\frac12}\right)$, so the expected median is

\begin{align} &m-\frac{\frac1{18}}{\frac\pi{3\sqrt3}}+O\left(n^{-\frac12}\right)\\ ={}&\frac n3-\frac1{2\pi\sqrt3}+O\left(n^{-\frac12}\right)\;. \end{align}

Here's code that I used to check this result. The numerical results indicate that the shift in the expected median slightly increases with $k$, but not much. It would be interesting to see whether it has a limit for $k\to\infty$, but I don't see how to determine it other than with prohibitive integrations using hyperspherical coordinates in $2k$ dimensions in analogy with the above calculation.