PMF of Sum of Values Selected Without Replacement

37 Views Asked by At

I’m currently stuck on this question.

You have $m$ values, $X_1, X_2, …, X_m$ where each value is selected without replacement from the range $[1, m+n]$ inclusive without replacement. $Y = X_1+ X_2+ \dots+ X_m$.

I understand that the expectation of Y can be found through linearity of expectation, and is given by $E[Y] = \frac{m (m + n + 1)}{2}$.

What I am struggling with is solving for the probability mass function and variance of Y. I initially tried a stars-and-bars / combinatorics approach, but was unable to work around the constraints of each $X_i$ being within $[1, m+n]$ and the fact that there are no repeats/the numbers are selected without replacement.

1

There are 1 best solutions below

1
On BEST ANSWER

I take it that you intended $Y$ to be the sum of the $X_i$. (That would lead to the expected value you give.)

I doubt that you’ll find a closed form for the probability distribution, since it’s related to partition counts, which can usually not be expressed in closed form. But we can derive a generating function and use it obtain the variance.

Let $y^k$ represent $Y=k$, and let $t^m$ label a sum of $m$ values. Then the generating function for the number of outcomes where $m$ values in $[1,m+n]$ sum to $k$ is

$$ f(y,t)=\prod_{j=1}^{m+n}\left(1+ty^j\right)\;. $$

This generating function is ordinary in $y$ but exponential in $t$, since there are $m!$ ways to order a partition into $m$ distinct parts represented by $t^m$; but that needn’t concern us here since we will only be considering quotients of coefficients for fixed $m$, so the factor $m!$ cancels.

With the coefficient extraction operator $[t^m]\sum_kc_kt^k=c_m$ the expected value of $Y$ is

$$ \mathsf E[Y] = \frac{[t^m]\left.\frac{\partial f(y,t)}{\partial y}\right|_{y=1}}{[t^m]\left.f(y,t)\right|_{y=1}}\;. $$

The denominator (the number of possible unordered outcomes) is

\begin{eqnarray*} [t^m]\left.f(y,t)\right|_{y=1} &=& [t^m]\left.\prod_{j=1}^{m+n}\left(1+ty^j\right)\right|_{y=1} \\ &=& [t^m](1+t)^{m+n} \\ &=& \binom{m+n}m\;, \end{eqnarray*}

since we can choose any $m$ of the $m+n$ values as outcomes. The numerator (the sum of all possible unordered outcomes) is

\begin{eqnarray*} [t^m]\left.\frac{\partial f(y,t)}{\partial y}\right|_{y=1} &=& [t^m]\sum_{k=1}^{m+n}\frac{tky^{k-1}}{\left(1+tyk\right)}\left.\prod_{j=1}^{m+n}\left(1+ty^j\right)\right|_{y=1} \\ &=& [t^m]t(1+t)^{m+n-1}\sum_{k=1}^{m+n}k \\ &=& \binom{m+n-1}{m-1}\binom{m+n+1}2\;. \end{eqnarray*}

Thus

\begin{eqnarray*} \mathsf E[Y] &=& \frac{\binom{m+n-1}{m-1}\binom{m+n+1}2}{\binom{m+n}m} \\ &=& \frac{m(m+n+1)}2\;, \end{eqnarray*}

in agreement with your result.

So far, I’ve merely used some unnecessarily heavy machinery to rederive a result you’d obtained more efficiently. But now it’s not much more work to obtain the variance, using

$$ \mathsf E[Y^2]-\mathsf E[Y] = \frac{[t^m]\left.\frac{\partial^2 f(y,t)}{\partial y^2}\right|_{y=1}}{[t^m]\left.f(y,t)\right|_{y=1}}\;. $$

We already evaluated the denominator; the numerator is

\begin{eqnarray*} &&[t^m]\left.\frac{\partial^2 f(y,t)}{\partial y^2}\right|_{y=1} \\ &=& [t^m]\left(\sum_{k,\ell=1}^{m+n}\frac{tk}{1+ty^k}\frac{t\ell}{1+ty^\ell}+\sum_{k=1}^{m+n}\left(\frac{tk(k-1)y^{k-2}}{1+ty^k}-\frac{t^2k^2y^{k-1}}{\left(1+ty^k\right)^2}\right)\right)\left.\prod_{j=1}^{m+n}\left(1+ty^j\right)\right|_{y=1} \\ &=& [t^m](1+t)^{m+n-2}\left(t^2\sum_{k,\ell=1}^{m+n}k\ell+\sum_{k=1}^{m+n}(t(1+t)k(k-1)-t^2k^2)\right) \\ &=& [t^m](1+t)^{m+n-2}\left(\binom{m+n+1}2^2t^2+\binom{m+n+1}2\left(\frac23(m+n-1)t-t^2\right)\right) \\ &=& \binom{m+n+1}2\left(\binom{m+n-2}{m-2}\binom{m+n+1}2+\frac23(m+n-1)\binom{m+n-2}{m-1}-\binom{m+n-2}{m-2}\right)\;. \end{eqnarray*}

Dividing by the denominator and simplifying yields

$$ \mathsf E[Y^2]-\mathsf E[Y] = \frac1{12}m(m+n+1)(3m^2+3mn+3m+n-6)\;, $$

and thus

\begin{eqnarray*} \mathsf E[Y^2] &=& \frac1{12}m(m+n+1)(3m^2+3mn+3m+n-6)+\frac{m(m+n+1)}2 \\ &=& \frac1{12}m(m+n+1)(3m^2+3mn+3m+n) \end{eqnarray*}

and

\begin{eqnarray*} \mathsf{Var}[Y] &=& \mathsf E[Y^2]-\mathsf E[Y]^2 \\ &=& \frac1{12}m(m+n+1)(3m^2+3mn+3m+n)-\left(\frac{m(m+n+1)}2\right)^2 \\ &=& \frac1{12}mn(m+n+1)\;. \end{eqnarray*}

The simplicity of the result suggests that there’s probably a more elegant way to derive it.