Variance of sum of $k$ randomly drawn numbers from $1,...,n$.

1.5k Views Asked by At

From an urn with numbers $1,...,n$ we draw $k < n$ numbers without replacement.

Let $X_i$ be the $i$-th draw. The random variable is their sum $X=\sum_{i=1}^kX_i$.

I have already calculated the expected value of the sum, which is

$$\Bbb{E}[X]=\sum_{i=1}^k\Bbb{E}[X_i]=k\frac{n+1}{2}$$ because each $\Bbb{E}[X_i]=\frac{1}{n}\sum_{i=1}^n i=\frac{n+1}{2}$.

Now the variance of the sum would be $$Var[X]=\Bbb{E}[X^2]-\Bbb{E}[X]^2$$

I have read that the variance of a sum is the sum of variances if the random variables are independent, it does not seem to be the case here, as previous draws determine future draws.

Is there an elegant way to determine the first summand of the variance?


Edit: I am trying it the ugly way.

$\Bbb{E}[X^2]=\Bbb{E}[(\sum_{i=1}^kX_i)^2]=\Bbb{E}[\sum_{i=1}^k \sum_{j=1}^k X_iX_j]=\sum_{i=1}^k \sum_{j=1}^k \Bbb{E}[X_iX_j]$

To know $\Bbb{E}[X_iX_j]$ we would have to know $\Bbb{P}(X_iX_j=k)$, meaning we would have to know the number of ways to write a number as the product of two factors $1\leq X_i, X_j \leq n$... I am pretty sure I am off the track here, as I do not see a way to do it for a general $n$.


Am I wrong to regard the $X_i$ instead of the $X$, which are independent, as two draws of $k$ balls would be independent? Then $\Bbb{E}[X^2]=\Bbb{E}[X]\Bbb{E}[X]$

3

There are 3 best solutions below

4
On BEST ANSWER

Let's do it the ugly way. If any of the steps is confusing, let me know in the comments, I'll elaborate.

You have $$\mathbb{E}[X^2] = \sum_{i=1}^k \sum_{j=1}^k \mathbb{E}[X_iX_j] = \sum_{i=1}^k \mathbb{E}[X_i^2]+2\sum_{1\leq i < j\leq k} \mathbb{E}[X_iX_j]$$

The first term is easy to compute: $$ \sum_{i=1}^k \mathbb{E}[X_i^2] = k\cdot \frac{1}{n}\sum_{i=1}^n i^2 = \frac{k(n+1)(2n+1)}{6}\,. $$ The second... is similar. $$\begin{align*} 2\sum_{1\leq i < j\leq k} \mathbb{E}[X_iX_j] &= \binom{k}{2}\cdot \frac{1}{\binom{n}{2}} \sum_{\substack{1\leq i,j\leq n\\ i\neq j}} ij\\ &= \frac{k(k-1)}{n(n-1)}\left( \sum_{1\leq i,j\leq n} ij-\sum_{1\leq i\leq n} i^2 \right) \tag{Can you see why?}\\ &= \frac{k(k-1)}{n(n-1)}\left( \left(\sum_{i=1}^n i\right)^2-\sum_{i=1}^n i^2 \right) \tag{Can you see why?}\\ &= \frac{k(k-1)}{n(n-1)}\left( \left(\frac{n(n+1)}{2}\right)^2-\frac{n(n+1)(2n+1)}{6} \right) \\ &= \frac{k(k-1)}{n(n-1)}\left( \frac{n(n+1)(3n^2-n-2)}{12} \right) \end{align*}$$ so $$\begin{align} \mathbb{E}[X^2] - \mathbb{E}[X]^2 &= \frac{k(n+1)(2n+1)}{6} + \frac{k(k-1)(n+1)(3n^2-n-2)}{12(n-1)} - \frac{k^2(n+1)^2}{4}\\ &= \boxed{\frac{k(n-k)(n+1)}{12}} \end{align}$$

Sanity checks: the obtained expression is non-negative (good: it's a variance), and equal to $0$ for $k=n$ (good, this makes sense: if we decide to draw all the numbers, the sum is fixed). Moreover, for $k=1$, we do get $(n^2-1)/12$, which is indeed the variance of a uniform r.v. on $\{1,2,\dots,n\}$.

0
On

There is a more elegant way of proving this. We are trying to evaluate the expectation, $$ \mathbb{E}[X^2]=\sum_{i=1}^{k} \sum_{j=1}^k \mathbb{E}[X_i X_j] $$ We know that $\mathbb{E}[X_i]=(N+1)/2$. But given the number $X_j$ drawn on the $j$th draw the conditional expectation for $X_i$ is, $$ \mathbb{E}[X_i | X_j]=\frac{1}{N-1}\left( \sum_{X_i=1}^{N} X_i - X_j \right)=\frac{1}{N-1}\left( \frac{N(N+1)}{2} - X_j \right) $$ (After number $X_j$ is drawn there are $N-1$ numbers left.) If we now take the second expectation w.r.t. $X_j$ we get, $$ \mathbb{E}[\mathbb{E}[X_i | X_j]X_j]=\frac{1}{N-1}\left( \frac{N(N+1)}{2}\mathbb{E}[X_j] - \mathbb{E}[X_j^2] \right) $$ with, $$ \mathbb{E}[X_j^2]=\frac{1}{N} \sum_{n=1}^{N} n^2 = \frac{(N+1)(2N+1)}{6} $$ So when $i\neq j$, $$ \mathbb{E}[X_i X_j]=\frac{1}{N-1}\left( \frac{N(N+1)^2}{4} - \frac{(N+1)(2N+1)}{6} \right)=\frac{(N+1)(3N^2-N-2)}{12(N-1)} $$ When $i=j$, $$ \mathbb{E}[X_i^2]= \frac{(N+1)(2N+1)}{6} $$ The sum $\sum_{j=1}^k X_i X_j$ will have one term with $j=i$ and $k-1$ terms with $j \neq i$. Therefore, $$ \sum_{j=1}^{k} \mathbb{E}[X_i X_j] =\frac{(N+1)(2N+1)}{6}+\frac{(k-1)(N+1)(3N^2-N-2)}{12(N-1)} $$ If we now sum over the first index $i$ we get, $$ \sum_{i=1}^{k}\sum_{j=1}^{k} \mathbb{E}[X_i X_j]=k \sum_{j=1}^{k} \mathbb{E}[X_i X_j]=\frac{k(N+1)(2N+1)}{6}+\frac{k(k-1)(N+1)(3N^2-N-2)}{12(N-1)} $$ which is the same expression for $\mathbb{E}[X^2]$ as derived in the answer above.

0
On

It's enough to compute the variance of a single draw. Then apply the following general formula, proved here:

Let $X_1, X_2,\ldots,X_k$ be drawn at random without replacement from a finite population of $n$ items, and let $S_k:=X_1+\cdots+X_k$ be their sum. Then $$\operatorname{Var}(S_k)=k\left(\frac{n-k}{n-1}\right)\sigma^2,$$ where $\sigma^2$ is the variance of a single draw.

For your situation, the population is the list $1,2,\ldots,n$ and a single draw $X$ from this population has distribution $P(X=i)=\frac1n$ for $i=1,\ldots,n$. So calculate $$E(X)=\sum_{i=1}^n i\,P(X=i)\stackrel{(1)}=\frac1n\frac{n(n+1)}2=\frac{n+1}2$$ $$E(X^2)=\sum_{i=1}^ni^2P(X=i)\stackrel{(2)}=\frac1n\frac{n(n+1)(2n+1)}6=\frac{(n+1)(2n+1)}6$$ where step (1) uses the identity for $\sum_{i=1}^ni$ and step (2) uses the identity for $\sum_{i=1}^n i^2$. Finally, after some algebra, conclude the variance of a single draw is $$\operatorname{Var}(X)=E(X^2)-[E(X)]^2=\frac{n^2-1}{12}.$$