Randomly take $51$ numbers from the set $\{1, 2, \ldots, 159\}$. Find the variance of their sum.


We randomly take $51$ numbers from the $159$ natural numbers $1,\ldots,159$ without replacement. Let $\alpha$ be the random variable equal to the sum of the selected numbers. Find the variance of $\alpha$.

First I need to understand something about the distribution of $\alpha$. There are $$C^{51}_{159} = \frac{159!}{51!\,108!}$$ subsets in total, but many of them have the same sum, because $$1326 = \sum_{i=1}^{51}i \leq \alpha \leq \sum_{i=109}^{159}i = 6834.$$ Consequently, I want to know how many subsets of $51$ numbers have sum equal to $N$, where $1326\leq N\leq 6834$. I'm stuck here because I don't know how to count them.
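One way to get these counts exactly (an illustration, not taken from either answer below) is the standard subset-sum dynamic programme: process the values $1, \ldots, 159$ one at a time and keep, for each subset size $k$ and each sum $s$, the number of $k$-element subsets with that sum. The names `ways`, `counts`, `sums` and `var_alpha` are hypothetical. The counts reach roughly $10^{41}$, so this sketch stores them as doubles (about 15 significant digits) rather than exact integers.

```r
# Count k-element subsets of {1, ..., M} by their sum (subset-sum dynamic programme).
# Counts are stored as doubles: they reach about 1e41, beyond exact integer range.
n <- 51
M <- 159
max_sum <- sum((M - n + 1):M)              # largest possible sum, 6834

# ways[k + 1, s + 1] = number of k-element subsets of the values processed so far with sum s
ways <- matrix(0, nrow = n + 1, ncol = max_sum + 1)
ways[1, 1] <- 1                            # the empty subset has sum 0

for (v in 1:M) {
  # Go through k in decreasing order so each value v is used at most once per subset
  for (k in min(v, n):1) {
    src <- 1:(max_sum + 1 - v)             # columns for sums 0, ..., max_sum - v of (k-1)-subsets
    dst <- src + v                         # the same subsets with v added
    ways[k + 1, dst] <- ways[k + 1, dst] + ways[k, src]
  }
}

counts <- ways[n + 1, ]                    # counts[N + 1] = number of 51-subsets with sum N
sums <- 0:max_sum
p <- counts / sum(counts)                  # distribution of alpha, up to floating-point rounding
mean_alpha <- sum(p * sums)
var_alpha <- sum(p * (sums - mean_alpha)^2)
```

For example, `counts[1328 + 1]` is 2, because the only 51-subsets with sum 1328 are $\{1, \ldots, 50, 53\}$ and $\{1, \ldots, 49, 51, 52\}$. The resulting `var_alpha` is 73440 up to rounding, matching the exact value derived analytically below.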


Comment: You can get a reasonable approximation to $\operatorname{Var}(\alpha)$ by simulation. In the simulation, I assume the 51 numbers are selected without replacement.

```r
set.seed(2020)
alpha = replicate(10^5, sum(sample(1:159, 51)))
summary(alpha)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2915    3897    4081    4081    4266    5275 
```

Notice that all 100,000 simulated sums lie well inside the bounds 1326 and 6834 mentioned in your question (the smallest is 2915 and the largest is 5275).

```r
var(alpha)
## [1] 74069.39
sd(alpha)
## [1] 272.1569
```

A histogram of the simulated values of $\alpha$ looks approximately normal, so I show the best-fitting normal density along with the histogram.


```r
hist(alpha, prob = TRUE, col = "skyblue2")
curve(dnorm(x, mean(alpha), sd(alpha)), add = TRUE, col = "red")
```

With replacement, the variance is somewhat larger. (Again here the distribution of $\alpha$ seems approximately normal; histogram not shown.)

```r
set.seed(1130)
alpha = replicate(10^6, sum(sample(1:159, 51, replace = TRUE)))
summary(alpha)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2593    3859    4080    4080    4302    5590 
var(alpha)
## [1] 107274.7
```

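Possible solution: If you consider the population to be the numbers 1 through 159, then the population variance is $\sigma^2 = (159^2-1)/12 \approx 2106.67$. (R's `var()` divides by $M-1$ rather than $M$, so `var(1:159)` returns the slightly larger value $2120 = 159 \cdot 160/12$.) The sum of a random sample of 51 drawn with replacement should have variance 51 times as large, which is 107,440; this agrees with the simulated result within the margin of simulation error.

```r
pop = 1:159
sigma2 = mean((pop - mean(pop))^2)   # population variance (divisor M, not M - 1)
sigma2
## [1] 2106.667
51 * sigma2
## [1] 107440
var(pop)                             # sample variance (divisor M - 1), for comparison
## [1] 2120
```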
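The same population-variance reasoning extends to the original question (sampling without replacement) by multiplying by the finite population correction factor $(M-n)/(M-1)$, a standard result for simple random sampling that this answer does not itself use. A minimal sketch, assuming that formula; the names `sigma2`, `fpc`, `var_with` and `var_without` are illustrative. Since R's `var()` divides by $M-1$, the population variance is computed directly here.

```r
M <- 159                                   # population size
n <- 51                                    # sample size
pop <- 1:M
sigma2 <- mean((pop - mean(pop))^2)        # population variance (M^2 - 1)/12, divisor M

var_with <- n * sigma2                     # variance of the sum of n draws with replacement
fpc <- (M - n) / (M - 1)                   # finite population correction (assumed standard result)
var_without <- n * sigma2 * fpc            # variance of the sum of n draws without replacement

c(with = var_with, without = var_without)
```

This gives 107440 with replacement and 73440 without, the latter within about 1% of the simulated 74069.39.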

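Replace 51 and 159 with $n$ and $M$ respectively. Let $\mathbf{x} = (x_1, \ldots, x_n)$ be the selected numbers in the order they are drawn, so that $\alpha = \sum_{i=1}^n x_i$. Each $x_i$ is uniformly distributed on $\{1, \ldots, M\}$, and each pair $(x_i, x_j)$ with $i \neq j$ is uniformly distributed over the $M(M-1)$ ordered pairs of distinct values.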

Then, by symmetry, $E(\alpha)=E(\sum x_i)=\sum_i E(x_i) =nE(x_1)= \frac{n(M+1)}{2}$.
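For the numbers in the question ($n = 51$, $M = 159$), since $E(x_1) = \frac1M\sum_{k=1}^M k = \frac{M+1}{2}$, this gives

$$E(\alpha) = \frac{51 \cdot 160}{2} = 4080,$$

consistent with the simulated means (4081 and 4080) in the other answer.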

$$E(\alpha^2)=E\left[\left(\sum_i x_i\right)^2\right] = E\left(\sum_i x_i^2\right)+E\left(\sum_{i\neq j} x_i x_j \right)$$

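Again by symmetry, since $x_1$ is uniform on $\{1,\ldots,M\}$ and $\sum_{k=1}^M k^2 = \frac{M(M+1)(2M+1)}{6}$, $$ E\left(\sum_i x_i^2\right)=nE(x_1^2)=\frac nM\sum_{k=1}^M k^2=\frac 16 n(M+1)(2M+1) $$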

$$\begin{aligned} E\left(\sum_{i\neq j} x_i x_j \right) &= (n^2-n)E(x_1 x_2)=\frac{n^2-n}{M^2-M}\sum_{\substack{k,l=1\\ k\ne l}}^{M} kl \\ &= \frac{n^2-n}{M^2-M}\left(\left(\frac{M(M+1)}{2}\right)^2 - \frac{M(M+1)(2M+1)}{6}\right) \\ &= \frac{1}{12} (n^2-n)(M+1)(3M+2) \end{aligned}$$
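As a quick numerical sanity check of this step (not part of the original answer), one can average $kl$ over all ordered pairs of distinct values in $\{1,\ldots,159\}$ and compare it with $E(x_1x_2) = (M+1)(3M+2)/12$. The variable names are illustrative.

```r
M <- 159
prods <- outer(1:M, 1:M)                   # prods[k, l] = k * l
# Mean of k * l over the M(M - 1) ordered pairs with k != l, i.e. E(x_1 x_2)
pair_mean <- (sum(prods) - sum(diag(prods))) / (M * (M - 1))
closed_form <- (M + 1) * (3 * M + 2) / 12
all.equal(pair_mean, closed_form)
```

Both values equal $76640/12 \approx 6386.67$, so `all.equal()` reports `TRUE`.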

Therefore $$\operatorname{Var}(\alpha) = E(\alpha^2) - (E(\alpha))^2 = \cdots = 73440$$
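Filling in the omitted algebra: substituting the three expectations above,

$$\begin{aligned}
\operatorname{Var}(\alpha) &= \frac{n(M+1)(2M+1)}{6} + \frac{n(n-1)(M+1)(3M+2)}{12} - \frac{n^2(M+1)^2}{4} \\
&= \frac{n(M+1)}{12}\Big[2(2M+1) + (n-1)(3M+2) - 3n(M+1)\Big] \\
&= \frac{n(M+1)(M-n)}{12},
\end{aligned}$$

so with $n = 51$ and $M = 159$, $\operatorname{Var}(\alpha) = \frac{51 \cdot 160 \cdot 108}{12} = 73440$.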