Upper bound on variance of $\sum_{i\in I}a_i$ if $I\subset [n]$ is chosen uniformly at random with $|I|=k$ and $a\in[0,1]^n$?

48 Views Asked by At

Consider deterministic reals $a_1,...,a_n\in[0,1]$, as well as a random subset $I\subset[n]$ with uniform distribution over all ${n \choose k}$ subsets of $[n]$ with cardinality $k$. In other words, sampling without replacement.

By symmetry, for any $i\in [n]$ we have $P(i\in I)=k/n$, which gives $E[\frac{1}{k} \sum_{i\in I}a_i] = \frac{1}{n}\sum_{i=1}^n a_i$.

Assume that $k/n \ge c >0$ for some constant $c>0$.

What is the largest possible variance of $W = \frac{1}{k} \sum_{i\in I}a_i$ as a function of $c$, or alternatively a good upper bound on the variance?

1

There are 1 best solutions below

3
On

We can write $$ \sum_{i \in I} a_i = \sum_{j=1}^n 1\{j \in I\} a_j. $$ Thus, we have $$ \Big(\sum_{i \in I} a_i\Big)^2 = \sum_{j,k=1}^n 1\{j \in I\}1\{k \in I\} a_j a_k $$ Note that for $j \neq k$, we have $$ \mathbb{E}[1\{j \in I\}1\{k \in I\}] = \mathbb{P}(\{j, k\}\subset I) = \frac{\binom{n-2}{k-2}}{\binom{n}{k}} = \frac{k(k-1)}{n(n-1)} $$ Otherwise when $j = k$, $$ \mathbb{E}[1\{j \in I\}1\{k \in I\}] = \mathbb{P}(j \in I) = \frac{k}{n}$$

Therefore, the variance of $\sum_{i\in I} a_i$ is
$$ \sum_{j \neq k} \big(\tfrac{k}{n} \tfrac{k-1}{n-1} - \tfrac{k^2}{n^2}\big) a_j a_k + \sum_{j = 1}^n \big( \tfrac{k(n-k)}{n^2}\big) a_j^2 =\frac{k(n-k)}{n^2}\Big(\tfrac{n-2}{n-1}\|a\|_2^2 +\tfrac{1}{n-1} \|a\|_1^2 \Big)$$