What is the variance of the sum of a random sample of random variables?


Suppose we have a high-dimensional vector $\mathbf{u}\in\mathbb{R}^m$, which is drawn from a multivariate normal distribution $\mathbf{u}\sim\mathcal{N}(\mathbf{\mu}, \mathbf{\Sigma})$. To approximate the mean of the elements of $\mathbf{u}$ ($\bar{\mathbf{u}} = \frac{1}{m}\sum_{i=1}^m u_i$), we can sample $n<m$ elements of $\mathbf{u}$, sum them up, and then divide the sum by the number of samples $n$:

$$\bar{\mathbf{u}} \approx \hat{\bar{\mathbf{u}}} = \frac{1}{n} \sum_{i\in\text{sample}} u_i.$$

We can expect the approximate mean $\hat{\bar{\mathbf{u}}}$ to change if we sample a different set of $n$ elements. What is the variance of the approximate means?

Example:

$$\mathbf{u} = [1, 2, 3]^T$$

The mean of the elements of $\mathbf{u}$ is $\bar{\mathbf{u}} = \frac{1}{3}(1+2+3) = 2$. We can approximate $\bar{\mathbf{u}}$ by sampling just two elements of $\mathbf{u}$:

$$\hat{\bar{\mathbf{u}}} = \frac{1}{2}(1+2)=1.5$$

We can sample a different set of elements:

$$\hat{\bar{\mathbf{u}}} = \frac{1}{2}(1+3)=2$$
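For a vector this small, every possible sample can be listed. The sketch below assumes the elements are sampled without replacement and that each subset is equally likely (variable names are illustrative). It enumerates all two-element subsets of $[1, 2, 3]$ and computes the spread of the resulting approximate means:

```python
from itertools import combinations

u = [1, 2, 3]
n = 2

# Approximate mean for every possible n-element subset of u
# (sampling without replacement, each subset equally likely).
subset_means = [sum(s) / n for s in combinations(u, n)]

# Mean and (population) variance of the approximate means.
grand_mean = sum(subset_means) / len(subset_means)
var_of_means = sum((x - grand_mean) ** 2 for x in subset_means) / len(subset_means)

print(subset_means)   # [1.5, 2.0, 2.5]
print(grand_mean)     # 2.0
print(var_of_means)   # about 0.1667, i.e. 1/6
```

Here $\mathbf{u}$ is held fixed, so the value $1/6$ reflects only the variability due to the choice of subset.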

We are looking for the variance of $\hat{\bar{\mathbf{u}}}$, where $\mathbf{u}$ is a high-dimensional vector drawn from a multivariate normal distribution, $\bar{\mathbf{u}}$ is the mean of the elements of $\mathbf{u}$, and $\hat{\bar{\mathbf{u}}}$ is an estimate of $\bar{\mathbf{u}}$.


There are 2 best solutions below


I think you're asking for $\text{Var}[\mathbf{\bar{u}}]=\text{Var}\left[\frac{1}{n}\sum\limits_{i=1}^{n}\mathbf{u}_i\right]$, where $\mathbf{\bar{u}}=\hat{\pmb{\mu}}$.

Can you use the following property of variance to help solve your problem: for $a \in \mathbb{R}$, $X,Y$ independent random variables, we have $\text{Var}[aX+Y]=a^{2}\text{Var}[X]+\text{Var}[Y]$?
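As a rough illustration of how this property plays out for an average over a fixed set of indices, here is a minimal sketch. The helper `var_of_mean` and both covariance matrices are hypothetical, chosen only for illustration. When the selected elements are independent (diagonal covariance), applying the property repeatedly gives the sum of the variances divided by $n^2$; when they are correlated, the covariance terms have to be added as well.

```python
def var_of_mean(cov, idx):
    """Variance of the average of u[i] for i in idx, given the covariance matrix cov."""
    n = len(idx)
    # Var[(1/n) * sum_i u_i] = (1/n^2) * sum_i sum_j Cov[u_i, u_j]
    return sum(cov[i][j] for i in idx for j in idx) / n ** 2

# Hypothetical covariance matrices (illustrative values only).
cov_indep = [[1.0, 0.0, 0.0],
             [0.0, 4.0, 0.0],
             [0.0, 0.0, 9.0]]
cov_corr = [[1.0, 0.5, 0.0],
            [0.5, 4.0, 0.0],
            [0.0, 0.0, 9.0]]

idx = (0, 1)  # a fixed choice of two elements
v_indep = var_of_mean(cov_indep, idx)  # (1 + 4) / 4
v_corr = var_of_mean(cov_corr, idx)    # (1 + 4 + 2 * 0.5) / 4
print(v_indep, v_corr)
```

This covers only a fixed index set; it does not account for the extra variability that comes from choosing the indices at random.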


Edit: Apparently OP is looking for the variance of $\tau = \frac{\boldsymbol x'\boldsymbol u}{\boldsymbol x'\boldsymbol 1}$, where $\boldsymbol u$ is a draw from an $m$-variate normal distribution with mean $\boldsymbol\mu$ and variance-covariance matrix $\boldsymbol\Sigma$, $\boldsymbol 1$ is a vector of $m$ ones, and $\boldsymbol x$ is a 0/1 indicator vector marking which elements of $\boldsymbol u$ are sampled.

Let $$A = \left\{ \boldsymbol x\in\{0,1\}^m : \sum_{i=1}^m x_i = n\right\},$$ where $n<m$ is known. We can assume that each vector in $A$ is equally likely, with probability $\binom{m}{n}^{-1} = \frac{n!\,(m-n)!}{m!}$. We may also assume that $\boldsymbol x$ and $\boldsymbol u$ are independent.

Let's investigate the conditional mean and the conditional variance first. The conditional mean is given by $$\operatorname{E}[\tau\mid\boldsymbol x] = \operatorname{E}\left[\frac{\boldsymbol x'\boldsymbol u}{\boldsymbol x'\boldsymbol 1}\mid\boldsymbol x\right] = \operatorname{E}\left[(\boldsymbol x'\boldsymbol 1)^{-1}\boldsymbol x'\boldsymbol u\mid\boldsymbol x\right] = (\boldsymbol x'\boldsymbol 1)^{-1}\boldsymbol x'\boldsymbol\mu = \frac{\boldsymbol x'\boldsymbol\mu}{\boldsymbol x'\boldsymbol 1}.$$ The conditional variance is given by $$\operatorname{Var}[\tau\mid\boldsymbol x] = \operatorname{Var}\left[\frac{\boldsymbol x'\boldsymbol u}{\boldsymbol x'\boldsymbol 1}\mid\boldsymbol x\right] = \operatorname{Var}\left[(\boldsymbol x'\boldsymbol 1)^{-1}\boldsymbol x'\boldsymbol u\mid\boldsymbol x\right] = (\boldsymbol x'\boldsymbol 1)^{-1}\boldsymbol x'\boldsymbol\Sigma\boldsymbol x(\boldsymbol 1'\boldsymbol x)^{-1} = \frac{\boldsymbol x'\boldsymbol\Sigma\boldsymbol x}{\boldsymbol x'\boldsymbol J\boldsymbol x}$$ with $\boldsymbol J = \boldsymbol 1\boldsymbol 1'$ being an $m\times m$ matrix of ones.
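To make the two conditional formulas concrete, here is a small sketch that evaluates them for one fixed indicator vector $\boldsymbol x$. The function name `conditional_moments` and the numerical values of $\boldsymbol\mu$ and $\boldsymbol\Sigma$ are assumptions made up for illustration.

```python
def conditional_moments(x, mu, cov):
    """Return (E[tau | x], Var[tau | x]) for a 0/1 indicator vector x."""
    m = len(x)
    n = sum(x)  # x'1, the number of selected elements
    # E[tau | x] = x'mu / x'1
    cond_mean = sum(x[i] * mu[i] for i in range(m)) / n
    # Var[tau | x] = x' Sigma x / (x' J x), and x' J x = (x'1)^2 = n^2
    cond_var = sum(x[i] * cov[i][j] * x[j]
                   for i in range(m) for j in range(m)) / n ** 2
    return cond_mean, cond_var

# Hypothetical mean vector and covariance matrix (illustrative values only).
mu = [1.0, 2.0, 3.0]
cov = [[1.0, 0.5, 0.0],
       [0.5, 4.0, 0.0],
       [0.0, 0.0, 9.0]]

x = [1, 1, 0]  # select the first two elements
cond_mean, cond_var = conditional_moments(x, mu, cov)
print(cond_mean, cond_var)  # 1.5 1.5
```

For this $\boldsymbol x$, the conditional mean is $(1+2)/2$ and the conditional variance is $(1 + 4 + 2\cdot 0.5)/4$.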

Now it remains to compute $$\operatorname{Var}[\tau] = \operatorname{Var}\big[\operatorname E[\tau\mid\boldsymbol x]\big] + \operatorname E\big[\operatorname{Var}[\tau\mid\boldsymbol x]\big].$$ We have that $$\operatorname{Var}\big[\operatorname E[\tau\mid\boldsymbol x]\big] = \boldsymbol\mu'\operatorname{Var}[(\boldsymbol 1'\boldsymbol x)^{-1}\boldsymbol x]\boldsymbol\mu$$ and $$\operatorname E\big[\operatorname{Var}[\tau\mid\boldsymbol x]\big] = \operatorname E[(\boldsymbol x'\boldsymbol J\boldsymbol x)^{-1}(\boldsymbol x'\boldsymbol\Sigma\boldsymbol x)].$$ However, it seems impossible to obtain closed form expressions for the exact values...
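For small $m$, both terms of the decomposition can at least be evaluated exactly by enumerating every $\boldsymbol x\in A$. The sketch below does that under the assumptions stated above: each subset is equally likely, and $\boldsymbol x$ is independent of $\boldsymbol u$. The function name and the numerical inputs are hypothetical. Note that $\boldsymbol x'\boldsymbol 1 = n$ for every $\boldsymbol x\in A$, so the denominators are the constants $n$ and $n^2$ throughout.

```python
from itertools import combinations

def total_variance_by_enumeration(mu, cov, n):
    """Evaluate Var[E[tau|x]] and E[Var[tau|x]] by listing all n-subsets."""
    m = len(mu)
    cond_means, cond_vars = [], []
    for idx in combinations(range(m), n):   # one idx per x in A
        cond_means.append(sum(mu[i] for i in idx) / n)
        cond_vars.append(sum(cov[i][j] for i in idx for j in idx) / n ** 2)
    k = len(cond_means)                     # number of subsets, C(m, n)
    grand_mean = sum(cond_means) / k
    var_of_cond_mean = sum((c - grand_mean) ** 2 for c in cond_means) / k
    mean_of_cond_var = sum(cond_vars) / k
    return var_of_cond_mean, mean_of_cond_var

# Hypothetical inputs (illustrative values only).
mu = [1.0, 2.0, 3.0]
cov = [[1.0, 0.5, 0.0],
       [0.5, 4.0, 0.0],
       [0.0, 0.0, 9.0]]

var_of_cond_mean, mean_of_cond_var = total_variance_by_enumeration(mu, cov, n=2)
total_var = var_of_cond_mean + mean_of_cond_var
print(var_of_cond_mean, mean_of_cond_var, total_var)
```

The number of subsets grows as $\binom{m}{n}$, so this is only practical for small cases, for instance as a check on a candidate simplification or on a Monte Carlo estimate.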