Variance of a function of independent random variables


Suppose I have two discrete independent random variables $X$ and $Y$, and that I'm interested in the expected value of the random variable $W$, where: $$ W= \text{sign}(X-Y). $$ So, $W$ is $1$ if $X>Y$, $-1$ if $Y>X$, and $0$ otherwise.

I sample the distributions of $X$ and $Y$ ten times each, giving me $\{X_1, \dots, X_{10}\}$ and $\{Y_1, \dots, Y_{10}\}$.
Consider these two ways to estimate $\text{E}\{W\}$ $$ \quad\quad\bar{W} = \frac{1}{10}\sum_{i=1}^{10} W_{i,i}, \\ \text{and, } \quad\quad \bar{W}' = \frac{1}{100}\sum_{i=1}^{10}\sum_{j=1}^{10} W_{i,j}, \\ \text{where } \quad W_{i,j} = \text{sign}(X_i - Y_j) $$ I know that $\text{Var}\{\bar{W}\} = \frac{1}{10}\text{Var}\{W\}$, but what is $\text{Var}\{\bar{W}'\}$, and how can I estimate it from my 20 samples?
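For concreteness, the two estimators can be sketched in a few lines of NumPy. The Poisson distributions below are placeholders (the question doesn't specify the laws of $X$ and $Y$); only the estimator formulas matter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical choice of distributions, for illustration only.
X = rng.poisson(3, size=10)   # samples X_1, ..., X_10
Y = rng.poisson(4, size=10)   # samples Y_1, ..., Y_10

# Paired estimator: uses only the 10 diagonal pairs (X_i, Y_i).
W_bar = np.mean(np.sign(X - Y))

# All-pairs estimator: all 100 combinations W_{i,j} = sign(X_i - Y_j).
W_matrix = np.sign(X[:, None] - Y[None, :])   # shape (10, 10)
W_bar_prime = W_matrix.mean()
```

Both estimators are unbiased for $\text{E}\{W\}$; the question is how their variances compare.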

2

There are 2 best solutions below

BEST ANSWER

$\def\sign{\mathop{\mathrm{sign}}} \def\Cov{\mathop{\mathrm{Cov}}} $You have an empirical distribution $P$ of $X$ and $Y$, with the probability mass function $p(x,y)=\frac1{mn}\sum_{i,j}[x=x_i][y=y_j]$, and your estimate of $\mathbb{E}[\sign(X-Y)]$ is $$ \mathbb{E}^P[\sign(X-Y)] = \frac1{mn}\sum_{i,j}\sign(x_i-y_j) = \bar W'. $$

You can calculate an estimate of variance in exactly the same way as $$ \mathbb{V}^P[\sign(X-Y)] = \mathbb{E}^P[\sign(X-Y)^2] - (\mathbb{E}^P[\sign(X-Y)])^2. $$ Since $\sign(a)^2=1$ when $a\neq0$, the first expectation is $$ Q = \mathbb{E}^P[\sign(X-Y)^2] = \mathbb{P}^P[X\neq Y] = \frac1{mn}\sum_{i,j}[x_i\neq y_j]. $$ So the sample variance is $$ \mathbb{V}^P[\sign(X-Y)] = Q-(\bar W')^2. $$
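As a sketch, this plug-in variance estimate takes one line of NumPy once the sign matrix is built (the Poisson samples below are hypothetical stand-ins for the unspecified distributions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(3, size=10)   # hypothetical samples of X
y = rng.poisson(4, size=10)   # hypothetical samples of Y

W = np.sign(x[:, None] - y[None, :])      # W_{i,j} = sign(x_i - y_j)
W_bar_prime = W.mean()                    # E^P[sign(X - Y)]
Q = np.mean(x[:, None] != y[None, :])     # P^P[X != Y] = E^P[sign(X - Y)^2]
var_hat = Q - W_bar_prime**2              # plug-in estimate of V[sign(X - Y)]
```

Note `var_hat` is always nonnegative, since $|\bar W'| \le Q \le 1$ implies $(\bar W')^2 \le Q$.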

The variance of $\bar W'$ is calculated in the same way. Under the true probability distribution we have $$ \mathbb{V}\left[\frac1{mn}\sum_{i,j}\sign(x_i-y_j)\right] = \frac1{mn}\mathbb{V}[\sign(X-Y)] + \frac1{m^2n^2}\sum_{(i,j)\neq(k,l)}\Cov(W_{ij},W_{kl}). $$ The covariance between $W_{ij}$ and $W_{kl}$ is zero when $i\neq k$ and $j\neq l$, so the only nonzero covariances are $Q_X = \Cov(W_{ij},W_{il})$ and $Q_Y = \Cov(W_{ij},W_{kj})$, and both of these are independent of the indices. Therefore, if we let $X,X_1,X_2$ and $Y,Y_1,Y_2$ be independent copies of $X$ and $Y$, then $$ Q_X = \Cov(\sign(X-Y_1),\sign(X-Y_2)), \qquad Q_Y = \Cov(\sign(X_1-Y),\sign(X_2-Y)), $$ and the variance of $\bar W'$ is $$ \mathbb{V}[\bar W'] = \frac1{mn}\mathbb{V}[\sign(X-Y)] + \frac{n-1}{mn}Q_X + \frac{m-1}{mn}Q_Y. $$

You can calculate these terms as expectations under the empirical distribution, so $Q_X$, for example, becomes: $$ \begin{aligned} Q_X &= \mathbb{E}^P[\sign(X-Y_1)\sign(X-Y_2)] - \mathbb{E}^P[\sign(X-Y_1)]\,\mathbb{E}^P[\sign(X-Y_2)] \\&= \frac1{mn^2}\sum_{i,j,k}\sign(x_i-y_j)\sign(x_i-y_k) - \left( \frac1{mn}\sum_{i,j}\sign(x_i-y_j)\right)^2. \end{aligned}$$
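Since $\sum_{j,k}\sign(x_i-y_j)\sign(x_i-y_k) = \big(\sum_j \sign(x_i-y_j)\big)^2$, the plug-in $Q_X$ is just the (population) variance of the row means of the sign matrix, and $Q_Y$ symmetrically that of the column means. A sketch, again with hypothetical Poisson samples:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.poisson(3, size=10)   # hypothetical samples of X
y = rng.poisson(4, size=10)   # hypothetical samples of Y

S = np.sign(x[:, None] - y[None, :])   # sign(x_i - y_j), shape (m, n)
W_bar_prime = S.mean()

row_means = S.mean(axis=1)             # (1/n) sum_j sign(x_i - y_j)
col_means = S.mean(axis=0)             # (1/m) sum_i sign(x_i - y_j)

Q_X = np.mean(row_means**2) - W_bar_prime**2   # plug-in Q_X
Q_Y = np.mean(col_means**2) - W_bar_prime**2   # plug-in Q_Y
```

Because $\bar W'$ is itself the mean of the row means (and of the column means), both plug-in estimates are automatically nonnegative.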

ANOTHER ANSWER

Use the generic formula for the variance of a sum of random variables. For some random variables $A_i$, this can be stated as: $$ \text{Var}\left\{\sum_{i=1}^n A_i\right\} = \sum_{i=1}^n\text{Var}\{A_i\} + \sum_{i \neq j} \text{Cov}\{A_i,A_j\} \\ $$ In this case, we get: $$ \text{Var}\{\bar{W}'\} = \text{Var}\left\{\frac{1}{100}\sum_{k\in \{1,...,10\}^2}W_{k}\right\} \\ = \frac{1}{10000}\left(\sum_{k \in \{1,...,10\}^2}\text{Var}\{W_k\} + \sum_{k,\ell \in \{1,...,10\}^2; k\neq \ell}\text{Cov}\{W_k,W_\ell\}\right) $$ Note that above, the subscripts $k$ and $\ell$ represent pairs, i.e. $k= (i,j)$, which I'm using because the notation for the indices gets too complicated otherwise!

The trick is that $\text{Cov}\{W_k, W_\ell\}$ is not zero if one of the indices is the same. This can happen in two different ways. Either the first index is the same --- $k = (i,j)$ and $\ell = (i,m)$ --- or the second index is the same --- $k = (i,j)$ and $\ell=(m,j)$. Otherwise $W_k$ and $W_\ell$ are based on different, independent samples, and so are themselves independent.

Therefore, you need to estimate $\text{Cov}\{W_{i,j},W_{i,m}\}$ and $\text{Cov}\{W_{i,j},W_{m,j}\}$. Note that it doesn't matter what the specific indices are --- all that matters is which of the indices are shared. (This is because all the $X_i$'s are i.i.d., and so are all the $Y_j$'s.)

You could estimate $\text{Cov}\{W_{i,j},W_{i,m}\}$ using your samples by calculating $\text{Cov}\{W_{i,1}, W_{i,2}\}$:

$$ \widehat{\text{Cov}}\{W_{i,1}, W_{i,2}\} = \frac{1}{9}\sum_{i=1}^{10}\left(W_{i,1} - \bar{W}_{\cdot,1}\right)\left(W_{i,2} - \bar{W}_{\cdot,2}\right), $$ where $\bar{W}_{\cdot,j} = \frac{1}{10}\sum_{i=1}^{10} W_{i,j}$ is the sample mean of column $j$.

You could also calculate $\text{Cov}\{W_{i,3},W_{i,4}\}$ from your samples. Both $\text{Cov}\{W_{i,1}, W_{i,2}\}$, and $\text{Cov}\{W_{i,3},W_{i,4}\}$ provide independent estimates of $\text{Cov}\{W_{i,j},W_{i,m}\}$. You can refine your estimate of $\text{Cov}\{W_{i,j},W_{i,m}\}$ by taking the mean of various independent estimators. But be careful not to also include something like $\text{Cov}\{W_{i,2},W_{i,3}\}$ along with the others already mentioned, since it would not be independent from them.
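The averaging of independent column-pair estimates described above can be sketched as follows; NumPy's `np.cov` uses the same $\frac{1}{9}$ normalization (`ddof=1`) as the formula, and the Poisson samples are again arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(4)
# W_{i,j} = sign(X_i - Y_j) for hypothetical samples
W = np.sign(rng.poisson(3, (10, 1)) - rng.poisson(4, (1, 10)))

# Sample covariance between columns (1,2), (3,4), ..., i.e. disjoint
# (hence mutually independent) pairs of columns, then average them.
pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]   # 0-based column indices
estimates = [np.cov(W[:, a], W[:, b])[0, 1] for a, b in pairs]
cov_X_hat = np.mean(estimates)
```

Using the five disjoint pairs keeps the individual estimates independent of one another, exactly as the caution about $\text{Cov}\{W_{i,2},W_{i,3}\}$ warns.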

Then, do similarly to estimate $\text{Cov}\{W_{i,j},W_{m,j}\}$. For example: $$ \widehat{\text{Cov}}\{W_{1,j}, W_{2,j}\} = \frac{1}{9}\sum_{j=1}^{10}\left(W_{1,j} - \bar{W}_{1,\cdot}\right)\left(W_{2,j} - \bar{W}_{2,\cdot}\right), $$ where $\bar{W}_{i,\cdot} = \frac{1}{10}\sum_{j=1}^{10} W_{i,j}$ is the sample mean of row $i$.

To take the sum of the covariances, note that each of these two types of covariance term arises $10 \times 10 \times 9 = 900$ times ($10$ choices for the shared index, and $10 \times 9$ ordered choices for the two differing indices), so, for your particular sample sizes,

$$ \text{Var}\{\bar{W}'\} = \frac{\text{Var}\{W_k\}}{100} + \frac{9}{100}\left(\text{Cov}\{W_{i,j},W_{i,m}\} + \text{Cov}\{W_{i,j},W_{m,j}\}\right) $$
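Putting the pieces together, here is a hedged end-to-end sketch of the plug-in estimate of $\text{Var}\{\bar{W}'\}$, using the row-mean/column-mean shortcut for the covariance terms (the Poisson laws are arbitrary placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.poisson(3, size=10)   # hypothetical samples of X
y = rng.poisson(4, size=10)   # hypothetical samples of Y
m, n = len(x), len(y)

S = np.sign(x[:, None] - y[None, :])   # W_{i,j}
W_bar_prime = S.mean()

# Plug-in Var{W} = P(X != Y) - (E[W])^2
var_W = np.mean(x[:, None] != y[None, :]) - W_bar_prime**2

# Plug-in covariance terms: Cov{W_{i,j}, W_{i,m}} and Cov{W_{i,j}, W_{m,j}}
row_means = S.mean(axis=1)
col_means = S.mean(axis=0)
cov_X = np.mean(row_means**2) - W_bar_prime**2
cov_Y = np.mean(col_means**2) - W_bar_prime**2

# General m-by-n formula; with m = n = 10 this is
# Var{W}/100 + (9/100) * (cov_X + cov_Y).
var_W_bar_prime = (var_W / (m * n)
                   + (n - 1) / (m * n) * cov_X
                   + (m - 1) / (m * n) * cov_Y)
```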