Variance of $X_{i}+X_{j}$ if $X_{1} + \ldots + X_{N}=1$


I have random variables $X_{1}, X_{2}, \ldots, X_{N}$, where $X_{i} \in \{0,1\}$ and $$X_{1} + \ldots + X_{N}=1,$$ i.e. exactly one of the $X_{i}$'s is $1$ and the rest are $0$.

In addition, denote $P(X_{i} = 1) = p_{i}$, where $\sum_{i} p_{i}=1$. I do not want to assume that the $X_{i}$'s are identically distributed, meaning we may have $p_{i} \neq p_{j}$ for some $i\neq j$.

I now want to calculate the variance of $X_{i}+X_{j}$ for any $i \neq j$, but am having some difficulty. Can anyone provide a formula and an explanation of how to do so for this example? Much appreciated!


There are 2 best solutions below

0
On BEST ANSWER
  • $X_i + X_j$ takes values in $\{0,1\}$, so it is a Bernoulli random variable.
    • This is because at most one of the $X_k$ is equal to $1$.
  • $\mathbb{P}(X_i + X_j = 1) = \mathbb{P}(X_i =1 \text{ or } X_j = 1) = \mathbb{P}(X_i = 1) + \mathbb{P}(X_j = 1) = p_i + p_j$
    • The second equality holds because the events $\{ X_i = 1\}, \{ X_j = 1\}$ are disjoint.
  • Thus, $X_i + X_j \sim \text{Ber}(p_i + p_j)$
  • If $Y \sim \text{Ber}(q)$, then $\text{Var}(Y)=q(1-q)$
  • Thus, the variance of $(X_i + X_j)$ is $(p_i + p_j) \cdot (1 - p_i - p_j)$
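
The steps above can be checked numerically. The following is a minimal sketch, not part of the original answer: the helper name `pair_variance` and the example probabilities are assumptions chosen for illustration. It computes $\text{Var}(X_i + X_j)$ exactly by listing the $N$ possible outcomes (outcome $k$ means $X_k = 1$) and compares the result with $(p_i + p_j)(1 - p_i - p_j)$.

```python
from fractions import Fraction

def pair_variance(p, i, j):
    """Exact Var(X_i + X_j) when exactly one X_k equals 1 and P(X_k = 1) = p[k]."""
    # Outcome k means X_k = 1 and every other variable is 0,
    # so X_i + X_j equals (k == i) + (k == j) on that outcome.
    mean = sum(p[k] * ((k == i) + (k == j)) for k in range(len(p)))
    second_moment = sum(p[k] * ((k == i) + (k == j)) ** 2 for k in range(len(p)))
    return second_moment - mean ** 2

# Hypothetical, non-identical probabilities that sum to 1.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
i, j = 0, 2
q = p[i] + p[j]

print(pair_variance(p, i, j))  # variance from direct enumeration
print(q * (1 - q))             # variance from the Bernoulli formula
```

Both printed values should agree for any valid choice of `p`, `i` and `j` with `i != j`.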
0
On

Let $S_N := \sum\limits_{k=1}^N X_k$

If the $N$ variables are independent and identically distributed (are they?), then given that exactly one of them equals $1$, it is equally likely to be any one of them:

$$\mathsf P(X_i=1\mid S_N=1) = \tfrac 1N$$

Now similarly evaluate the following conditional joint probabilities for any $i,j\in\{1,\ldots,N\}$ such that $i\neq j$:

$$\begin{aligned} p_{0,0}~&=~\mathsf P(X_i=0, X_j=0\mid S_N=1)\\ p_{0,1}~&=~\mathsf P(X_i=0, X_j=1\mid S_N=1)\\ p_{1,0}~&=~\mathsf P(X_i=1, X_j=0\mid S_N=1)\\ p_{1,1}~&=~\mathsf P(X_i=1, X_j=1\mid S_N=1)\end{aligned}$$

Then use the definition of variance:

$$\mathsf {Var}(X_i+X_j\mid S_N=1)~=~\sum_{x=0}^1\sum_{y=0}^1 (x+y)^2p_{x,y} -\left(\sum_{x=0}^1\sum_{y=0}^1 (x+y)p_{x,y}\right)^2$$
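
As a rough illustration of this recipe, the sketch below assumes the variables are i.i.d. Bernoulli($q$), which is this answer's assumption and not something the question states. The function names and the values $N = 4$, $q = 3/10$ are hypothetical. It enumerates all $2^N$ outcomes, keeps those with $S_N = 1$, builds the four conditional probabilities $p_{x,y}$, and then evaluates the double sums.

```python
from fractions import Fraction
from itertools import product

def conditional_joint(N, q, i, j):
    """Return {(x, y): P(X_i = x, X_j = y | S_N = 1)} for i.i.d. Bernoulli(q) variables."""
    joint = {(x, y): Fraction(0) for x in (0, 1) for y in (0, 1)}
    total = Fraction(0)
    for outcome in product((0, 1), repeat=N):
        if sum(outcome) != 1:
            continue  # keep only outcomes with exactly one 1
        weight = Fraction(1)
        for x in outcome:
            weight *= q if x == 1 else 1 - q
        joint[(outcome[i], outcome[j])] += weight
        total += weight
    # Normalise by P(S_N = 1) to obtain conditional probabilities.
    return {xy: w / total for xy, w in joint.items()}

def variance_from_joint(joint):
    """Second moment minus squared mean of x + y under the joint distribution."""
    mean = sum((x + y) * w for (x, y), w in joint.items())
    second_moment = sum((x + y) ** 2 * w for (x, y), w in joint.items())
    return second_moment - mean ** 2

joint = conditional_joint(4, Fraction(3, 10), 0, 1)
print(variance_from_joint(joint))
```

Under this i.i.d. assumption the result does not depend on $q$, because every outcome with exactly one $1$ has the same weight.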


Or else similarly find $\mathsf {Var}(X_i\mid S_N=1)$ and $\mathsf {Cov}(X_i,X_j\mid S_N=1)$ and then use:

$$\mathsf {Var}(X_i+X_j\mid S_N=1)~=~\mathsf {Var}(X_i\mid S_N=1)+\mathsf{Var}(X_j\mid S_N=1)+2\mathsf{Cov}(X_i,X_j\mid S_N=1)$$
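
As a sketch of this second route, still under the i.i.d. assumption made above: conditional on $S_N=1$, each $X_i$ is Bernoulli with parameter $\tfrac 1N$, and $X_iX_j=0$ for $i\neq j$ because only one variable can equal $1$. Hence

$$\mathsf {Var}(X_i\mid S_N=1)~=~\tfrac 1N\left(1-\tfrac 1N\right),\qquad \mathsf {Cov}(X_i,X_j\mid S_N=1)~=~0-\tfrac 1N\cdot\tfrac 1N~=~-\tfrac 1{N^2}$$

$$\mathsf {Var}(X_i+X_j\mid S_N=1)~=~\tfrac 2N\left(1-\tfrac 1N\right)-\tfrac 2{N^2}~=~\tfrac 2N\left(1-\tfrac 2N\right)$$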