Correlation of Proportions


To introduce my question, consider first a simplified case:
Let $X,Y$ be independent random variates, each with finite mean and variance. Interestingly, $$\text{Corr}\big(\frac{X}{X+Y},\frac{Y}{X+Y} \big) = \text{Corr}\big(\frac{X}{X+Y},1-\frac{X}{X+Y} \big) = -1$$
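Since $\frac{X}{X+Y}$ and $\frac{Y}{X+Y}$ sum to one in every realization, the correlation is exactly $-1$ whenever the variance is nonzero. A quick Monte Carlo check (a sketch assuming NumPy; the gamma distributions are an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
# Independent positive variates with finite mean and variance
# (gamma is an arbitrary illustrative choice)
X = rng.gamma(shape=2.0, scale=1.0, size=100_000)
Y = rng.gamma(shape=5.0, scale=0.5, size=100_000)

p = X / (X + Y)                      # X/(X+Y)
q = Y / (X + Y)                      # Y/(X+Y) = 1 - p
corr = np.corrcoef(p, q)[0, 1]
print(corr)                          # -1 up to floating-point error
```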

The question, then: if I have a set of $X_i$ which are independent with finite mean and variance, is there a simplification (much like the relationship above) of the following for some choice of $j,k$? $$\text{Corr}\big(\frac{X_j}{\sum_i{X_i}},\frac{X_k}{\sum_i{X_i}} \big)$$ If the variates are iid the problem is likely simplified considerably, and while I would like to see a result of that form, the case I'm most interested in is when the $X_i$ all come from the same family of distributions but with different parameters, for example a varying mean $\mu_i$.
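For one concrete non-iid case a closed form is known: if the $X_i$ are independent gammas with a common scale but different shapes $a_i$, the vector of proportions is Dirichlet$(a_1,\dots,a_n)$, which gives $\text{Corr}(Z_j, Z_k) = -\sqrt{a_j a_k / \big((a_0 - a_j)(a_0 - a_k)\big)}$ with $a_0 = \sum_i a_i$. A simulation sketch checking this (NumPy assumed; the particular shapes are an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 200_000
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # differing shape parameters
X = rng.gamma(shape=a, scale=1.0, size=(reps, n))
Z = X / X.sum(axis=1, keepdims=True)        # Z_i = X_i / sum_i X_i

# Compare simulated correlation of (Z_1, Z_2) with the Dirichlet closed form
a0 = a.sum()
corr_sim = np.corrcoef(Z[:, 0], Z[:, 1])[0, 1]
corr_exact = -np.sqrt(a[0] * a[1] / ((a0 - a[0]) * (a0 - a[1])))
print(corr_sim, corr_exact)                 # both near -0.105
```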

I tried using the fact that $$\frac{X_j}{\sum_i{X_i}} = 1 - \frac{\sum_{i\neq j}X_i}{\sum_i{X_i}} = 1 - \frac{X_k}{\sum_i X_i} - \frac{\sum_{i\notin\{j,k\}}X_i}{\sum_i X_i}, \ k\neq j$$ But substituting this only gives $$\text{Corr}\big(\frac{X_j}{\sum_i{X_i}},\frac{X_k}{\sum_i{X_i}} \big) = $$ $$\text{Corr}\big(1 - \frac{X_k}{\sum_i X_i} - \frac{\sum_{i\notin\{j,k\}}X_i}{\sum_i X_i}, 1 - \frac{X_j}{\sum_i X_i} - \frac{\sum_{i\notin\{j,k\}}X_i}{\sum_i X_i}\big)$$ which, if I understand correctly, is essentially the same as $$\text{Corr}\big(1 - \frac{X_k}{\sum_i X_i},1 - \frac{X_j}{\sum_i X_i}\big)$$ so this didn't really take me anywhere.


BEST ANSWER

Given iid random variables $X_i, i = 1, \ldots, n$, define $Z_j = \frac{X_j}{\sum_{i=1}^n X_i}.$ Clearly, $$\sum_{j=1}^n Z_j = 1.$$ Taking the variance of both sides, we get $$ \begin{split} Cov(\sum_{j=1}^n Z_j, \sum_{j=1}^n Z_j) &= 0. \end{split} $$ Expanding, $$ \begin{split} Cov(\sum_{j=1}^n Z_j, \sum_{j=1}^n Z_j) &= \sum_{j=1}^n Var(Z_j) + 2\sum_{j < k} Cov(Z_j, Z_k) \\ &= n Var(Z_1) + n(n-1) Cov(Z_1, Z_2), \\ \end{split} $$ where the second step uses symmetry: the $X_i$ are exchangeable, so $Var(Z_j) = Var(Z_1)$ and $Cov(Z_j, Z_k) = Cov(Z_1, Z_2)$ for all $j \neq k$. Combining the two displays (and assuming $Var(Z_1) > 0$), we get $\frac{Cov(Z_1, Z_2)}{Var(Z_1)} = -\frac{1}{n-1}$. Since $Var(Z_1) = Var(Z_2)$, $$ \frac{Cov(Z_1, Z_2)}{Var(Z_1)} = \frac{Cov(Z_1, Z_2)}{\sqrt{Var(Z_1)Var(Z_2)}} = \rho_{Z_1, Z_2}, $$ so the correlation is $-1/(n-1)$.
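The $-1/(n-1)$ result is easy to confirm numerically; a sketch assuming NumPy, with iid exponentials as an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 4, 200_000
X = rng.exponential(scale=1.0, size=(reps, n))   # iid draws
Z = X / X.sum(axis=1, keepdims=True)             # Z_j = X_j / sum_i X_i

corr = np.corrcoef(Z[:, 0], Z[:, 1])[0, 1]
print(corr, -1 / (n - 1))                        # both near -1/3
```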

ANSWER

A related formula holds for any random variables that sum to a constant $c$. Suppose $Y_1, \ldots, Y_N$ satisfy $\sum_{i=1}^N Y_i = c$. Then: \begin{align} c &= \sum_{i=1}^N E[Y_i] \\ c^2 &= \sum_{i, j} E[Y_i]E[Y_j] \: \: (*) \end{align} Also, squaring the sum itself, \begin{align} c^2 &= \sum_{i, j} Y_i Y_j, \end{align} and taking expectations, \begin{align} c^2 &= \sum_{i, j} E[Y_iY_j] \: \: (**) \end{align} Subtracting equation (*) from (**) gives: $$ 0 = \sum_{i, j} \big(E[Y_iY_j]-E[Y_i]E[Y_j]\big) = \sum_{i=1}^N Var(Y_i) + \sum_{i\neq j} Cov(Y_i,Y_j). $$ In particular: $$ \sum_{i \neq j} Cov(Y_i,Y_j) = -\sum_{i=1}^N Var(Y_i) $$

Assuming $Var(Y_i)>0$ for at least one $i \in \{1, \ldots, N\}$, we get: $$ \boxed{\frac{\sum_{i\neq j} Cov(Y_i,Y_j)}{\sum_{i=1}^NVar(Y_i)} = -1} $$
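The boxed identity needs only that the $Y_i$ sum to a constant, so multinomial counts make a handy test case (a sketch, NumPy assumed; the probabilities are an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
# Multinomial counts: each row of Y sums to exactly c = 10
p = np.array([0.2, 0.3, 0.5])
Y = rng.multinomial(10, p, size=200_000)

C = np.cov(Y, rowvar=False)          # 3x3 sample covariance matrix
off_diag = C.sum() - np.trace(C)     # sum over i != j of Cov(Y_i, Y_j)
print(off_diag / np.trace(C))        # -1 up to floating-point error
```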