From a population consisting of the numbers $\lbrace 1, 2, \ldots, 10 \rbrace$, two samples are drawn without replacement. If the random variable denoting the first choice is $X$ and the second choice is $Y$, what is the correlation coefficient ($\rho$) between $X$ and $Y$?
Calculate correlation coefficient for discrete random variable
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail). There are 2 best solutions below.
We want to find $$ \rho(X,Y) := \frac{Cov(X,Y)}{\sqrt{V[X]V[Y]}} = \frac{E[XY] -E[X]E[Y]}{\sqrt{V[X]V[Y]}}. $$
Inference on X
Assume $X$ is uniform on $\{1, 2, \dots, N\}$. Then $$ \begin{aligned} Pr(X=x) &= \frac{1}{N}, \\ E[X] &= \sum_{i=1}^N Pr(X=i) \cdot i = \frac{N+1}{2}, \\ E[X^2] &= \sum_{i=1}^N Pr(X=i) \cdot i^2 = \frac{(N+1)(2N+1)}{6}, \\ V[X] &= E[X^2] - E[X]^2 = \frac{N^2-1}{12}. \end{aligned} $$
Inference on Y
The conditional density (pmf) of $Y$ given $X$ follows from the fact that the two samples are taken without replacement: $$ \begin{aligned} Pr(Y=y|X=x) &= \frac{1}{N-1} 1_{x\neq y}, \\ Pr(Y=y) &= \sum_{i=1}^N Pr(Y=y|X=i)Pr(X=i) = \frac{1}{N} = Pr(X=x). \end{aligned} $$ Hence $X$ and $Y$ have the same marginal distribution, and thus the same mean and variance: $$ E[Y]=E[X], \qquad V[Y]=V[X]. $$
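The marginal of $Y$ can be checked numerically. The following is a minimal sketch using exact fractions; the function name `marginal_of_second_draw` is made up for illustration and is not part of the answer.

```python
from fractions import Fraction

def marginal_of_second_draw(n):
    """Return {y: Pr(Y=y)} for two draws without replacement from 1..n."""
    p_x = Fraction(1, n)               # Pr(X = x), uniform first draw
    p_y_given_x = Fraction(1, n - 1)   # Pr(Y = y | X = x) for y != x
    pmf = {}
    for y in range(1, n + 1):
        # Law of total probability, skipping the impossible case x == y.
        pmf[y] = sum(p_y_given_x * p_x for x in range(1, n + 1) if x != y)
    return pmf

pmf_y = marginal_of_second_draw(10)
print(pmf_y[1], sum(pmf_y.values()))  # 1/10 1
```

Every value of `pmf_y` comes out as 1/10, matching $Pr(Y=y) = 1/N$.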
Joint inference X, Y
The joint density is $$ Pr(Y=y,X=x) = Pr(Y=y|X=x)Pr(X=x) = \frac{1}{N(N-1)} 1_{x\neq y}. $$
The product expectation $$ E[XY] = \sum_{i=1}^N\sum_{j=1}^N ij \cdot Pr(Y=i, X=j) = \frac{1}{N(N-1)} \sum_{i=1}^N\sum_{j=1}^N ij \cdot 1_{i\neq j} = \frac{1}{N(N-1)} \left( \sum_{i=1}^N\sum_{j=1}^N ij - \sum_{k=1}^N k^2\right) = \frac{(N+1)(3N+2)}{12}. $$
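As a sanity check on the closed form, the double sum can be evaluated directly for a few values of $N$. This is only a sketch; `product_expectation` and `closed_form` are hypothetical helper names.

```python
from fractions import Fraction

def product_expectation(n):
    """E[XY] by summing i*j over all ordered pairs with i != j."""
    weight = Fraction(1, n * (n - 1))  # joint pmf on each admissible pair
    return sum(weight * i * j
               for i in range(1, n + 1)
               for j in range(1, n + 1)
               if i != j)

def closed_form(n):
    """The expression (N+1)(3N+2)/12 derived above."""
    return Fraction((n + 1) * (3 * n + 2), 12)

for n in (2, 3, 10, 25):
    assert product_expectation(n) == closed_form(n)

print(product_expectation(10))  # 88/3
```

For $N=10$ this gives $E[XY] = 88/3 \approx 29.33$, slightly below $E[X]E[Y] = 30.25$.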
Correlation
Finally we have,
$$ \rho(X,Y) := \frac{E[XY] -E[X]E[Y]}{\sqrt{V[X]V[Y]}} = \frac{E[XY] -E[X]^2}{V[X]} = -\frac{1}{N-1}. $$
With $N=10$ as in your problem: $\rho(X,Y) = -\frac{1}{9} = -0.111\ldots$
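The result can also be confirmed by brute force over all ordered pairs. The sketch below assumes every ordered pair of distinct values is equally likely, as in the joint pmf above; the function name is illustrative only.

```python
from fractions import Fraction
from itertools import permutations

def correlation_without_replacement(n):
    """Exact rho(X, Y) for two draws without replacement from 1..n."""
    pairs = list(permutations(range(1, n + 1), 2))  # all (x, y) with x != y
    p = Fraction(1, len(pairs))                     # each pair equally likely
    e_x = sum(p * x for x, _ in pairs)
    e_y = sum(p * y for _, y in pairs)
    e_xy = sum(p * x * y for x, y in pairs)
    var_x = sum(p * x * x for x, _ in pairs) - e_x ** 2
    var_y = sum(p * y * y for _, y in pairs) - e_y ** 2
    # The variances coincide, so sqrt(var_x * var_y) == var_x and rho is rational.
    assert var_x == var_y
    return (e_xy - e_x * e_y) / var_x

rho = correlation_without_replacement(10)
print(rho)  # -1/9
```

Running the same function for other small $N$ reproduces $-\frac{1}{N-1}$ each time.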
Short intuition
Intuitively, we expect a negative correlation. Consider the outcome space {1,2,3}, which has mean 2. If X is below the mean, e.g. X=1 (<2), then Y must be 2 or 3, which has a mean of 2.5 (>2), i.e. biased upwards. If X is above the mean, e.g. X=3, then Y must be 1 or 2, which has a mean of 1.5, i.e. biased downwards. So Y will be biased away from X, i.e. negative correlation.
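The conditional means in this small example can be tabulated with a few lines of code. This is a rough sketch, with `conditional_means` a name chosen here for illustration.

```python
from fractions import Fraction

def conditional_means(values):
    """Map each first draw x to E[Y | X = x] when Y is drawn from the rest."""
    means = {}
    for x in values:
        rest = [v for v in values if v != x]   # values still available for Y
        means[x] = Fraction(sum(rest), len(rest))
    return means

for x, mean_y in conditional_means([1, 2, 3]).items():
    print(x, float(mean_y))
# 1 2.5
# 2 2.0
# 3 1.5
```

The conditional mean of $Y$ decreases as $X$ increases, which is the negative dependence described above.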
In fact, the above calculations can be generalized further by considering $m \leq N$ draws (random variables) instead of two. Then the correlation between any two different draws is still $-\frac{1}{N-1}$, or formally $\rho(X_i,X_j) = -\frac{1}{N-1}$, for $i\neq j$ and $i,j \leq m\leq N$.
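One way to see why the value does not depend on $m$ is a short symmetry argument, sketched here under the assumption (not spelled out above) that every draw has the same variance $\sigma^2$ and every pair of distinct draws has the same covariance $c$. If all $N$ numbers are drawn, their sum is the constant $N(N+1)/2$, so its variance is zero: $$ 0 = V\left[\sum_{i=1}^N X_i\right] = N\sigma^2 + N(N-1)\,c \quad\Longrightarrow\quad c = -\frac{\sigma^2}{N-1} \quad\Longrightarrow\quad \rho(X_i,X_j) = \frac{c}{\sigma^2} = -\frac{1}{N-1}. $$ The first $m$ draws have the same joint distribution whether or not the remaining $N-m$ draws are made, so the same value holds for any $2 \leq m \leq N$.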
Assuming $X$ is uniform on $\{1, 2, \dots, 10\}$ we have $$ \mathbb E X = \frac{1 + 2 + \dots + 10}{10} = 5.5, \quad \mathbb{V}ar X = \frac{1^2 + \dots + 10^2}{10} - (\mathbb E X)^2 = 8.25. $$ To compute the expected value of $Y$ write $$ \mathbb E Y = \sum_{x \in [10]} \mathbb E[Y ~|~ X=x] \mathbb{P}(X = x), $$ where $[n] := \{1, 2, \dots, n \}.$ This gives $$ \mathbb E Y = \frac 1{10}\left[\frac {1 + \dots + 9}{9} + \dots + \frac{2 + \dots + 10}{9}\right] = \frac{55}{10} = 5.5, $$ and in the same way $$ \mathbb{V}ar Y = \frac 1{10}\left[\frac {1^2 + \dots + 9^2}{9} + \dots + \frac{2^2 + \dots + 10^2}{9}\right] - (\mathbb E Y)^2 = 8.25. $$ For $\mathbb E XY$, conditioning on $X = x$ leaves $Y$ uniform on $[10] \setminus \{x\}$, so $\mathbb E[Y ~|~ X=x] = \frac{55 - x}{9}$ and $$ \mathbb E XY = \frac{1}{10} \sum_{x \in [10]} x \cdot \frac{55 - x}{9} = \frac{55^2 - 385}{90} = \frac{88}{3} \approx 29.33. $$ Hence, $$ \rho(X, Y) = \frac{\mathbb{C}ov(X, Y)}{\sqrt{\mathbb{V}ar X \, \mathbb{V}ar Y}} = \frac{88/3 - 5.5^2}{8.25} = -\frac{1}{9} \approx -0.111. $$
I suspect that there must be a much simpler solution.