I'm trying to calculate the covariance for an example that I've created - using the covariance formula Cov(X,Y) = E(XY)-E(X)E(Y) as in this question - but I'm running into trouble.
In my example, I roll a 3-sided die 150 times and count how many times each side appears. In R I can repeat this experiment 1,000 times and show the first 3 results like so:
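```r
library(tidyverse)
set.seed(0)
# Each row holds the counts of X, Y and Z in 150 rolls of a fair 3-sided die; 1,000 replications
m <- replicate(n = 1000, table(sample(c('X', 'Y', 'Z'), size = 150, prob = c(1/3, 1/3, 1/3), replace = TRUE))) %>% t()
m %>% head(3)
```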
Which yields:
```
##  X  Y  Z
## 48 42 60
## 42 59 49
## 54 45 51
```
I can compute the covariance like so:
```r
cov(m)
```
Which yields:
```
##           X         Y         Z
## X  31.89802 -16.08600 -15.81202
## Y -16.08600  31.47373 -15.38773
## Z -15.81202 -15.38773  31.19976
```
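To connect `cov()` with the formula $\text{Cov}(X,Y) = E(XY) - E(X)E(Y)$, each expectation can be estimated by a sample mean over the simulated rows. The sketch below is a minimal base-R version (it re-runs the simulation without tidyverse, wrapping the draws in `factor()` so all three sides always appear; the names `x`, `y`, `cov_plugin` are just illustrative). It also prints the estimated $E(XY)$ for comparison with the assumed value of 2500. The plug-in estimate differs from `cov()` only by the factor $(n-1)/n$, because `cov()` uses the unbiased $n-1$ denominator.

```r
set.seed(0)
sides <- c('X', 'Y', 'Z')
# 1,000 replications of 150 rolls; rows = replications, columns = counts of X, Y, Z
m <- t(replicate(1000, table(factor(
  sample(sides, size = 150, prob = c(1/3, 1/3, 1/3), replace = TRUE),
  levels = sides))))

x <- m[, 'X']
y <- m[, 'Y']
n <- nrow(m)

# Plug-in estimate of Cov(X,Y) = E(XY) - E(X)E(Y), every expectation replaced by a sample mean
cov_plugin <- mean(x * y) - mean(x) * mean(y)

# cov() divides by n - 1 instead of n, so rescale it before comparing
c(E_xy = mean(x * y), plugin = cov_plugin, rescaled_cov = cov(x, y) * (n - 1) / n)
```

Both covariance columns should agree to floating-point precision, and the estimated `E_xy` comes out below `mean(x) * mean(y)`, which is exactly what makes the covariance negative.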
Now, I think:
- E[XY] = 2500
- E[X] = 50
- E[Y] = 50
... this gives me:
- Cov(X,Y) = E(XY)-E(X)E(Y) = 2500-(50*50) = 0
What am I doing wrong, and how do I calculate the covariance correctly? The simulated covariance looks to be about $-\frac{1}{2}$ times the variance...
A trick for computing $\text{Cov}(X,Y)$ is to note that $X+Y+Z=150$ always, so $X+Y+Z$ has zero variance: \begin{align} 0 &= \text{Var}(X+Y+Z) \\ &= \text{Cov}(X+Y+Z, X+Y+Z) \\ &= \text{Var}(X) + \text{Var}(Y) + \text{Var}(Z) + 2\text{Cov}(X,Y) + 2 \text{Cov}(X, Z) + 2 \text{Cov}(Y, Z). \end{align} Since $X$, $Y$, and $Z$ are exchangeable (all three have the same variance, and every pair has the same covariance), the last line equals $3\text{Var}(X) + 6 \text{Cov}(X,Y)$. Setting this equal to zero yields $\text{Cov}(X,Y) = -\frac{1}{2} \text{Var}(X)$, which matches what you observed in your simulation.
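As a cross-check, the counts $(X, Y, Z)$ follow a multinomial distribution with $n = 150$ and $p = (1/3, 1/3, 1/3)$, whose covariance matrix is the standard $n\,(\text{diag}(p) - pp^\top)$. The base-R sketch below (the names `n`, `p`, `Sigma` are illustrative) evaluates that matrix directly. Its off-diagonal entries are exactly $-\frac{1}{2}$ of the diagonal, and each row sums to zero, mirroring the constraint $X+Y+Z=150$.

```r
n <- 150
p <- c(1/3, 1/3, 1/3)

# Multinomial covariance matrix: n * (diag(p) - p p^T)
Sigma <- n * (diag(p) - outer(p, p))
dimnames(Sigma) <- list(c('X', 'Y', 'Z'), c('X', 'Y', 'Z'))
Sigma

# Each row sums to zero because X + Y + Z is constant
rowSums(Sigma)
```

Compare `Sigma` with the simulated `cov(m)` above: the diagonal is close to 31 to 32 in the simulation and the off-diagonals close to -16.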
The error in your theoretical computation is $E[XY]=2500$; I don't know how you arrived at this number.
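For comparison, plugging in the standard multinomial variance $\text{Var}(X) = np(1-p)$ with $n = 150$ and $p = 1/3$ (a known result, not derived above) together with $\text{Cov}(X,Y) = -\frac{1}{2}\text{Var}(X)$ gives the value of $E[XY]$ that makes the formula consistent:

```latex
\begin{align}
\text{Var}(X)   &= 150 \cdot \tfrac{1}{3} \cdot \tfrac{2}{3} = \tfrac{100}{3} \approx 33.33, \\
\text{Cov}(X,Y) &= -\tfrac{1}{2}\,\text{Var}(X) = -\tfrac{50}{3} \approx -16.67, \\
E[XY]           &= \text{Cov}(X,Y) + E[X]\,E[Y] = 2500 - \tfrac{50}{3} \approx 2483.33.
\end{align}
```

So $E[XY]$ is slightly smaller than $E[X]E[Y] = 2500$, not equal to it.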