Expected value of the sample covariance

1.1k Views Asked by At

Let $X = (X_1,\dots,X_p)$ is a random (column) vector with values in $\mathbb R^p$. The covariance matrix $\mathrm{Cov}(X,X)$ is defined by

$$\mathrm{Cov}(X) := E[(X-E[X])(X-E[X])^T]$$

By definition the $(i,j)$-component of the covariance matrix is the covariance $\mathrm{Cov}(X_i,X_j)$ of two random variables.

If we have $n$ samples $x_1,\dots,x_n \in \mathbb R^p$, then it is known that the covariance matrix is estimated by the following matrix.

$$Q = \frac{1}{n-1}\sum_k (x_k-\bar x)(x_k-\bar x)^T$$

Here $\bar x = n^{-1}\sum_k x_k$ is the vector of sample means of random variables $X_1,\dots,X_p$.

I would like to know how to see that $Q$ estimates the covariance matrix.

The $(i,j)$-component of $Q$ is given by

$$\frac{1}{n-1}\sum_k (x_{ki}-\frac{1}{n}\sum_l x_{li})(x_{kj}-\frac{1}{n}\sum_l x_{lj})$$

Here $x_k = (x_{k1},\dots,x_{kp})^T$.

To make notation easier, we put $Y=X_i$ and $Z=X_j$. Then I want to prove the following formula

$$\mathrm{Cov}(Y,Z) = E\left[\frac{1}{n-1}\sum_{k=1}^n(Y_k-\bar Y)(Z_k - \bar Z)\right]$$

Here $Y_1,\dots,Y_n,Y$ are iid and $Z_1,\dots,Z_n,Z$ are iid.

Actually I am very confused by my formulation and I am missing where to start. I understand the case where $Y=Z$, i.e. the sample variance. The biggest problem would be that the relation between $\mathrm{Cov}(Y,Z)$ and $\sigma_{ij} := \mathrm{Cov}(Y_i,Z_j)$ is unclear.