Covariance between $X_i-\overline{X}$ and $\overline{X}$


Let $n>2$ and $\sigma^2>0$.

A math exam was held with $n$ participants. The scores follow a normal distribution with mean $\mu_X$ and variance $\sigma^2$.

Scores of the math exam are $X_1,...,X_n$.

$$\overline{ X }=\frac{1}{n}\displaystyle \sum_{i=1}^n X_i$$

For each $i = 1,\dots,n$, what is the covariance between $X_i-\overline{X}$ and $\overline{X}$?


(What I have tried)

$\operatorname{Cov}[X_i-\overline{X},\overline{X}]$

$ = E[(X_i-\overline{X})\overline{X}]-E[X_i-\overline{X}]E[\overline{X}]$

$=E[X_i\overline{X}] - E[\overline{X}^2] - (E[X_i]-E[\overline{X}])E[\overline{X}]$

$=E[X_i\overline{X}] - E[\overline{X}^2] - (E[X_i]-\mu_X)\mu_X$

and I don't know how to deal with the remaining terms involving $E[\cdot]$.

Can anyone help me?


Residuals about a mean have $0$ covariance with the mean. Without loss of generality, consider $Cov(X_1-\bar X, \bar X)$. Then $$Cov(X_1 - \bar X, \bar X) = Cov(X_1, \bar X) - Cov(\bar X,\bar X)\\ = Cov(X_1, \bar X) - Var(\bar X) = Cov(X_1,\bar X) -\sigma^2/n.$$

Now $$Cov(X_1,\bar X) = Cov\left(X_1, \frac 1n\sum_{i=1}^nX_i\right)\\ =Cov\left(X_1,\frac 1n X_1\right) + 0 = \frac 1n Cov(X_1,X_1)\\ = \frac 1n Var(X_1) = \sigma^2/n.$$

Thus, $Cov(X_1-\bar X,\bar X) = \sigma^2/n - \sigma^2/n = 0.$
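For completeness (this step is not spelled out in the answer, but uses only the definitions above), the expectation route started in the question also closes quickly. Using independence of the $X_j$,

$$E[X_i\overline{X}] = \frac 1n\sum_{j=1}^n E[X_iX_j] = \frac 1n\left[(\mu_X^2+\sigma^2) + (n-1)\mu_X^2\right] = \mu_X^2 + \frac{\sigma^2}{n},$$

$$E[\overline{X}^2] = Var(\overline{X}) + \left(E[\overline{X}]\right)^2 = \frac{\sigma^2}{n} + \mu_X^2,$$

so the first two terms cancel, and since $E[X_i] = E[\overline{X}] = \mu_X$ the last term is $0$ as well, giving $Cov(X_i-\overline{X},\overline{X}) = 0$ for every $i$.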

Relevance to statistical inference. The residuals $r_i = X_i - \bar X$ of observations from their group means are widely used in ANOVA and regression.

Sample mean and variance independent for normal data. For jointly normal random variables, uncorrelated implies independent. Because $\bar X$ is uncorrelated with each residual $r_i$, and $(\bar X, r_1,\dots,r_n)$ is jointly normal, $\bar X$ is independent of the $r_i$, and hence of $S_X$. So for normal data $\bar X$ and $S_X^2$ are stochastically independent. (They are not 'functionally' independent, because $\bar X$ is used to compute $S_X^2$.) This is important for $t$ statistics, because Student's $t$ distribution is defined in terms of a ratio with independent numerator and denominator.
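This independence is exactly what licenses the one-sample $t$ statistic (a standard fact, added here for context):

$$T = \frac{\bar X - \mu_X}{S_X/\sqrt n} = \frac{(\bar X - \mu_X)/(\sigma/\sqrt n)}{\sqrt{S_X^2/\sigma^2}} \sim t_{n-1},$$

where the numerator is standard normal, $(n-1)S_X^2/\sigma^2 \sim \chi^2_{n-1},$ and the two are independent, which is precisely the definition of Student's $t$ distribution with $n-1$ degrees of freedom.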

Simulations illustrating lack of correlation. A brief simulation in R illustrates that means are not correlated with residuals from them. (The simulation uses 10 million normal samples of size $n=10,$ giving several decimal places of accuracy for the correlation.)

set.seed(2020)
M = 10^7; n = 10
X = rnorm(M*n, 100, 15)
DTA = matrix(X, nrow=M)
A = rowMeans(DTA)
X1 = DTA[,1]
cor(X1-A,A)
[1] -0.0004722208  # approximately 0

A similar simulation with exponential data also shows lack of correlation:

set.seed(2020)
M = 10^7; n = 10
Y = rexp(M*n)
DTA = matrix(Y, nrow=M)
A = rowMeans(DTA)
Y1 = DTA[,1]
cor(Y1-A,A)
[1] 4.620507e-08
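A related numerical check (not part of the original answer): because $\bar X$ is uncorrelated with every residual, it is also uncorrelated with $S^2$ for normal data, while for skewed data $Cov(S^2, \bar X) = \mu_3/n \neq 0,$ where $\mu_3$ is the third central moment. A NumPy sketch, with illustrative sample counts and seed:

```python
import numpy as np

rng = np.random.default_rng(2020)
M, n = 200_000, 10  # illustrative: 200,000 samples of size 10

# Normal data: sample mean and sample variance are uncorrelated.
X = rng.normal(100, 15, size=(M, n))
r_norm = np.corrcoef(X.mean(axis=1), X.var(axis=1, ddof=1))[0, 1]

# Exponential data (mu_3 = 2 for rate 1): Cov(S^2, Xbar) = mu_3/n > 0.
Y = rng.exponential(1.0, size=(M, n))
r_exp = np.corrcoef(Y.mean(axis=1), Y.var(axis=1, ddof=1))[0, 1]

print(round(r_norm, 3))  # near 0
print(round(r_exp, 3))   # clearly positive
```

For the normal samples the correlation is $0$ in theory; for the exponential samples the positive correlation reflects the nonzero skewness.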

However, scatterplots of residuals against means illustrate independence for the normal data, but a clear pattern of dependence for the exponential data. (We use a reduced number of datasets to keep the number of points in the scatterplots manageable.)


m = 30000
# NB: A was overwritten by the exponential simulation, so recompute
# the normal-sample means from the saved vector X.
a.x = rowMeans(matrix(X, nrow=M))[1:m]
x1 = X1[1:m]; r.x = x1 - a.x
y1 = Y1[1:m]; a.y = A[1:m]; r.y = y1 - a.y
par(mfrow=c(1,2))
 plot(a.x, r.x, pch=".", main="Normal Data")
 plot(a.y, r.y, pch=".", main="Exponential Data")
par(mfrow=c(1,1))