What would be the distribution of $(X_i-\frac{1}{n}\sum^n_{i=1}X_i)?$


Let $X_1,\ldots,X_n$ be an i.i.d. sample from $N_p(\mu,\Sigma)$.

What would be the distribution of $(X_i-\frac{1}{n}\sum^n_{i=1}X_i)?$

My attempt is:

We know that the sample mean satisfies $\frac{1}{n}\sum^n_{i=1}X_i\sim N_p(\mu,\frac{1}{n}\Sigma)$.

So, $\mathbb{E}[X_i-\frac{1}{n}\sum^n_{i=1}X_i]=0$

And variance is $\text{Var}(X_i-\frac{1}{n}\sum^n_{i=1}X_i)=\Sigma-\frac{1}{n}\Sigma=\frac{n-1}{n}\Sigma$

Would that be correct?

Edit: to address the clash of indices, suppose the leading term in $X_i-\frac{1}{n}\sum^n_{i=1}X_i$ is relabeled $X_j$ with $j$ fixed, so that the expression becomes $X_j-\frac{1}{n}\sum^n_{i=1,i\neq j}X_i-\frac{1}{n}X_j$.

So,

$$\text{Var}(X_j-\frac{1}{n}\sum^n_{i=1,i\neq j}X_i-\frac{1}{n}X_j)=\text{Var}(-\frac{1}{n}\sum^n_{i=1,i\neq j}X_i+\frac{n-1}{n}X_j)=\frac{n-1}{n^2}\Sigma+\frac{(n-1)^2}{n^2}\Sigma=\frac{n-1}{n}\Sigma$$
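As a sanity check on this variance (not part of the original post), here is a small Monte-Carlo sketch in NumPy; the dimension, sample size, and $\Sigma$ are arbitrary illustrative choices:

```python
import numpy as np

# Monte-Carlo check of Var(X_i - sample mean) = ((n-1)/n) * Sigma.
# p, n, and Sigma below are arbitrary choices for illustration.
rng = np.random.default_rng(0)
p, n, reps = 2, 5, 200_000
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Draw `reps` samples of size n each; shape (reps, n, p).
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
D = X[:, 0, :] - X.mean(axis=1)   # X_1 minus the sample mean, per replicate
emp_cov = np.cov(D, rowvar=False)
print(np.round(emp_cov, 2))       # close to ((n-1)/n) * Sigma = 0.8 * Sigma
```

With $n=5$ the empirical covariance should be close to $0.8\,\Sigma$, matching $\frac{n-1}{n}\Sigma$.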


There are 4 answers below.

Accepted answer:

The question has basically been answered, but this answer attempts to prove the assumptions behind the techniques used, namely that a constant times a multivariate normal random variable is itself multivariate normal, and that a linear combination of independent multivariate normal random variables is multivariate normal. I'm not an expert on this, so it would be interesting to know whether this information is useful.

A constant times a multivariate normal rv

Assume that $X$ is multivariate normal with mean vector $\mu$ and covariance matrix $\Sigma$, written $X\sim N_p(\mu, \Sigma)$. I would like to show that $Y=cX$, $c\in \mathbb R$, is multivariate normally distributed, which perhaps isn't obvious. (Take $c>0$ below; the case $c<0$ is handled the same way with the inequalities reversed, and $c=0$ is degenerate.) I use the method of taking the derivative of the cdf and then comparing with the known density of the multivariate normal distribution. Consider the following, where $G(\textbf y)$ is the cdf of $Y$, $\Phi_X$ is the cdf of $X$, and $\phi_X$ is the pdf of $X$.

$\displaystyle G(\textbf y)=G(y_1,...,y_p)=P(Y_1\le y_1,...,Y_p\le y_p)=P(cX_1\le y_1,...,cX_p\le y_p)=P(X_1\le\frac {y_1} c,...,X_p\le\frac {y_p} c)=\Phi_X(\frac {y_1} c,...,\frac {y_p} c)=\int_{-\infty}^{y_p/c}...\int_{-\infty}^{y_1/c}\phi_X(a_1,...,a_p)da_1...da_p$.

Since the density function is the multivariable derivative of the cumulative distribution function, $\displaystyle g(\textbf y)=g(y_1,...,y_p)=\frac \partial {\partial y_1}...\frac \partial {\partial y_p}\int_{-\infty}^{y_p/c}...\int_{-\infty}^{y_1/c}\phi_X(a_1,...,a_p)da_1...da_p=\frac 1 {c^p} \phi_X(\frac {y_1} c,...,\frac {y_p} c)=\frac 1 {c^p} (2\pi)^{-p/2}\det(\Sigma)^{-1/2}e^{-\frac 1 2(\textbf y/c-\mu)^T\Sigma^{-1}(\textbf y/c-\mu)}=\frac 1 {\sqrt{c^{2p}\det \Sigma}}(2\pi)^{-p/2}e^{-\frac 1 2(\textbf y-c\mu)^T\frac 1 {c^2}\Sigma^{-1}(\textbf y-c\mu)}=\frac 1 {\sqrt{\det(c^2\Sigma)}}(2\pi)^{-p/2}e^{-\frac 1 2(\textbf y-c\mu)^T(c^2\Sigma)^{-1}(\textbf y-c\mu)}$, which, per the Wikipedia article on the multivariate normal distribution, is the density of a multivariate normal random variable with mean vector $c\mu$ and covariance matrix $c^2\Sigma$. Thus $Y=cX\sim N_p(c\mu, c^2\Sigma)$.
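The scaling result $cX\sim N_p(c\mu, c^2\Sigma)$ is easy to check empirically; the constants below are arbitrary illustrations, not from the proof above:

```python
import numpy as np

# Empirical check that Y = c*X has mean c*mu and covariance c^2 * Sigma.
# c, mu, Sigma are made-up illustrative parameters.
rng = np.random.default_rng(1)
c = 3.0
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Y = c * X
print(Y.mean(axis=0))            # close to c * mu = [3, -6]
print(np.cov(Y, rowvar=False))   # close to c**2 * Sigma = 9 * Sigma
```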

Sum of two multivariate normal rv’s

Next I'd like to show that any linear combination of independent multivariate normal random variables is multivariate normal. To do this, use the moment generating function (again as given by Wikipedia) and the fact that the mgf of a sum of two independent random variables is the product of their respective mgf's.

$\psi_X(t)=\exp(\mu^Tt+\frac 1 2 t^T\Sigma t)$

Considering independent $X_1\sim N_p(\mu_1, \Sigma_1)$ and $X_2\sim N_p(\mu_2, \Sigma_2)$, and using the previous result that $c_1X_1\sim N_p(c_1\mu_1, c_1^2\Sigma_1)$ (and similarly for $c_2X_2$), it's straightforward to see that $c_1X_1+c_2X_2$ has mgf $\displaystyle \exp(c_1\mu_1^Tt+\frac 1 2 t^Tc_1^2\Sigma_1 t)\exp(c_2\mu_2^Tt+\frac 1 2 t^Tc_2^2\Sigma_2 t)=\exp((c_1\mu_1+c_2\mu_2)^Tt+\frac 1 2 t^T(c_1^2\Sigma_1+c_2^2\Sigma_2) t)$, the moment generating function of a multivariate normal random variable with mean vector $c_1\mu_1+c_2\mu_2$ and covariance matrix $c_1^2\Sigma_1+c_2^2\Sigma_2$. Thus when you add two independent multivariate normal rv's, you get a multivariate normal random variable whose mean is the sum of their means and whose covariance is the sum of their covariances.


By applying these two principles, you correctly arrived at the answer (twice) above.
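As an illustrative check of the addition rule (a sketch with made-up parameters, not part of the original answer):

```python
import numpy as np

# Check: c1*X1 + c2*X2 ~ N(c1*mu1 + c2*mu2, c1^2*S1 + c2^2*S2)
# for independent X1, X2. All parameters below are illustrative.
rng = np.random.default_rng(2)
c1, c2 = 2.0, -1.0
mu1, mu2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[2.0, -0.3], [-0.3, 0.5]])

X1 = rng.multivariate_normal(mu1, S1, size=400_000)
X2 = rng.multivariate_normal(mu2, S2, size=400_000)
Y = c1 * X1 + c2 * X2
print(Y.mean(axis=0))            # close to c1*mu1 + c2*mu2 = [2, -1]
print(np.cov(Y, rowvar=False))   # close to c1**2 * S1 + c2**2 * S2
```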

Another answer:

Think about a simple case, $X_1 - (X_1 + X_2 + X_3)/3$. What you have is $Y := \frac{2}{3}X_1 - \frac{1}{3}X_2 - \frac{1}{3}X_3$, a weighted sum of i.i.d. $N_p(\mu, \Sigma)$ variables, so you know $Y$ will be normally distributed with \begin{align} E\,Y &= (2/3)\mu - (1/3)\mu - (1/3)\mu \\ &= 0, \\ \text{cov}(Y, Y) &= (2/3)^2\Sigma + (1/3)^2\Sigma + (1/3)^2\Sigma \\ &= (2/3)\Sigma. \end{align}
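A quick simulation of this $n=3$ case (illustrative $\mu$ and $\Sigma$, not from the answer):

```python
import numpy as np

# Simulate Y = X1 - (X1 + X2 + X3)/3 and compare with N(0, (2/3) * Sigma).
# mu and Sigma are arbitrary illustrative parameters.
rng = np.random.default_rng(3)
mu = np.array([2.0, -1.0])
Sigma = np.array([[1.5, 0.4],
                  [0.4, 1.0]])

X = rng.multivariate_normal(mu, Sigma, size=(300_000, 3))  # (reps, 3, p)
Y = X[:, 0, :] - X.mean(axis=1)
print(Y.mean(axis=0))            # close to the zero vector
print(np.cov(Y, rowvar=False))   # close to (2/3) * Sigma
```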

Another answer:

You have $$S=X_i - \frac{1}{n}\sum_{j=1}^nX_j = \Big(\big(\mathbf{e}_i - \tfrac{1}{n}\mathbf{1}_n\big)^T \otimes \mathbb{I}_p\Big)\mathbf{X}$$ where $\mathbf{e}_i\in \mathbb{R}^n$ is the $i$-th standard basis vector, $\mathbf{1}_n \in \mathbb{R}^n$ is the all-ones vector, $\mathbb{I}_p \in \mathbb{R}^{p\times p}$ is the identity matrix, and $\mathbf{X}\in\mathbb{R}^{np}$ stacks $X_1,\ldots,X_n$. Since the $X_i$ are i.i.d., $\mathbf{X}\sim N_{np}(\mathbf{1}_n \otimes \mu,\ \mathbb{I}_n \otimes \Sigma)$, with block-diagonal covariance.

Writing $\mathbf{a} = \mathbf{e}_i - \frac{1}{n}\mathbf{1}_n$, a linear image of a multivariate normal vector is multivariate normal, so $$S = (\mathbf{a}^T \otimes \mathbb{I}_p)\,\mathbf{X} \sim N_p\big((\mathbf{a}^T\mathbf{1}_n)\,\mu,\ (\mathbf{a}^T\mathbf{a})\,\Sigma\big).$$

In other words, $S$ follows the $p$-variate normal distribution $N_p\big(0,(1-\frac{1}{n})\Sigma\big)$, with mean $$\mathbf{a}^T\mathbf{1}_n\,\mu=\Big(1-\tfrac{1}{n} -(n-1)\tfrac{1}{n}\Big)\mu=0$$ and covariance $$\mathbf{a}^T\mathbf{a}\,\Sigma =\Big(\big(1-\tfrac{1}{n}\big)^2+(n-1)\tfrac{1}{n^2}\Big)\Sigma =\big(1-\tfrac{1}{n}\big)\Sigma.$$
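The two scalar quantities $(\mathbf{e}_i - \frac{1}{n}\mathbf{1}_n)^T\mathbf{1}_n$ and $(\mathbf{e}_i - \frac{1}{n}\mathbf{1}_n)^T(\mathbf{e}_i - \frac{1}{n}\mathbf{1}_n)$ can be checked directly (a small sketch; $n=5$, $i=2$ are arbitrary choices):

```python
import numpy as np

# Compute the coefficient vector a = e_i - (1/n) 1_n and its two scalars.
n, i = 5, 2                      # arbitrary illustrative values
a = -np.ones(n) / n
a[i] += 1.0                      # a = e_i - (1/n) * 1_n
print(a @ np.ones(n))            # close to 0 (up to floating point)
print(a @ a)                     # 1 - 1/n = 0.8 for n = 5
```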

Another answer:

Quoting the attempt:

“And variance is $\operatorname{Var}(X_i-\frac{1}{n} \sum^n_{i=1} X_i)=\Sigma-\frac{1}{n}\Sigma=\frac{n-1}{n}\Sigma$. Would that be correct?”

Be careful: The difference is between random variables that are correlated. You could say the variance is the sum of the two variances minus the covariances: \begin{align} & \operatorname{var}\left( X_i - \frac 1 n \sum_{j=1}^n X_j \right) \\[8pt] = {} & \operatorname{var}(X_i) + \frac 1 {n^2} \operatorname{var}\left( \sum_{j=1}^n X_j \right) \\[8pt] & \qquad {} - \operatorname{cov}\left( X_i,\,\, \frac 1 n \sum_{j=1}^n X_j \right) - \operatorname{cov}\left( \frac 1 n \sum_{j=1}^n X_j , \,\, X_i \right). \end{align} The covariance between random vectors $U\in\mathbb R^{k\times1},\,V\in\mathbb R^{\ell\times1}$ with respective expectations $\mu,\nu$ is $$ \operatorname{cov}(U,V) = \operatorname E\Big( (U-\mu)(V-\nu)^\top \Big) \in \mathbb R^{k\times\ell}. $$ Corollary: $$ \operatorname{cov}(V,U) = \Big( \operatorname{cov}(U,V)\Big)^\top. $$

I would argue as follows. We have $$ X_1,\ldots,X_n \sim \text{i.i.d.} \operatorname N_p(\mu,\Sigma) $$ We seek the distribution of $$ X_i-\frac 1 n \sum^n_{j=1} X_j. $$ (Notice that on the line above I distinguish between $i$ and $j.$) \begin{align} & X_i-\frac 1 n \sum^n_{j=1} X_j \\[8pt] = {} & {-\frac{X_1} n} - \frac{X_2} n - \cdots - \frac{X_{i-1}} n + \left( 1 - \frac 1 n \right) X_i \\[8pt] & {} \qquad {} - \frac{X_{i+1}} n - \cdots - \frac{X_n} n. \end{align} The terms in this sum are independent, so the variance is \begin{align} & \frac\Sigma{n^2} + \cdots + \frac\Sigma{n^2} + \left( 1 - \frac 1 n \right)^2 \Sigma + \frac\Sigma{n^2} + \cdots + \frac\Sigma{n^2} \\[10pt] = {} & \left( \frac{n-1}{n^2} + \frac{(n-1)^2}{n^2} \right) \Sigma = \frac{n-1} n \Sigma. \end{align} $$ \text{So it's } \operatorname N_p\left(0, \frac{n-1} n \Sigma\right). $$
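The scalar identity at the heart of this final step, $\frac{n-1}{n^2} + \frac{(n-1)^2}{n^2} = \frac{n-1}{n}$, can be verified exactly with rational arithmetic (a small sketch, not part of the answer):

```python
from fractions import Fraction

# Exact check of (n-1)/n^2 + (n-1)^2/n^2 = (n-1)/n for several n.
for n in range(2, 10):
    lhs = Fraction(n - 1, n**2) + Fraction((n - 1)**2, n**2)
    assert lhs == Fraction(n - 1, n)
print("identity holds for n = 2..9")
```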