What would be the distribution of $(X_i-\frac{1}{n}\sum^n_{i=1}X_i)?$


Let $X_1,\ldots,X_n$ be an i.i.d. sample from $N_p(\mu,\Sigma)$.

What would be the distribution of $(X_i-\frac{1}{n}\sum^n_{i=1}X_i)?$

My attempt is:

We know that the sample mean satisfies $\frac{1}{n}\sum^n_{i=1}X_i\sim N_p(\mu,\frac{1}{n}\Sigma)$.

So, $\mathbb{E}[X_i-\frac{1}{n}\sum^n_{i=1}X_i]=0$

And variance is $\text{Var}(X_i-\frac{1}{n}\sum^n_{i=1}X_i)=\Sigma-\frac{1}{n}\Sigma=\frac{n-1}{n}\Sigma$

Would that be correct?

Edit: to address the clash of indices, suppose the leading term in $X_i-\frac{1}{n}\sum^n_{i=1}X_i$ is relabeled $X_j$ with $j$ fixed, so that the expression becomes $X_j-\frac{1}{n}\sum^n_{i=1,i\neq j}X_i-\frac{1}{n}X_j$.

So,

$$\text{Var}(X_j-\frac{1}{n}\sum^n_{i=1,i\neq j}X_i-\frac{1}{n}X_j)=\text{Var}(-\frac{1}{n}\sum^n_{i=1,i\neq j}X_i+\frac{n-1}{n}X_j)=\frac{n-1}{n^2}\Sigma+\frac{(n-1)^2}{n^2}\Sigma=\frac{n-1}{n}\Sigma$$
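As a sanity check on this variance (not part of the original post), here is a small Monte-Carlo sketch in NumPy; the dimension, sample size, and $\Sigma$ are arbitrary illustrative choices:

```python
import numpy as np

# Monte-Carlo check of Var(X_i - sample mean) = ((n-1)/n) * Sigma.
# p, n, and Sigma below are arbitrary choices for illustration.
rng = np.random.default_rng(0)
p, n, reps = 2, 5, 200_000
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Draw `reps` samples of size n each; shape (reps, n, p).
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
D = X[:, 0, :] - X.mean(axis=1)   # X_1 minus the sample mean, per replicate
emp_cov = np.cov(D, rowvar=False)
print(np.round(emp_cov, 2))       # close to ((n-1)/n) * Sigma = 0.8 * Sigma
```

With $n=5$ the empirical covariance should be close to $0.8\,\Sigma$, matching $\frac{n-1}{n}\Sigma$.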


There are 4 answers below.

Accepted answer:

The question has basically been answered, but this answer attempts to prove the assumptions behind the techniques used, namely that a constant times a multivariate normal random variable is itself multivariate normal, and that a linear combination of independent multivariate normal random variables is multivariate normal. I'm not an expert on this, so it would be interesting to know whether this information is useful.

A constant times a multivariate normal rv

Assume that $X$ is multivariate normal with mean vector $\mu$ and covariance matrix $\Sigma$, written $X\sim N_p(\mu, \Sigma)$. I would like to show that $Y=cX$, $c\in \mathbb R$, is multivariate normally distributed, which perhaps isn't obvious. (Take $c>0$ below; the case $c<0$ is handled the same way with the inequalities reversed, and $c=0$ is degenerate.) I use the method of taking the derivative of the cdf and then comparing with the known density of the multivariate normal distribution. Consider the following, where $G(\textbf y)$ is the cdf of $Y$, $\Phi_X$ is the cdf of $X$, and $\phi_X$ is the pdf of $X$.

$\displaystyle G(\textbf y)=G(y_1,...,y_p)=P(Y_1\le y_1,...,Y_p\le y_p)=P(cX_1\le y_1,...,cX_p\le y_p)=P(X_1\le\frac {y_1} c,...,X_p\le\frac {y_p} c)=\Phi_X(\frac {y_1} c,...,\frac {y_p} c)=\int_{-\infty}^{y_p/c}...\int_{-\infty}^{y_1/c}\phi_X(a_1,...,a_p)da_1...da_p$.

Since the density function is the multivariable derivative of the cumulative distribution function, $\displaystyle g(\textbf y)=g(y_1,...,y_p)=\frac \partial {\partial y_1}...\frac \partial {\partial y_p}\int_{-\infty}^{y_p/c}...\int_{-\infty}^{y_1/c}\phi_X(a_1,...,a_p)da_1...da_p=\frac 1 {c^p} \phi_X(\frac {y_1} c,...,\frac {y_p} c)=\frac 1 {c^p} (2\pi)^{-p/2}\det(\Sigma)^{-1/2}e^{-\frac 1 2(\textbf y/c-\mu)^T\Sigma^{-1}(\textbf y/c-\mu)}=\frac 1 {\sqrt{c^{2p}\det \Sigma}}(2\pi)^{-p/2}e^{-\frac 1 2(\textbf y-c\mu)^T\frac 1 {c^2}\Sigma^{-1}(\textbf y-c\mu)}=\frac 1 {\sqrt{\det(c^2\Sigma)}}(2\pi)^{-p/2}e^{-\frac 1 2(\textbf y-c\mu)^T(c^2\Sigma)^{-1}(\textbf y-c\mu)}$, which, per the Wikipedia article on the multivariate normal distribution, is the density of a multivariate normal random variable with mean vector $c\mu$ and covariance matrix $c^2\Sigma$. Thus $Y=cX\sim N_p(c\mu, c^2\Sigma)$.
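The scaling result $cX\sim N_p(c\mu, c^2\Sigma)$ is easy to check empirically; the constants below are arbitrary illustrations, not from the proof above:

```python
import numpy as np

# Empirical check that Y = c*X has mean c*mu and covariance c^2 * Sigma.
# c, mu, Sigma are made-up illustrative parameters.
rng = np.random.default_rng(1)
c = 3.0
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Y = c * X
print(Y.mean(axis=0))            # close to c * mu = [3, -6]
print(np.cov(Y, rowvar=False))   # close to c**2 * Sigma = 9 * Sigma
```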

Sum of two multivariate normal rv’s

Next I'd like to show that any linear combination of independent multivariate normal random variables is multivariate normal. To do this, use the moment generating function (again as given by Wikipedia) and the fact that the mgf of a sum of two independent random variables is the product of their respective mgf's.

$\psi_X(t)=\exp(\mu^Tt+\frac 1 2 t^T\Sigma t)$

Considering independent $X_1\sim N_p(\mu_1, \Sigma_1)$ and $X_2\sim N_p(\mu_2, \Sigma_2)$, and using the previous result that $c_1X_1\sim N_p(c_1\mu_1, c_1^2\Sigma_1)$ (and similarly for $c_2X_2$), it's straightforward to see that $c_1X_1+c_2X_2$ has mgf $\displaystyle \exp(c_1\mu_1^Tt+\frac 1 2 t^Tc_1^2\Sigma_1 t)\exp(c_2\mu_2^Tt+\frac 1 2 t^Tc_2^2\Sigma_2 t)=\exp((c_1\mu_1+c_2\mu_2)^Tt+\frac 1 2 t^T(c_1^2\Sigma_1+c_2^2\Sigma_2) t)$, the moment generating function of a multivariate normal random variable with mean vector $c_1\mu_1+c_2\mu_2$ and covariance matrix $c_1^2\Sigma_1+c_2^2\Sigma_2$. Thus when you add two independent multivariate normal rv's, you get a multivariate normal random variable whose mean is the sum of their means and whose covariance is the sum of their covariances.


By applying these two principles, you correctly arrived at the answer (twice) above.
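As an illustrative check of the addition rule (a sketch with made-up parameters, not part of the original answer):

```python
import numpy as np

# Check: c1*X1 + c2*X2 ~ N(c1*mu1 + c2*mu2, c1^2*S1 + c2^2*S2)
# for independent X1, X2. All parameters below are illustrative.
rng = np.random.default_rng(2)
c1, c2 = 2.0, -1.0
mu1, mu2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[2.0, -0.3], [-0.3, 0.5]])

X1 = rng.multivariate_normal(mu1, S1, size=400_000)
X2 = rng.multivariate_normal(mu2, S2, size=400_000)
Y = c1 * X1 + c2 * X2
print(Y.mean(axis=0))            # close to c1*mu1 + c2*mu2 = [2, -1]
print(np.cov(Y, rowvar=False))   # close to c1**2 * S1 + c2**2 * S2
```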

Another answer:

Think about a simple case, $X_1 - (X_1 + X_2 + X_3)/3$. What you have is $Y := \frac{2}{3}X_1 - \frac{1}{3}X_2 - \frac{1}{3}X_3$, a weighted sum of i.i.d. $N_p(\mu, \Sigma)$ variables, so you know $Y$ will be normally distributed with \begin{align} E\,Y &= (2/3)\mu - (1/3)\mu - (1/3)\mu \\ &= 0, \\ \text{cov}(Y, Y) &= (2/3)^2\Sigma + (1/3)^2\Sigma + (1/3)^2\Sigma \\ &= (2/3)\Sigma. \end{align}
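A quick simulation of this $n=3$ case (illustrative $\mu$ and $\Sigma$, not from the answer):

```python
import numpy as np

# Simulate Y = X1 - (X1 + X2 + X3)/3 and compare with N(0, (2/3) * Sigma).
# mu and Sigma are arbitrary illustrative parameters.
rng = np.random.default_rng(3)
mu = np.array([2.0, -1.0])
Sigma = np.array([[1.5, 0.4],
                  [0.4, 1.0]])

X = rng.multivariate_normal(mu, Sigma, size=(300_000, 3))  # (reps, 3, p)
Y = X[:, 0, :] - X.mean(axis=1)
print(Y.mean(axis=0))            # close to the zero vector
print(np.cov(Y, rowvar=False))   # close to (2/3) * Sigma
```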

Another answer:

You have $$S=X_i - \frac{1}{n}\sum_{j=1}^nX_j = \Big(\big(\mathbf{e}_i - \tfrac{1}{n}\mathbf{1}_n\big)^T \otimes \mathbb{I}_p\Big)\mathbf{X}$$ where $\mathbf{e}_i\in \mathbb{R}^n$ is the $i$-th standard basis vector, $\mathbf{1}_n \in \mathbb{R}^n$ is the all-ones vector, $\mathbb{I}_p \in \mathbb{R}^{p\times p}$ is the identity matrix, and $\mathbf{X}\in\mathbb{R}^{np}$ stacks $X_1,\ldots,X_n$. Since the $X_i$ are i.i.d., $\mathbf{X}\sim N_{np}(\mathbf{1}_n \otimes \mu,\ \mathbb{I}_n \otimes \Sigma)$, with block-diagonal covariance.

Writing $\mathbf{a} = \mathbf{e}_i - \frac{1}{n}\mathbf{1}_n$, a linear image of a multivariate normal vector is multivariate normal, so $$S = (\mathbf{a}^T \otimes \mathbb{I}_p)\,\mathbf{X} \sim N_p\big((\mathbf{a}^T\mathbf{1}_n)\,\mu,\ (\mathbf{a}^T\mathbf{a})\,\Sigma\big).$$

In other words, $S$ follows the $p$-variate normal distribution $N_p\big(0,(1-\frac{1}{n})\Sigma\big)$, with mean $$\mathbf{a}^T\mathbf{1}_n\,\mu=\Big(1-\tfrac{1}{n} -(n-1)\tfrac{1}{n}\Big)\mu=0$$ and covariance $$\mathbf{a}^T\mathbf{a}\,\Sigma =\Big(\big(1-\tfrac{1}{n}\big)^2+(n-1)\tfrac{1}{n^2}\Big)\Sigma =\big(1-\tfrac{1}{n}\big)\Sigma.$$
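The two scalar quantities $(\mathbf{e}_i - \frac{1}{n}\mathbf{1}_n)^T\mathbf{1}_n$ and $(\mathbf{e}_i - \frac{1}{n}\mathbf{1}_n)^T(\mathbf{e}_i - \frac{1}{n}\mathbf{1}_n)$ can be checked directly (a small sketch; $n=5$, $i=2$ are arbitrary choices):

```python
import numpy as np

# Compute the coefficient vector a = e_i - (1/n) 1_n and its two scalars.
n, i = 5, 2                      # arbitrary illustrative values
a = -np.ones(n) / n
a[i] += 1.0                      # a = e_i - (1/n) * 1_n
print(a @ np.ones(n))            # close to 0 (up to floating point)
print(a @ a)                     # 1 - 1/n = 0.8 for n = 5
```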

Another answer:

Quoting the attempt:

“And variance is $\operatorname{Var}(X_i-\frac{1}{n} \sum^n_{i=1} X_i)=\Sigma-\frac{1}{n}\Sigma=\frac{n-1}{n}\Sigma$. Would that be correct?”

Be careful: The difference is between random variables that are correlated. You could say the variance is the sum of the two variances minus the covariances: \begin{align} & \operatorname{var}\left( X_i - \frac 1 n \sum_{j=1}^n X_j \right) \\[8pt] = {} & \operatorname{var}(X_i) + \frac 1 {n^2} \operatorname{var}\left( \sum_{j=1}^n X_j \right) \\[8pt] & \qquad {} - \operatorname{cov}\left( X_i,\,\, \frac 1 n \sum_{j=1}^n X_j \right) - \operatorname{cov}\left( \frac 1 n \sum_{j=1}^n X_j , \,\, X_i \right). \end{align} The covariance between random vectors $U\in\mathbb R^{k\times1},\,V\in\mathbb R^{\ell\times1}$ with respective expectations $\mu,\nu$ is $$ \operatorname{cov}(U,V) = \operatorname E\Big( (U-\mu)(V-\nu)^\top \Big) \in \mathbb R^{k\times\ell}. $$ Corollary: $$ \operatorname{cov}(V,U) = \Big( \operatorname{cov}(U,V)\Big)^\top. $$

I would argue as follows. We have $$ X_1,\ldots,X_n \sim \text{i.i.d.} \operatorname N_p(\mu,\Sigma) $$ We seek the distribution of $$ X_i-\frac 1 n \sum^n_{j=1} X_j. $$ (Notice that on the line above I distinguish between $i$ and $j.$) \begin{align} & X_i-\frac 1 n \sum^n_{j=1} X_j \\[8pt] = {} & {-\frac{X_1} n} - \frac{X_2} n - \cdots - \frac{X_{i-1}} n + \left( 1 - \frac 1 n \right) X_i \\[8pt] & {} \qquad {} - \frac{X_{i+1}} n - \cdots - \frac{X_n} n. \end{align} The terms in this sum are independent, so the variance is \begin{align} & \frac\Sigma{n^2} + \cdots + \frac\Sigma{n^2} + \left( 1 - \frac 1 n \right)^2 \Sigma + \frac\Sigma{n^2} + \cdots + \frac\Sigma{n^2} \\[10pt] = {} & \left( \frac{n-1}{n^2} + \frac{(n-1)^2}{n^2} \right) \Sigma = \frac{n-1} n \Sigma. \end{align} $$ \text{So it's } \operatorname N_p\left(0, \frac{n-1} n \Sigma\right). $$
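The scalar identity at the heart of this final step, $\frac{n-1}{n^2} + \frac{(n-1)^2}{n^2} = \frac{n-1}{n}$, can be verified exactly with rational arithmetic (a small sketch, not part of the answer):

```python
from fractions import Fraction

# Exact check of (n-1)/n^2 + (n-1)^2/n^2 = (n-1)/n for several n.
for n in range(2, 10):
    lhs = Fraction(n - 1, n**2) + Fraction((n - 1)**2, n**2)
    assert lhs == Fraction(n - 1, n)
print("identity holds for n = 2..9")
```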