Let $X_1,\ldots,X_n$ be a random sample from a normally distributed population. Is the sample mean absolute deviation $$\frac{\sum_{i=1}^n|X_i-\bar{X}|}{n}$$ an unbiased estimator of the population mean absolute deviation?
Is the sample mean absolute deviation unbiased for a normally distributed population?
Clearly not for $n=1$, when you always get $0$, and (less dramatically) not for larger $n$ either.
$\frac{\sum_{i=1}^n\left|X_i-\bar{X}\right|}{n}$ faces the same issue as $\frac{\sum_{i=1}^n(X_i-\bar{X})^2}{n}$ in that $\bar X$ tends to be closer to the $X_i$ than $\mu$ is.
For a normal distribution (but not others) $\mathbb E\left[\frac{\sum_{i=1}^n|X_i-\mu|}{n}\right] =\sqrt{\frac{2}{\pi}} \sigma$, while it seems empirically $\mathbb E\left[\frac{\sum_{i=1}^n\left|X_i-\bar{X}\right|}{n}\right] = \sqrt{\frac{n-1}{n}} \sqrt{\frac{2}{\pi}} \sigma$ or close to that.
As an illustration, with a standard normal and sample size $n=4$, the expected absolute distance to the sample average seems to be closer to $\sqrt{\frac3{2\pi}} \approx 0.691$ than the expected absolute distance to the mean of $\sqrt{\frac2{\pi}} \approx 0.798$:
avabsdevnorm <- function(n, mu=0, sigma=1){
  # returns the mean absolute deviation about the sample mean, then about mu
  X <- rnorm(n, mu, sigma)
  meanX <- mean(X)
  return(c(mean(abs(X-meanX)), mean(abs(X-mu))))
}
set.seed(2023)
n <- 4
cases <- 10^5
sims <- replicate(cases, avabsdevnorm(n))
c(mean(sims[1,]), mean(sims[2,]))
# 0.6917616 0.7990887
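To see whether the conjectured factor $\sqrt{\frac{n-1}{n}}$ holds up beyond $n=4$, the same experiment can be repeated for several sample sizes. The following is only a rough sketch: `sim_mad` is a hypothetical helper introduced here (it is not part of the code above), it assumes a standard normal population, and the choice of sample sizes is arbitrary.

```r
# Hypothetical helper (an assumption, not from the answer above): Monte Carlo
# estimate of E[ mean(|X_i - Xbar|) ] for standard normal samples of size n.
sim_mad <- function(n, cases = 10^5) {
  X <- matrix(rnorm(cases * n), nrow = cases)  # one sample of size n per row
  # X - rowMeans(X) recycles the vector down columns, so every entry in
  # row i has that row's sample mean subtracted.
  mean(rowMeans(abs(X - rowMeans(X))))
}

set.seed(2023)
ns <- c(1, 2, 4, 10, 50)                       # arbitrary sample sizes
simulated  <- sapply(ns, sim_mad)
conjecture <- sqrt((ns - 1) / ns) * sqrt(2 / pi)

# Side-by-side comparison of simulated and conjectured expectations
print(round(cbind(n = ns, simulated, conjecture), 3))
```

If the conjecture is right, the two columns should agree up to simulation noise, with the row for $n=1$ exactly $0$ and the values creeping up towards $\sqrt{\frac{2}{\pi}} \approx 0.798$ as $n$ grows.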
For comparison, for a uniform distribution on $[a,b]$, you have $\mathbb E\left[\frac{\sum_{i=1}^n|X_i-\mu|}{n}\right] =\frac{b-a}{4}$, while it seems empirically $\mathbb E\left[\frac{\sum_{i=1}^n\left|X_i-\bar{X}\right|}{n}\right] = \left(1-\frac{2}{3n}\right)\frac{b-a}{4}$ or close to that, at least with $n\ge 2$. Another simulation of $U(0,1)$, again with $n=4$, shows the expected absolute distance to the sample average seems to be closer to $\frac5{24} \approx 0.208$ than the expected absolute distance to the mean of $\frac14=0.25$:
avabsdevunif <- function(n, low=0, high=1){
  X <- runif(n, low, high)
  meanX <- mean(X)
  # the population mean of U(low, high) is (low+high)/2
  return(c(mean(abs(X-meanX)), mean(abs(X-(low+high)/2))))
}
set.seed(2023)
n <- 4
cases <- 10^5
simunif <- replicate(cases, avabsdevunif(n))
c(mean(simunif[1,]), mean(simunif[2,]))
# 0.2081946 0.2498248
This answer is merely a follow-up to Henry's answer; it also proves Henry's empirical observation for the normal case.
Substituting $u=x^2/\sigma^2$, so that $du=\frac{2x}{\sigma^2}\,dx$, $$\int_0^\infty\frac{2x}{\sigma^2}e^{-x^2/2\sigma^2}\,dx=\int_0^\infty e^{-u/2}\,du=[-2e^{-u/2}]_0^\infty=2.$$ Hence, $$\int_0^\infty xe^{-x^2/2\sigma^2}\,dx=\sigma^2,$$ and therefore, by symmetry, $$\int_{-\infty}^\infty|x|e^{-x^2/2\sigma^2}\,dx=2\sigma^2.$$ Then $$\int_{-\infty}^\infty |x|\frac{1}{\sqrt{2\pi}\sigma}e^{-x^2/2\sigma^2}\,dx={\sqrt{\frac{2}{\pi}}}\cdot\sigma.$$ Translating by $\mu$, $$\int_{-\infty}^\infty|x-\mu|\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/2\sigma^2}\,dx=\sqrt{\frac{2}{\pi}}\cdot\sigma.$$ That is, if $X\sim N(\mu,\sigma^2)$, then the mean absolute deviation (MAD) of $X$ is $$E\left[|X-\mu|\right]=\sqrt{\frac{2}{\pi}}\cdot\sigma.$$
Now let $X_1,\ldots,X_n\sim N(\mu,\sigma^2)$ be a random sample. Let us compute the mean of the statistic $$\frac{\sum_{i=1}^n|X_i-\bar{X}|}{n}.$$ Since $X_1-\bar{X}$ is a linear combination of independent normal variables, it is itself normal, and its mean and variance follow from the coefficients: \begin{align*} X_1-\bar{X}&=X_1-\frac{X_1+\cdots+X_n}{n}\\ &=\frac{(n-1)X_1-X_2-\cdots-X_n}{n}\\ &\sim N\left( 0, \frac{(n-1)^2+(n-1)}{n^2}\sigma^2 \right)=N\left( 0,\frac{n-1}{n}\sigma^2 \right), \end{align*} so, applying the MAD formula above with $\sigma^2$ replaced by $\frac{n-1}{n}\sigma^2$, $$E\left[|X_1-\bar{X}|\right]=\sqrt{\frac{2}{\pi}}\cdot\sqrt{\frac{n-1}{n}}\cdot\sigma.$$
Similarly, $$E\left[|X_i-\bar{X}|\right]=\sqrt{\frac{2}{\pi}}\cdot\sqrt{\frac{n-1}{n}}\cdot\sigma$$ for all $i=1,\ldots,n$, so by linearity of expectation $$E\left[\frac{\sum_{i=1}^n|X_i-\bar{X}|}{n}\right]=\frac{\sum_{i=1}^n E\left[|X_i-\bar{X}|\right]}{n}=\sqrt{\frac{2}{\pi}}\cdot\sqrt{\frac{n-1}{n}}\cdot\sigma.$$
Since $\sqrt{\frac{n-1}{n}}<1$, the sample mean absolute deviation underestimates the population value $\sqrt{\frac{2}{\pi}}\cdot\sigma$ on average, so it is biased for every finite $n$, although the bias vanishes as $n\to\infty$.
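As a quick sanity check of the final formula, take the case simulated in the first answer, $n=4$ and $\sigma=1$: $$\sqrt{\frac{2}{\pi}}\cdot\sqrt{\frac{3}{4}}=\sqrt{\frac{3}{2\pi}}\approx 0.691,$$ which agrees with the simulated value $0.6917616$ reported there. The formula also suggests a simple rescaling (valid only under the normality assumption and for $n\ge 2$): $$E\left[\sqrt{\frac{n}{n-1}}\cdot\frac{\sum_{i=1}^n|X_i-\bar{X}|}{n}\right]=E\left[\frac{\sum_{i=1}^n|X_i-\bar{X}|}{\sqrt{n(n-1)}}\right]=\sqrt{\frac{2}{\pi}}\cdot\sigma,$$ so dividing by $\sqrt{n(n-1)}$ instead of $n$ gives an unbiased estimator of the population MAD in the normal case. For other distributions the correction factor would generally be different.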