Median vs. Mean


Problem:

Consider the following model:

$y_i = \mu + \epsilon_i$, $i = 1,...,n$

Let the mean $\mu$ be estimated by minimizing the criterion $\sum|\mu - y_i|$ over $\mu$.

Show that $m = \operatorname{median}(y_1, y_2, \dots, y_n)$ is optimal for this criterion.

Distinguish the cases where $n$ is odd and where $n$ is even.


My idea to approach the problem:

$1)$ Rewrite the criterion such that no absolute value signs are needed (with an indicator function, for example).

$2)$ Determine the first order condition to recognize the median.

$3)$ Order data to distinguish $n$ odd or $n$ even


However, I don't know how to carry out the steps of my approach in a mathematical/statistical way.

Could anyone please help me?


There are 2 best solutions below

1
On BEST ANSWER

I think you are asking why $\sum|\mu-y_j|$ is minimized, as a function of $\mu$, by taking $\mu=m$, where $m$ is the median of the $y_j$. The answer is, as you shift $\mu$ away from the median, more terms in the sum increase than decrease.

See also https://scicomp.stackexchange.com/questions/816/optimizing-the-sum-of-the-absolute-values
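The counting argument can be checked numerically. The following is only a minimal sketch on made-up data; the helper names `abs_dev` and `shift_counts` are introduced here for illustration and do not come from the answer. It nudges $\mu$ slightly to the right of the median and compares how many terms grow with how many shrink.

```python
def abs_dev(mu, ys):
    """Sum of absolute deviations of the data ys from the point mu."""
    return sum(abs(mu - y) for y in ys)


def shift_counts(mu, ys):
    """Count the terms that grow / shrink when mu is nudged slightly to the right."""
    grow = sum(1 for y in ys if y <= mu)   # mu moves away from these points
    shrink = sum(1 for y in ys if y > mu)  # mu moves toward these points
    return grow, shrink


# Made-up sample (an assumption for illustration); its median is 4.0.
ys = [1.0, 2.0, 4.0, 7.0, 11.0]
m = 4.0

grow, shrink = shift_counts(m, ys)

# The step is small enough that no data point lies in (m, m + step],
# so the criterion changes by exactly step * (grow - shrink).
step = 0.5
change = abs_dev(m + step, ys) - abs_dev(m, ys)

print(grow, shrink, change)  # 3 2 0.5
```

Three terms grow and only two shrink, so the sum increases; by symmetry the same happens when $\mu$ is moved to the left of the median.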

1
On

Your approach looks good. $$ \sum |\mu - y_i| = \sum_{y_i< \mu} [\mu - y_i]+ \sum_{y_i> \mu} [y_i - \mu] $$ Now suppose that $y_1< y_2 <\cdots < y_n$ and, for $\mu$ not equal to any $y_i$, let $k(\mu)$ be the unique $k \in \{0,\dots,n\}$ such that $y_k < \mu < y_{k+1}$ (with the conventions $y_0 = -\infty$ and $y_{n+1} = +\infty$). The function is convex, so first-order conditions are enough to find the solution: $$ \sum |\mu - y_i| = \sum_{i=1}^{k(\mu)} [\mu - y_i]+ \sum_{i=k(\mu)+1}^n [y_i - \mu] $$ $$ \frac{d}{d\mu} \sum |\mu - y_i| = k(\mu) - (n-k(\mu)) = 2k(\mu) - n $$

  • if $n$ is even, the derivative is $0$ exactly when $k(\mu) = n/2$, so any $\mu\in [y_{n/2}, y_{n/2+1}]$ minimizes the function, in particular the median $\frac 12[y_{n/2}+ y_{n/2+1}]$
  • if $n$ is odd, the derivative is never $=0$ but changes sign from negative to positive at $\mu = y_{(n+1)/2}$. The median $y_{(n+1)/2}$ is the unique minimizer.
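The two cases can be illustrated with a short numerical sketch. The data sets and the helper names `abs_dev` and `slope` are assumptions made for this example only. Away from the data points, the slope of the criterion is the number of observations below $\mu$ minus the number above it.

```python
from statistics import median


def abs_dev(mu, ys):
    """Sum of absolute deviations of the data ys from the point mu."""
    return sum(abs(mu - y) for y in ys)


def slope(mu, ys):
    """Slope of the criterion at a mu that is not a data point.

    With k points below mu and n - k above, the slope is k - (n - k).
    """
    k = sum(1 for y in ys if y < mu)
    return 2 * k - len(ys)


# Made-up samples (assumptions for illustration).
odd = [1.0, 2.0, 4.0, 7.0, 11.0]   # n = 5, median 4.0
even = [1.0, 2.0, 4.0, 7.0]        # n = 4, median 3.0

# n odd: the slope is never zero; it jumps from -1 to +1 across the median.
print(median(odd), slope(3.9, odd), slope(4.1, odd))  # 4.0 -1 1

# n even: the slope is zero between the two middle points,
# so the criterion is flat on the whole interval [2, 4].
print(median(even), slope(3.0, even))                                # 3.0 0
print(abs_dev(2.0, even), abs_dev(3.0, even), abs_dev(4.0, even))    # 8.0 8.0 8.0
```

In the odd case the sign change pins down a single minimizer, while in the even case every point between the two middle observations gives the same minimal value, and the median is one of them.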