Bound for Outlyingness


Given a sample of $n$ data points $x_1, \dots, x_n$, define the sample mean $$\bar x := \frac{1}{n}(x_1+\cdots+x_n),$$ and the sample variance $$s^2 := \frac{1}{n-1} \sum_{i=1}^n (x_i-\bar x)^2.$$ To measure how far an individual data point is from the bulk, define $$t_i := \frac{x_i - \bar x}{s}.$$ The question is how to show that for all $j$ $$\left|t_j\right| < \frac{n-1}{\sqrt{n}}.$$ It is sufficient to show that $$t_j^2 = \frac{(n-1)(x_j - \bar x)^2}{\sum_{i=1}^n (x_i-\bar x)^2} < \frac{(n-1)^2}{n}.$$ This reduces to $$n(x_j - \bar x)^2 < (n-1)\sum_{i=1}^n (x_i-\bar x)^2.$$ Subtracting $(n-1)(x_j - \bar x)^2$ from both sides gives $$(x_j - \bar x)^2 < (n-1)\sum_{i\neq j} (x_i-\bar x)^2.$$ How do I proceed from here? Thank you!

Best answer

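Since $\displaystyle \sum_{j=1}^n (x_j - \bar x) = 0$, you have $\displaystyle \sum_{j=1}^n t_j = 0$, so $\displaystyle \sum_{j\neq i} t_j = -t_i$. By the Cauchy-Schwarz inequality applied to the $n-1$ terms with $j \neq i$, $\displaystyle t_i^2 = \Big(\sum_{j\neq i} t_j\Big)^2 \le (n-1)\sum_{j\neq i} t_j^2$, and thus $\displaystyle \sum_{j\neq i} t_j^2 \ge \frac{t_i^2}{n-1}$.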

You also have $\displaystyle \frac1{n-1} \sum_{j=1}^n t_j^2 = 1$, so $\displaystyle \sum_{j=1}^n t_j^2 = n-1$ and $\displaystyle \sum_{j\not =i} t_j^2 = n-1-t_i^2$.

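Putting these together gives $n-1-t_i^2 \ge \dfrac{t_i^2}{n-1}$, that is, $n\,t_i^2 \le (n-1)^2$, and so $|t_i| \le \dfrac{n-1}{\sqrt{n}}$. Equality holds when all the other $x_j$ are equal to each other, for example $x = (1, 0, \dots, 0)$, so the strict inequality in the question cannot hold in general.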
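As a quick numerical sanity check (not a proof), the sketch below computes the $t_i$ for a random sample and for a sample in which one point differs from the rest, which all coincide. The helper names `standardized_scores` and `bound`, the sample size, and the seed are illustrative choices, not part of the original argument.

```python
import math
import random
import statistics

def standardized_scores(xs):
    """Return t_i = (x_i - mean) / s, with s the sample standard deviation."""
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)  # uses the n - 1 denominator
    return [(x - mean) / s for x in xs]

def bound(n):
    """The claimed upper bound (n - 1) / sqrt(n) on |t_i|."""
    return (n - 1) / math.sqrt(n)

random.seed(0)  # arbitrary seed, only for reproducibility
n = 10

# A generic sample: the largest |t_i| should sit below the bound.
xs = [random.gauss(0, 1) for _ in range(n)]
max_random = max(abs(t) for t in standardized_scores(xs))

# One point differs and the remaining n - 1 coincide: the bound is attained.
extremal = [1.0] + [0.0] * (n - 1)
max_extremal = max(abs(t) for t in standardized_scores(extremal))

print(max_random, max_extremal, bound(n))
```

For the second sample the largest $|t_i|$ should agree with $(n-1)/\sqrt{n}$ up to floating-point rounding.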

If instead this had been a population, with $\displaystyle s^2 := \frac{1}{n} \sum_{i=1}^n (x_i-\bar x)^2$, then $\displaystyle \sum_{j=1}^n t_j^2 = n$ and the same argument would give $|t_i| \le \sqrt{n-1}$.
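The population version can be checked numerically in the same spirit. This is a minimal sketch, assuming the $1/n$ variance as computed by Python's `statistics.pstdev`; the helper name `population_scores` and the test sample are illustrative assumptions.

```python
import math
import statistics

def population_scores(xs):
    """Return t_i = (x_i - mean) / sigma, with sigma using the 1/n denominator."""
    mean = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)  # population standard deviation
    return [(x - mean) / sigma for x in xs]

n = 10

# One point differs and the remaining n - 1 coincide.
extremal = [1.0] + [0.0] * (n - 1)
max_score = max(abs(t) for t in population_scores(extremal))

print(max_score, math.sqrt(n - 1))
```

Here the largest $|t_i|$ should match $\sqrt{n-1}$ up to rounding, which suggests this bound is attained as well.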