Trouble proving the sum-of-squared-deviations identity


I apologize up front for any spelling mistakes; I'm not used to writing math in English! I searched for this question here already, but I'm not sure I used the best tags while doing so. Anyway, on to the question:

It was taken from a Brazilian textbook on basic statistics (Bussab & Morettin, 2013). It basically just asks me to show that:

$$\sum\limits_{i = 1}^n \left( x_i - \overline x \right)^2 = \sum\limits_{i = 1}^n x_i^2 - n\overline x^2 = \sum\limits_{i = 1}^n x_i^2 - \frac{\left( \sum\limits_{i = 1}^n x_i \right)^2}{n}$$

Now, I didn't really know where to start, or whether there's a recommended approach to such proofs, so I just started by expanding the left-hand side:

$$\eqalign{ & \sum\limits_{i = 1}^n \left( x_i - \overline x \right)^2 = \left( x_1 - \overline x \right)^2 + \left( x_2 - \overline x \right)^2 + \dots + \left( x_n - \overline x \right)^2 \cr & \sum\limits_{i = 1}^n \left( x_i - \overline x \right)^2 = \left( x_1^2 - 2 x_1 \overline x + \overline x^2 \right) + \left( x_2^2 - 2 x_2 \overline x + \overline x^2 \right) + \dots + \left( x_n^2 - 2 x_n \overline x + \overline x^2 \right) \cr} $$

At which point I felt I was close enough to start regrouping the pieces:

$$\eqalign{ & \sum\limits_{i = 1}^n \left( x_i - \overline x \right)^2 = \left( x_1^2 + x_2^2 + \dots + x_n^2 \right) + \left( \overline x^2 + \overline x^2 + \dots + \overline x^2 \right) - 2\overline x \left( x_1 + x_2 + \dots + x_n \right) \cr & \sum\limits_{i = 1}^n \left( x_i - \overline x \right)^2 = \sum\limits_{i = 1}^n x_i^2 + n\overline x^2 - 2\overline x \left( x_1 + x_2 + \dots + x_n \right) \cr} $$

And that's where I got stuck. I can't get the $ + n{\overline x ^2}$ to become $ - n{\overline x ^2}$, and I don't know how to "get rid" of the third term.


There are 3 best solutions below

Best answer

You are almost done. Since $\overline x = \frac{1}{n}\sum_{i=1}^n x_i$, the sum in the third term is $x_1 + x_2 + \dots + x_n = n\overline x$. Just write $$ \begin{align} \sum\limits_{i = 1}^n \left( x_i - \overline x \right)^2 &= \sum\limits_{i = 1}^n x_i^2 + n\overline x^2 - 2\overline x \left( x_1 + x_2 + \dots + x_n \right)\\&=\sum\limits_{i = 1}^n x_i^2 + n\overline x^2 - 2\overline x \left( n\overline x \right)\\&=\sum\limits_{i = 1}^n x_i^2 + n\overline x^2 - 2n\overline x^2\\&=\sum\limits_{i = 1}^n x_i^2 - n\overline x^2. \end{align}$$

Another answer

$${\sum\limits_{i = 1}^n {\left( {{x_i} - \overline x } \right)} ^2} = $$

$$ \sum\limits_{i = 1}^n ({x_i}^2 - 2x_i\bar x+ \bar x^2)=$$

$$\sum\limits_{i = 1}^n ({x_i}^2) -2\bar x\sum\limits_{i = 1}^n x_i+n\bar x^2 = $$

$$\sum\limits_{i = 1}^n ({x_i}^2)-2\bar x(n\bar x)+n\bar x^2=$$

$$\sum\limits_{i = 1}^n ({x_i}^2)-2n\bar x^2+n\bar x^2=$$

$$\sum\limits_{i = 1}^n ({x_i}^2)-n\bar x^2$$

Another answer

When both sides are divided by $n-1,$ the right-hand side is called the 'computational formula' for the sample variance (as opposed to the 'definition' on the left) because it has various computational advantages.

Just so there will be a derivation that is easy to follow for the next visitor, let me give one continued equation with the key relationships. (For simplicity all sums are taken over $i = 1, 2, \dots, n.)$

$$\begin{align} {\sum (X_i - \bar X)^2} &= {\sum(X_i^2 - 2\bar X X_i + \bar X^2)} \\&= {\sum X_i^2 - 2\bar X\sum X_i + n\bar X^2} \\&= \sum X_i^2 - 2\bar X(n\bar X) + n\bar X^2 \\&= \sum X_i^2 - n\bar X^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{n}. \end{align}$$
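For readers who like to sanity-check an identity numerically, here is a minimal Python sketch that evaluates both sides on a small data set. The function names and the sample values are my own, chosen only for illustration; this is a spot check, not a proof.

```python
import math


def sum_sq_dev_definition(xs):
    """Left-hand side: sum of (x_i - mean)^2, computed from the definition."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs)


def sum_sq_dev_computational(xs):
    """Right-hand side: sum of x_i^2 minus (sum of x_i)^2 / n."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


# Illustrative data (an assumption, not taken from the textbook).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

lhs = sum_sq_dev_definition(data)
rhs = sum_sq_dev_computational(data)

# The two forms agree up to floating-point rounding.
print(lhs, rhs, math.isclose(lhs, rhs))
```

Here the mean is 5 and the squared deviations are 9, 1, 1, 1, 0, 0, 4, 16, so both sides come out to 32.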

Notes on several advantages of the 'computational' form:

(1) The right-hand side has one subtraction and the left-hand side has $n$ subtractions.

(2) Suppose you have a calculator in which observations $X_i$ are entered sequentially, each entry followed by pressing a key (perhaps "Data" or "$\Sigma^+$"). After each keypress: Memory A increments by $1,$ keeping track of the number of observations; Memory B increments by $X_i,$ keeping a running total; Memory C increments by $X_i^2.$ Then, when data entry is finished, the formula for the numerator of the sample variance is $C - B^2/A.$ Using the definition, the calculator would have to keep track of all $n$ observations to get $\bar X$ and then use each observation again to get $\sum (X_i - \bar X)^2.$

(3) Suppose you have Sample 1 with sample size $n_1,$ sample mean $\bar X_1,$ and sample variance $S_1^2$ known. Similarly, for Sample 2 you know $n_2, \bar X_2,$ and $S_2^2.$ Several uses of the computational formula allow you to find the sample size, mean, and variance of the combined sample, even if the original data are not available. [From $S_1^2, \bar X_1, n_1$ you can find $\sum_{[1]} X_i$ and $\sum_{[1]} X_i^2$ for the first sample. Similarly, for the second sample. Then you can find the corresponding sums for the combined sample, and finally the mean and variance for the combined sample.]
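Notes (2) and (3) can be sketched in a few lines of Python. The class `RunningSums` and its method names are hypothetical, made up for this illustration; the fields `a`, `b`, `c` mirror Memories A, B, and C above, and the sample values are arbitrary.

```python
from dataclasses import dataclass
import statistics


@dataclass
class RunningSums:
    a: int = 0      # Memory A: number of observations
    b: float = 0.0  # Memory B: running total of X_i
    c: float = 0.0  # Memory C: running total of X_i^2

    def add(self, x):
        """One 'keypress': update all three memories with observation x."""
        self.a += 1
        self.b += x
        self.c += x * x

    @classmethod
    def from_summary(cls, n, mean, var):
        """Recover the sums from n, the sample mean, and the sample variance.

        sum X_i = n * mean, and sum X_i^2 = (n - 1) * S^2 + n * mean^2.
        """
        return cls(n, n * mean, (n - 1) * var + n * mean ** 2)

    def merge(self, other):
        """Sums for the combined sample are just the sums of the sums."""
        return RunningSums(self.a + other.a, self.b + other.b, self.c + other.c)

    def mean(self):
        return self.b / self.a

    def numerator(self):
        """C - B^2 / A, the numerator of the sample variance."""
        return self.c - self.b ** 2 / self.a

    def variance(self):
        return self.numerator() / (self.a - 1)


# Note (2): enter observations one at a time, never storing them.
sample1 = [2.0, 4.0, 4.0, 4.0]
sample2 = [5.0, 5.0, 7.0, 9.0]
acc = RunningSums()
for x in sample1 + sample2:
    acc.add(x)

# Note (3): combine two samples knowing only n, mean, and variance of each.
s1 = RunningSums.from_summary(
    len(sample1), statistics.mean(sample1), statistics.variance(sample1))
s2 = RunningSums.from_summary(
    len(sample2), statistics.mean(sample2), statistics.variance(sample2))
combined = s1.merge(s2)

print(acc.numerator(), combined.mean(), combined.variance())
```

Both routes should give a numerator of 32, a combined mean of 5, and a combined sample variance of 32/7, up to rounding. One caveat worth keeping in mind: in floating-point arithmetic, $C - B^2/A$ subtracts two numbers that can be nearly equal, so it may lose precision when the mean is large relative to the spread of the data.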