Proving that the estimate of a mean is a least squares estimator?


I think this is a really simple question so please bear with me - I just had my first class in regression and I'm a little confused about nomenclature/labeling.

Can anyone recommend some good web links that explain introductory linear regression really well?

There's a question I've been looking at for a while and I'm not sure how to do it (although I'm sure the solution is simple):

Show that the sample mean $\hat{\mu}(X) = \frac{1}{n} \sum X_i$ is a least squares estimator of $\mu$ given observations $X_1, \ldots, X_n$.

My first thought was,

$\mathrm{SSE} = \sum (\mu - \hat{\mu})^2$

But I'm not sure if that's right. I'm confused about what the beta is (is it $n$?) and I don't know if there are enough parameters to expand it.

Thanks so much for your patience and if this doesn't make sense, I can clarify more. Thanks!


BEST ANSWER

Let $m = \frac{1}{N}\sum x_i$ be the sample mean and let $a$ be any estimate.

$$ \sum (x_i - a)^2 = \sum \left( (x_i - m) + (m - a) \right)^2 = \sum (x_i - m)^2 + 2(m - a) \sum (x_i - m) + \sum (m - a)^2. $$

Now the basic property of the mean is $\sum (x_i - m) = 0$, so

$$ \sum (x_i - a)^2 = \sum (x_i - m)^2 + N (m - a)^2, $$

where $N$ is the number of data points. The first summation does not depend on $a$; the second term is non-negative and is zero exactly when $a = m$. So the best estimate is $a = m$.

This also shows that the minimum value is $\sum (x_i - m)^2$, which is $N$ times the (biased) sample variance.
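As a quick numerical sanity check of the decomposition above, the following sketch (using hypothetical sample data in NumPy) verifies that $\sum (x_i - a)^2 = \sum (x_i - m)^2 + N(m - a)^2$ for several candidate estimates $a$, and that the sum of squares is smallest at $a = m$:

```python
import numpy as np

# Hypothetical sample data; any numbers would work.
x = np.array([2.0, 3.5, 1.0, 4.2, 3.3])
m = x.mean()   # the sample mean
N = len(x)

for a in (0.0, 1.7, m, 5.0):
    sse = np.sum((x - a) ** 2)
    # The decomposition derived above: sum (x_i - m)^2 + N (m - a)^2
    decomposed = np.sum((x - m) ** 2) + N * (m - a) ** 2
    assert np.isclose(sse, decomposed)
    # The mean never does worse than any other estimate.
    assert np.sum((x - m) ** 2) <= sse
```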

ANSWER

A least squares estimator is an estimator that minimizes the sum of the squared deviations of your observations from the estimate. This means you are seeking a $\hat{\mu}$ that solves $\min_{\hat{\mu}} \sum (X_i - \hat{\mu})^2$. If you have some familiarity with calculus, the following will make sense; if not, I suggest you read about derivatives and what are known as the first- and second-order conditions (FOC and SOC). In a nutshell, in an unconstrained optimization problem with a differentiable objective function, the FOC says that at a solution the first derivative of the objective with respect to each variable you are optimizing over, evaluated at that solution, is zero; the SOC tells you whether the point you found is a maximum or a minimum.

Here the objective is differentiable, and the FOC gives $-2 \sum (X_i - \hat{\mu}) = 0$, which is the same as $\sum X_i - N \hat{\mu} = 0$, implying $\hat{\mu} = \frac{1}{N} \sum X_i$. Since the second derivative is $2N > 0$, this stationary point is indeed a minimum.
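The optimization view can also be checked numerically. This sketch (again with hypothetical data, and a simple grid search rather than calculus) scans candidate values of $\hat{\mu}$ and confirms that the SSE minimizer coincides with the sample mean:

```python
import numpy as np

x = np.array([2.0, 3.5, 1.0, 4.2, 3.3])  # hypothetical data

# Evaluate SSE(a) = sum_i (x_i - a)^2 on a fine grid of candidate estimates.
grid = np.linspace(x.min(), x.max(), 100001)
sse = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)

# The grid point with the smallest SSE should be (very close to) the mean.
best = grid[sse.argmin()]
assert np.isclose(best, x.mean(), atol=1e-3)
```

The grid search is only an illustration; the calculus argument above shows the minimizer is exactly $\frac{1}{N}\sum X_i$.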