Variance of normal distributed least squares estimation


I'm using least squares estimation to find the parameter $a$ in the model $Y = aX$.

If $Y \sim \mathcal{N}(aX, \sigma)$, how can I calculate $\sigma$?

I think $\sigma$ is not the same as the variance of the parameter $a$. I hope to hear your thoughts.

As seen in the figure below, this is the normal distribution I mean.

[figure: a normal distribution]

On BEST ANSWER

Your model is $Y \sim \mathcal{N}( a x, \sigma)$, where there's only one "parameter" (the slope $a$) to be fitted in the linear regression. The underlying population variance $\sigma^2$ is to be estimated.

Denote the observations as $y_i$ (for $i = 1,2,\ldots, n$) and the sample mean as $\bar{y} \equiv \frac1{n} \sum_{i = 1}^n y_i~$. The observations are made at positions (horizontal coordinate) $x_i$, with the average coordinate denoted $\bar{x} \equiv \frac1{n} \sum_{i = 1}^n x_i~$.

The data is NOT centered: $~~\bar y \neq 0$ and $\bar x \neq 0$ in general.

A commonly used unbiased estimator $\widehat{\sigma^2}$ is the following (the hat indicates it is an estimator):

$$\widehat{\sigma^2} = \frac1{n - 1} \sum_{i=1}^n \Bigl[ (y_i - \widehat{a} \,x_i )^2 \Bigr] ~~,~~ \text{where}~~\widehat{a}\equiv \dfrac{~~\sum_{i=1}^n x_iy_i ~~}{\sum_{i=1}^n x_i^2} $$
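As a concrete sanity check, here's a minimal NumPy sketch on synthetic data (the true values $a = 2$ and $\sigma = 0.5$ are made up for illustration) that computes $\widehat{a}$ and $\widehat{\sigma^2}$ exactly as in the formula above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from the model Y ~ N(a*x, sigma), with assumed
# true values a = 2.0 and sigma = 0.5
a_true, sigma_true, n = 2.0, 0.5, 200
x = rng.uniform(1.0, 10.0, size=n)
y = a_true * x + rng.normal(0.0, sigma_true, size=n)

# Least-squares slope for regression through the origin:
# a_hat = sum(x_i * y_i) / sum(x_i^2)
a_hat = np.sum(x * y) / np.sum(x ** 2)

# Unbiased estimator of sigma^2: one fitted parameter, so the
# denominator is n - 1
sigma2_hat = np.sum((y - a_hat * x) ** 2) / (n - 1)

print(a_hat, sigma2_hat)
```

With a couple hundred points, both estimates land close to the true values ($a = 2$, $\sigma^2 = 0.25$).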

This expression of $\widehat{\sigma^2}$ can be rewritten in several other forms, but none of them is really "shorter" or as intuitive as the original plug-in-the-MLE expression. Numerical stability of nested or iterated sums of squares is not really an issue in most applications.

If you find the relevant wiki entry hard to digest, then maybe try some widely used textbooks like An Introduction to Statistical Learning with R by James (and 3 other authors) or the book by Weisberg.

Note that if in a textbook you see the denominator $n-2$ in some places, it is because two parameters are being fitted there (the slope AND the intercept). Here the intercept is a known constant (zero), so there is only one parameter to fit. On Wikipedia this is denoted $n-p$, where $p$ is the number of fitted parameters.


The above is an unbiased estimator for the variance $\sigma^2$; if you want the standard deviation $\sigma = \sqrt{\sigma^2}$, you can take the square root of the estimate:

$$\widehat{\sigma} = \sqrt{ \widehat{\sigma^2} }$$

The fact that this $\widehat{\sigma}$ would be biased is well-known and often NOT a deal breaker, depending on your application.
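A quick Monte Carlo sketch (made-up fixed design, true $\sigma = 1$) illustrates both points: averaged over many replications, $\widehat{\sigma^2}$ sits at $\sigma^2$, while $\widehat{\sigma}$ sits slightly below $\sigma$, by Jensen's inequality applied to the square root:

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo check: sigma2_hat is unbiased for sigma^2, but
# sigma_hat = sqrt(sigma2_hat) is slightly biased low for sigma.
a_true, sigma_true, n, reps = 2.0, 1.0, 10, 20000
x = rng.uniform(1.0, 10.0, size=n)  # fixed design across replications

sigma2_hats = np.empty(reps)
for r in range(reps):
    y = a_true * x + rng.normal(0.0, sigma_true, size=n)
    a_hat = np.sum(x * y) / np.sum(x ** 2)
    sigma2_hats[r] = np.sum((y - a_hat * x) ** 2) / (n - 1)

print(sigma2_hats.mean())           # close to sigma_true**2 = 1.0
print(np.sqrt(sigma2_hats).mean())  # below sigma_true = 1.0 (Jensen)
```

With only $n = 10$ observations per replication the downward bias of $\widehat{\sigma}$ is visible (a few percent); it shrinks as $n$ grows, which is why it is often not a deal breaker.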