Linear regression through the origin versus mean of ratios?


Assume that I have data that can be described by:

$y_i = \beta x_i + \epsilon_i, \quad \epsilon_i \sim (0,\sigma_{\epsilon}^2)$,

then the least squares estimator is given by

$\hat{\beta_1} = \frac{\sum_{i=1}^N x_iy_i}{\sum_{i=1}^N x_i^2}$.

Why is it wrong to use the following estimator?

$\hat{\beta_2} = \frac{1}{N}\sum_{i=1}^N \frac{y_i}{x_i}$.

Do they not estimate the same parameter?
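For concreteness, here is a quick numerical check (on toy data I made up) that the two formulas generally disagree on the same points:

```python
# Toy data, roughly y = x with noise (assumed example values)
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.3, 3.8]

# Least squares slope: weighted toward points with large |x|
beta1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)

# Mean of the per-point ratios: every point weighted equally
beta2 = sum(yi / xi for xi, yi in zip(x, y)) / len(x)

print(beta1)
print(beta2)  # different from beta1 unless the fit is exact
```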

There are 3 answers below.

BEST ANSWER

The ordinary least squares estimator, $\widehat \beta_{OLS}=\frac{\sum x_i y_i}{\sum x_i^2}$, as others have mentioned, minimizes the sum of squared errors: $\widehat \beta_{OLS}=\operatorname{argmin}_b \sum(y_i -bx_i)^2$. The major reason it is so widely used is that it is BLUE, the best linear unbiased estimator (http://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem), i.e. among all linear unbiased estimators it has the lowest variance ($\operatorname{var}(\widehat \beta_{OLS})$ is smallest in that class). Hence, your estimator is unbiased, but $\widehat \beta_{OLS}$ is also unbiased and has lower variance.

There are other estimators that are sometimes thought to be better than $\widehat \beta_{OLS}=\frac{\sum x_i y_i}{\sum x_i^2}$. For example, the least absolute deviations estimator $\widehat \beta_{LAD}=\operatorname{argmin}_b \sum |y_i -bx_i|$ is less sensitive to outliers (http://en.wikipedia.org/wiki/Least_absolute_deviations).
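As a sketch of the LAD version: for regression through the origin, minimizing $\sum |y_i - bx_i| = \sum |x_i|\,|y_i/x_i - b|$ is a weighted-median problem, so the minimizer is a weighted median of the ratios $y_i/x_i$ with weights $|x_i|$ (the data below are made up, with one gross outlier):

```python
def lad_slope(x, y):
    # Sort the ratios y_i/x_i and walk the cumulative weight |x_i|
    # until it reaches half the total: that ratio minimizes
    # sum(|x_i| * |y_i/x_i - b|) = sum(|y_i - b*x_i|).
    pairs = sorted((yi / xi, abs(xi)) for xi, yi in zip(x, y))
    half = sum(w for _, w in pairs) / 2
    cum = 0.0
    for ratio, w in pairs:
        cum += w
        if cum >= half:
            return ratio

# Toy data: true slope 2, with one gross outlier at x = 4
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 100.0]

beta_lad = lad_slope(x, y)
beta_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)
print(beta_lad)  # stays at the bulk slope 2.0
print(beta_ols)  # dragged far above 2 by the outlier
```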

Actually, I think the estimator $\widehat \beta_1=\frac{1}{N}\sum \frac{y_i}{x_i}$ is not a very good one, because it is sensitive to small values of $x_i$ while the other estimators are not; if anything, I think $\widehat \beta_2=\frac{\sum y_i}{\sum x_i}$ is better. It is also hard to see how to extend $\widehat \beta_1$ to a multivariate regression.

Let's calculate the variances. Assume for simplicity that the $x_i$ are nonrandom. Then
$$\begin{aligned}
\operatorname{var}(\widehat \beta_1) &= \operatorname{var}\Big(\frac{1}{N}\sum \frac{y_i}{x_i}\Big) = \frac{1}{N^2}\operatorname{var}\Big(\sum \frac{x_i\beta+\varepsilon_i}{x_i}\Big) = \frac{\sigma^2_{\varepsilon}}{N^2} \sum \frac{1}{x_i^2}, \\
\operatorname{var}(\widehat \beta_2) &= \operatorname{var}\Big(\frac{\sum y_i}{\sum x_i}\Big) = \operatorname{var}\Big(\frac{\sum(x_i\beta+\varepsilon_i)}{\sum x_i}\Big) = \frac{N\sigma^2_{\varepsilon}}{(\sum x_i)^2}, \\
\operatorname{var}(\widehat \beta_{OLS}) &= \operatorname{var}\Big(\frac{\sum x_iy_i}{\sum x_i^2}\Big) = \operatorname{var}\Big(\frac{\sum x_i\varepsilon_i}{\sum x_i^2}\Big) = \frac{\sigma^2_{\varepsilon}}{\sum x_i^2}.
\end{aligned}$$
We can see that the issue with $\widehat \beta_1$ arises when some $x_i$ is small. I'll let you apply Cauchy–Schwarz to actually prove that $\operatorname{var}(\widehat \beta_{OLS})\leq \operatorname{var}(\widehat \beta_{1})$.
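These three variance formulas can be checked by simulation; the sketch below (made-up $x$ values, $\sigma_\varepsilon = 1$, and this answer's numbering: $\widehat\beta_1$ is the mean of ratios, $\widehat\beta_2$ the ratio of sums) compares sample variances with the theoretical expressions:

```python
import random

random.seed(0)
x = [0.5, 1.0, 2.0, 3.0]   # note the small x = 0.5, which hurts beta_1
beta, sigma = 2.0, 1.0
n_sims = 20000

draws = {"b1": [], "b2": [], "ols": []}
for _ in range(n_sims):
    y = [beta * xi + random.gauss(0.0, sigma) for xi in x]
    draws["b1"].append(sum(yi / xi for xi, yi in zip(x, y)) / len(x))
    draws["b2"].append(sum(y) / sum(x))
    draws["ols"].append(sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x))

def var(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

# Theoretical variances from the formulas above
n = len(x)
theory = {
    "b1": sigma**2 / n**2 * sum(1 / xi**2 for xi in x),
    "b2": n * sigma**2 / sum(x) ** 2,
    "ols": sigma**2 / sum(xi**2 for xi in x),
}
for k in draws:
    print(k, round(var(draws[k]), 4), round(theory[k], 4))
```

With these values the ordering is var(OLS) < var(ratio of sums) < var(mean of ratios), driven almost entirely by the point at $x = 0.5$.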

ANSWER

It's not 'wrong' to use that estimator, which is the mean of the ratios given by each point.

But they are not the same; the first one is the unique value of $\beta$ which minimizes

$$ E(\beta) = \sum_i \left( \beta x_i - y_i \right)^2 $$ and will give a different result in the general case.

In the case where all the $x_i, y_i$ can be exactly fitted with a specific $\beta$ value, both estimates will yield that value.
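A minimal check of that exact-fit case, with points I made up lying exactly on $y = 3x$:

```python
# All points exactly on y = 3x, so both estimators recover the slope
x = [1.0, 2.0, 5.0]
y = [3.0 * xi for xi in x]

beta1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)
beta2 = sum(yi / xi for xi, yi in zip(x, y)) / len(x)
print(beta1, beta2)  # both 3.0
```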

Qualitatively, $\hat\beta_2$ will tend to give larger fitting errors for points with larger values of $x$, and a better fit for smaller values, compared to $\hat\beta_1$.

Also, when all your $x$ fall in a narrow relative range, e.g. $100 \le x \le 104$, so that the ratio $\max (|x_i|) / \min(|x_i|)$ is not much larger than 1, there will be very little difference between the two estimates, even when the points don't fall close to a straight line.
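To illustrate the narrow-range case, here is a sketch on made-up noisy data with $x$ between 100 and 104 (the points are nowhere near a line through the origin):

```python
# Narrow relative x-range: max|x|/min|x| = 104/100, close to 1
x = [100.0, 101.0, 102.0, 103.0, 104.0]
y = [205.0, 198.0, 210.0, 201.0, 212.0]  # noisy, slope roughly 2

beta1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi**2 for xi in x)
beta2 = sum(yi / xi for xi, yi in zip(x, y)) / len(x)
print(abs(beta1 - beta2))  # tiny compared with the estimates themselves
```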

ANSWER

The estimator you define also makes sense, but as others have said, it is a different estimator. In fact it solves a different problem: $$ y_i/x_i = \beta +\epsilon_i, $$ and minimizes a different sum of squares, $\sum_i(y_i/x_i-\beta)^2$. Of course, no $x_i$ can be zero in that case. So the main point in choosing an estimator is exactly which error you want to minimize.
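A quick sketch of that last point (toy data I made up): the mean of the ratios is indeed the minimizer of $\sum_i(y_i/x_i-\beta)^2$, which we can confirm by comparing nearby values of $\beta$:

```python
# Toy data; no x_i may be zero for this estimator
x = [1.0, 2.0, 4.0]
y = [1.5, 3.5, 9.0]

ratios = [yi / xi for xi, yi in zip(x, y)]
beta2 = sum(ratios) / len(ratios)  # mean of the ratios

def loss(b):
    # The "different sum of squares" this estimator minimizes
    return sum((r - b) ** 2 for r in ratios)

# The loss is no smaller at any nearby b than at the mean itself
assert all(loss(beta2) <= loss(beta2 + d) for d in (-0.1, -0.01, 0.01, 0.1))
print(beta2)
```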