Defects of least-squares regression in some textbooks


I only know, from some textbook, that we can do LSR in a particular way (the textbook's derivation was attached as an image; since the text is too long, I am sorry that I cannot typeset it here).

This method only considers the errors in $y$. In a real experiment, however, both $x$ and $y$ may have errors. How can we perform a least-squares regression that accounts for errors on both axes?
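For concreteness, the textbook method I am referring to is ordinary least squares, which fits by minimizing only the vertical residuals $y_i - (a x_i + b)$; a minimal numpy sketch with made-up data:

```python
import numpy as np

# Ordinary least squares minimizes only the vertical residuals y - (a*x + b).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])          # roughly y = 2x + 1
A = np.column_stack([x, np.ones_like(x)])        # design matrix [x, 1]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
print(a, b)                                      # roughly 2.0, 1.0
```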

So the question is:

Suppose two variables $x, y$ are related by $y = ax + b$. In several experiments, the errors in $x$ and $y$ are random variables $\epsilon_x \sim N(0,\sigma_x^2)$ and $\epsilon_y \sim N(0,\sigma_y^2)$, respectively. How can we give an MLE of $a$ and $b$?
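To clarify the setup (writing $\xi_i$ for the unknown true abscissae, which are nuisance parameters), the likelihood to be maximized is, up to constants,

$$-2\log L(a,b,\xi) = \sum_i \left[ \frac{(x_i - \xi_i)^2}{\sigma_x^2} + \frac{(y_i - a\xi_i - b)^2}{\sigma_y^2} \right],$$

and it is not obvious how the $\xi_i$ should be handled.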


2 Answers

Answer 1

Since you have stated that $y = ax + b$ exactly, it is unclear where the errors enter your model. However, if you mean that there is some error in the measurement of $x$ and $y$, then what you are describing is a regression model with errors-in-variables. This subject has a large literature, but you can find an introduction here.
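For the linear model in the question, minimizing the likelihood over the latent true abscissae $\xi_i$ in closed form reduces the MLE to

$$(\hat a, \hat b) = \arg\min_{a,b} \sum_i \frac{(y_i - a x_i - b)^2}{\sigma_y^2 + a^2 \sigma_x^2},$$

which is known as Deming regression and has a closed-form solution. A minimal numpy sketch, with a function name and synthetic data of my own for illustration:

```python
import numpy as np

def deming_fit(x, y, sigma_x, sigma_y):
    """MLE of (a, b) in y = a*x + b when both coordinates carry Gaussian
    noise with known standard deviations (Deming regression)."""
    delta = (sigma_y / sigma_x) ** 2              # error-variance ratio
    xbar, ybar = x.mean(), y.mean()
    sxx = np.mean((x - xbar) ** 2)
    syy = np.mean((y - ybar) ** 2)
    sxy = np.mean((x - xbar) * (y - ybar))
    a = (syy - delta * sxx
         + np.sqrt((syy - delta * sxx) ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    b = ybar - a * xbar
    return a, b

# Synthetic check: true line y = 2x + 1, noise on both axes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
x = t + rng.normal(0.0, 0.5, t.size)
y = 2.0 * t + 1.0 + rng.normal(0.0, 0.5, t.size)
print(deming_fit(x, y, sigma_x=0.5, sigma_y=0.5))  # roughly (2.0, 1.0)
```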

Answer 2

This is the Total Least Squares model.
In this model, as opposed to classic Least Squares, the distance being minimized isn't the vertical distance between the measurements and the fitted line (or plane) but the perpendicular distance.

The objective function is given by:

$$\begin{align*} \arg \min_{ \left[ E, e \right] } \quad & \left\| \left[ E, e \right] \right\|_{F} \\ \text{subject to} \quad & \left( A + E \right) x = b + e \end{align*}$$

Namely, one seeks $x$, $E$, and $e$ that solve the optimization problem, where $E$ accounts for the noise in the model matrix $A$ and $e$ for the noise in the measurements $b$.

This is actually closely related to Principal Component Analysis (PCA) Regression, which comes as no surprise since both are solved using the Singular Value Decomposition (SVD).
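As an illustration, a minimal numpy sketch of the SVD solution (the helper name and test data are my own; note that plain TLS also perturbs the intercept column, whereas a mixed LS-TLS formulation would keep it exact):

```python
import numpy as np

def tls_fit(A, b):
    """Total Least Squares: solve (A + E) x = b + e while minimizing
    the Frobenius norm of [E, e], via the classic SVD construction."""
    n = A.shape[1]
    C = np.column_stack([A, b])   # augmented matrix [A | b]
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]                    # right singular vector of the smallest singular value
    return -v[:n] / v[n]          # assumes v[n] != 0 (the generic case)

# Fit y = a*x + b with Gaussian noise on both coordinates.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 200)
x = t + rng.normal(0.0, 0.3, t.size)
y = 2.0 * t + 1.0 + rng.normal(0.0, 0.3, t.size)
A = np.column_stack([x, np.ones_like(x)])   # columns: x and intercept
print(tls_fit(A, y))                        # roughly [2.0, 1.0]
```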