Question on Linear Regression


Given a data set $\{(x_i, y_i) \mid i = 1, \cdots, n\} \subset \mathbb R^2$, I want to minimize the cost function

$J(a,b) = \sum_{i=1}^n (h(x_i)-y_i)^2$, where $h(x) = ax + b$.

Here, each term $|h(x_i) - y_i|$ measures the vertical distance from the point $(x_i, y_i)$ to the graph of $h$, i.e. the distance along the $y$-axis (intuitively, a vertical segment from $(x_i, y_i)$ down to the line). Finding the minimizing $h$ is not a big deal: set the partial derivatives of $J$ with respect to $a$ and $b$ to zero and solve. Call the resulting function $H_1$.
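Setting $\partial J/\partial a = \partial J/\partial b = 0$ yields the familiar closed form $a = \operatorname{cov}(x,y)/\operatorname{var}(x)$ and $b = \bar y - a\bar x$. A minimal sketch in Python, on made-up sample data (the data values are purely illustrative):

```python
import numpy as np

# Hypothetical sample data (any finite set of points would do)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Closed form from dJ/da = dJ/db = 0:
#   a = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²),  b = ȳ - a·x̄
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

print(a, b)  # coefficients of H_1(x) = a*x + b
```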

However, I have a feeling that I should instead fit the line that minimizes the sum of squared *orthogonal* (perpendicular) distances from the data points to the line. (Such a regression line can be obtained by taking a minimizing sequence of linear functions $h_1, h_2, \cdots$; this sequence converges pointwise to some linear function $H_2$.)

My questions are:

  1. Is $H_1 = H_2$?
  2. If they are different, what is the philosophical reason that we use $H_1$ instead of $H_2$? Is it simply that $H_1$ is easier to compute?
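Question 1 can at least be checked numerically: compare the vertical-distance fit $H_1$ with the orthogonal-distance fit $H_2$. For a line, the orthogonal fit can be computed via the leading right-singular vector of the centered data (this is the standard total-least-squares construction; the data below are made up for illustration):

```python
import numpy as np

# Hypothetical data (same illustrative points as above)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

xc, yc = x - x.mean(), y - y.mean()

# H_1: ordinary least squares (minimizes vertical distances)
a1 = np.sum(xc * yc) / np.sum(xc ** 2)
b1 = y.mean() - a1 * x.mean()

# H_2: orthogonal regression, via the leading right-singular
# vector of the centered data matrix (direction of the best line)
_, _, Vt = np.linalg.svd(np.column_stack([xc, yc]))
vx, vy = Vt[0]
a2 = vy / vx            # slope is sign-invariant (vy/vx)
b2 = y.mean() - a2 * x.mean()

print(a1, b1)   # slope, intercept of H_1
print(a2, b2)   # slope, intercept of H_2
```

On this data the two slopes differ, so in general $H_1 \neq H_2$ (the sketch ignores the degenerate case of a vertical best-fit line, where $v_x \approx 0$).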