I can see why linear regression is linear, i.e., because it is represented by a line, but what does regression have to do with the term as a whole?
What meaning does this word contribute to the term as a whole?
Semi-intuitive explanation:
Assume that we are interested in a person's IQ: not in the score they might get on some IQ test, but in their true IQ value. So we have to assume that such a value exists; let's denote it by $\mu$. However, it is impossible to measure it directly, so we use IQ tests to estimate it. Denote by $X_i$ the score on the $i$th test. We can model this score as $X_i = \mu + \epsilon_i$, where $\epsilon_i$ is the random error of the $i$th test with $\mathbb{E}\epsilon_i = 0$ and $\mathrm{var}(\epsilon_i) = \sigma^2$. That is, the score on the $i$th test is composed of the real value (signal) plus some random error (noise). Because $\mathbb{E}X_i = \mu$, the scores will (in some sense) tend toward the real IQ value. Therefore, after $n$ such tests we take the average score as an estimator of this value. This average indeed tends to $\mu$, in the sense that the larger the number of tests taken, the more accurately the sample average estimates the real IQ.
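A quick simulation makes this concrete. The values of $\mu$ and $\sigma$ below are made up for illustration; the point is only that the sample average gets closer to $\mu$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 110.0, 10.0  # hypothetical true IQ and noise scale

# Scores follow X_i = mu + eps_i with E[eps_i] = 0 and var(eps_i) = sigma^2.
# The sample mean estimates mu, and its accuracy improves with n.
for n in (5, 50, 5000):
    scores = mu + rng.normal(0.0, sigma, size=n)
    print(f"n = {n:5d}  sample mean = {scores.mean():.2f}")
```

With a handful of tests the average can still be a few points off; with thousands it is typically within a fraction of a point of $\mu$.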
What exactly the random error is is a philosophical rather than a statistical question. It may stem from imperfections of the tool (a test whose difficulty varies), from subject-related factors (tiredness, mood, etc.), or even from an inherent property of IQ itself (i.e., IQ is not a scalar but a random variable, in which case you may be interested in its mean value), or some combination of the above.
Formally:
The linearity of a regression model does not mean that it is a straight line. Rather, any model that can be written in matrix notation as
$$
Y=X\beta +\epsilon,
$$
is called linear. Special cases such as $y=\beta x +\epsilon$ and $y=\beta_0 + \beta_1 x +\epsilon$ are indeed estimated by straight lines, or can be viewed as a straight line (signal) perturbed by some noise $\epsilon$. Linearity means the model is linear in the parameters: if $\partial y / \partial \beta_j = x_j$ for $j=1,\dots,p$ (with $x_1 = 1$), so that none of these derivatives depends on the parameters, then the model is linear. If at least one of the derivatives depends on the parameters, it is non-linear. The observations can then be interpreted as random fluctuations around the mean $\mathbb{E}Y = X\beta$, where $\mathbb{E}\epsilon = 0$. The estimation methods try to estimate this mean, mostly by minimizing the squared error, i.e., the (squared empirical) deviation from this unknown mean.
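As a sketch of this, the following simulates data from $Y = X\beta + \epsilon$ (with made-up coefficients $\beta_0 = 2$, $\beta_1 = 0.5$) and recovers $\beta$ by minimizing the squared error via ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])  # design matrix; first column x_1 = 1 (intercept)

beta_true = np.array([2.0, 0.5])               # hypothetical beta_0, beta_1
y = X @ beta_true + rng.normal(0.0, 1.0, n)    # Y = X beta + eps, E[eps] = 0

# Ordinary least squares: argmin over beta of ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # should be close to beta_true
```

Note that the model is linear in $\beta$ even though we could add columns like $x^2$ to the design matrix: the fitted curve would no longer be a straight line, but the model would still be a linear regression.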
The idea of linear regression is to infer from a dataset how one random variable depends on other variables. What the term "regression" means in this context is that from the dataset we regress back to the kind of dependency that produced the data.