I have just started studying simple linear regression. This concerns Section 9.1 in the book 'Introduction to Probability and Statistics for Engineers and Scientists' by Sheldon Ross, 10th Edition. It says:
A simple linear regression model supposes a linear relationship between the mean response and the value of a single independent variable. It can be expressed as $$Y =\alpha+\beta x+e$$ where $x$ is the value of the independent variable, also called the input level, $Y$ is the response, and $e$, representing the random error, is a random variable having mean $0$. Suppose that the responses $Y_i$ corresponding to the input values $x_i , i = 1, . . . , n$ are to be observed and used to estimate $\alpha$ and $\beta$ in a simple linear regression model.
To determine estimators of $\alpha$ and $\beta$ we reason as follows: If $A$ is the estimator of $\alpha$ and $B$ of $\beta$, then the estimator of the response corresponding to the input variable $x_i$ would be $A + Bx_i.$
To specify the distribution of the estimators $A$ and $B$, it is necessary to make additional assumptions about the random errors aside from just assuming that their mean is $0$. The usual approach is to assume that the random errors are independent normal random variables having mean $0$ and variance $\sigma^2$. That is, we suppose that if $Y_i$ is the response corresponding to the input value $x_i$, then $Y_1 , . . . , Y_n$ are independent and $$Y_i\sim\mathcal{N}(\alpha+\beta x_i,\sigma^2).$$
My questions are:
- Why are $x_i$ not being considered as independent random variables? Do we not consider $x_i$'s as sample data with an underlying distribution?
- Why is the error being called random? Why is it a random variable? What is its domain? And why is it normally distributed?
If you can understand what my confusion is about, can you also please explain using examples.
The answer to both of your questions is "because that's how we're defining the model". We are assuming that we are able to observe $x_i$ and $y_i$, and that if we know $x_i$ for some record then we are able to predict $y_i$ with some amount of error.
So for the purposes of this model, it doesn't matter whether the $x_i$ are fixed or random, because our predictions are always going to be conditioned on the value of $x_i$ regardless. (If you wanted to model the $x_i$ as a random variable and make some inference about the underlying model of the $y_i$ then that's a separate step.)
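To make this concrete, here is a small simulation sketch (the parameter values $\alpha = 2$, $\beta = 0.5$, $\sigma = 1$ are arbitrary choices for illustration). It generates data once with a fixed design, where the experimenter picks the $x_i$, and once with the $x_i$ drawn at random, then fits a least-squares line to each. Either way, the estimates recover the same underlying $\alpha$ and $\beta$, because the fit only ever uses the observed $x_i$ values:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 2.0, 0.5, 1.0  # true parameters (chosen for illustration)
n = 10_000

# Case 1: fixed design -- the x_i are chosen by the experimenter.
x_fixed = np.linspace(0, 10, n)
y_fixed = alpha + beta * x_fixed + rng.normal(0, sigma, n)

# Case 2: random design -- the x_i are themselves random draws.
x_rand = rng.uniform(0, 10, n)
y_rand = alpha + beta * x_rand + rng.normal(0, sigma, n)

# Least-squares fit; np.polyfit with degree 1 returns [slope, intercept].
b_fixed, a_fixed = np.polyfit(x_fixed, y_fixed, 1)
b_rand, a_rand = np.polyfit(x_rand, y_rand, 1)

print(a_fixed, b_fixed)  # both close to the true (2.0, 0.5)
print(a_rand, b_rand)    # likewise close to (2.0, 0.5)
```

The point is that the fitting procedure is identical in both cases; the randomness (or not) of the $x_i$ only matters if you want to make inferences about the distribution of the $x_i$ themselves.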
As for the error terms $e_i$ (often written $\varepsilon_i$) being i.i.d. normal, again that's just an assumption. The idea is that once we remove the effect of the $x_i$, we want to assume that the remaining error is pure white noise with no structure.
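You can see this "no leftover structure" idea directly in a simulation (again with illustrative parameter values): after fitting the line and subtracting the fitted values, the residuals have mean zero, are uncorrelated with $x$, and have spread roughly $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 2.0, 0.5, 1.0  # illustrative true parameters
n = 5_000
x = rng.uniform(0, 10, n)
y = alpha + beta * x + rng.normal(0, sigma, n)

# Fit the line, then look at what is left over.
b, a = np.polyfit(x, y, 1)
resid = y - (a + b * x)

print(resid.mean())                # ~0: least squares forces this
print(np.corrcoef(x, resid)[0, 1]) # ~0: no remaining linear structure in x
print(resid.std())                 # ~sigma: the scale of the noise
```

If the residuals still showed a pattern in $x$ (say, a curve), that would be evidence that the linearity assumption, not just the noise assumption, is wrong for the data at hand.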
Ultimately this comes down to George Box's old adage: "All models are wrong, but some are useful."