Understanding probabilistic interpretation of linear regression


I want to fully understand the probabilistic interpretation. I know that once we have a probabilistic model, we differentiate the likelihood to find the maximum-likelihood weights/regressors, but what I really find difficult to grasp is how exactly we develop a probabilistic model for linear regression in the first place. I have seen that initially we write
$$y_i = \epsilon_i + w^T x_i. \tag{1}$$
Here I want to know: what is $y_i$? Is it the observed value? If so, how come we model it as random? Where is the randomness coming from? And what is $\epsilon_i$? Is it error or noise?

Please correct me if I am wrong:

What I understand is that our measured data is noisy: for the same $x_i$, $y_i$ can vary on a different draw of samples, due to some inherent randomness in $y_i$. This randomness is what we quantify using $\epsilon_i \sim N(0,\sigma^2)$. Hence, given $x_i$, $y_i$ is a normal random variable with mean $w^T x_i$. We then maximize the likelihood, i.e., the probability that $y_i$ takes the value observed in our current experimental data given $x_i$; by (1), this probability is parameterized by $w$.
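This picture of the model can be simulated directly. The sketch below (with hypothetical values for $w$, $x_i$, and $\sigma$) draws several measurements at the same fixed $x_i$: each draw gives a different $y_i$, but all scatter around the deterministic mean $w^T x_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])   # hypothetical true weight vector
x = np.array([1.0, 3.0])         # one fixed design point x_i
sigma = 0.5                      # hypothetical noise level

# Repeated measurements at the same x_i: y_i = w^T x_i + eps_i,
# with eps_i ~ N(0, sigma^2) drawn fresh each time.
draws = w_true @ x + rng.normal(0.0, sigma, size=5)
print(draws)        # five different y values for the same x_i
print(w_true @ x)   # the deterministic mean w^T x_i they scatter around
```

All of the randomness in `draws` comes from the noise term; `w_true @ x` itself is fixed.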


There are 2 best solutions below


Your two questions cancel each other out: the randomness is coming from $\varepsilon_i$. In the probabilistic interpretation of linear regression we assume that the data is generated by an unknown linear model $w^T x_i$ plus IID Gaussian noise $\varepsilon_i$, which we assume has mean zero and a fixed variance $\sigma^2$ that will turn out to be irrelevant for estimating $w$. This means that the log-likelihood of observing data points $y_1, \dots, y_k$ under the assumption that the weight vector is $w$ is, up to an additive constant and a positive factor, equal to $-\sum_i (y_i - w^T x_i)^2$, so the maximum-likelihood estimate of $w$ is obtained by maximizing this expression, which means minimizing the sum of squares.
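The equivalence between the maximum-likelihood estimate and least squares can be checked numerically. The sketch below (with hypothetical data dimensions and noise level) generates data from a known $w$, recovers it by least squares, and verifies that the recovered $\hat w$ achieves a smaller sum of squared residuals than a nearby perturbation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])       # hypothetical true weights
y = X @ w_true + rng.normal(0.0, 0.3, size=n)  # Gaussian noise, sigma = 0.3

# Under IID Gaussian noise, the MLE for w is the least-squares solution.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)   # should be close to w_true

# The log-likelihood is (up to constants) -sum of squared residuals,
# so w_hat should beat any perturbed weight vector on that criterion.
def sse(w):
    return np.sum((y - X @ w) ** 2)

assert sse(w_hat) <= sse(w_hat + 0.01)
```

Note that $\sigma^2$ never enters the computation of $\hat w$, which is exactly why it is irrelevant for the point estimate.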


In this regression model $x_i$ is deterministic, not noisy: it is the point at which you perform the measurement. $y_i$ is the noisy result of the measurement and is indeed the observed value. Sometimes I like to think of it as if the true measurement $y^{true}_i$ is viewed through a noisy instrument, so that each time we make the observation at the same point $x_i$, the instrument gives us a different value of $y_i$ that is near, but not at, the true value $y^{true}_i$. So you can interpret $\epsilon_i$ as the random behavior of the instrument, which is different at each observation (even if the observation is at the same $x$).
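The noisy-instrument picture also suggests why averaging helps: repeated readings at the same $x_i$ scatter around $y^{true}_i$, and since the noise has mean zero, their average converges to the true value. A minimal sketch, with a hypothetical true value and noise level:

```python
import numpy as np

rng = np.random.default_rng(2)
y_true = 4.2    # hypothetical true measurement at some fixed x_i
sigma = 0.8     # hypothetical instrument noise level

# Each observation at the same x_i gives a different reading near y_true.
readings = y_true + rng.normal(0.0, sigma, size=1000)
print(readings[:3])      # three distinct noisy readings
print(readings.mean())   # mean-zero noise averages out, approaching y_true
```

This is the same mean-zero Gaussian noise assumption that drives the least-squares result in the first answer.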