Probabilistic interpretation of linear regression: implication step


I am reading Andrew Ng's notes on linear regression, and in this section he derives the least-squares objective using a probabilistic approach: http://cs229.stanford.edu/summer2020/cs229-notes1.pdf

We assume: $$y^{(i)}=\theta ^{T}x^{(i)}+e^{(i)}$$

where $e^{(i)}$ is an error term distributed according to the normal distribution $\mathcal{N}(0, \sigma^2)$.

Thus, we can write the density of $e^{(i)}$ as $$p(e^{(i)})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(e^{(i)})^2}{2\sigma^2}\right)$$

However, in the next step, he says that this implies $$p(y^{(i)}\mid x^{(i)};\theta)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)}-\theta^{\top}x^{(i)})^2}{2\sigma^2}\right)$$

and I am not sure how this step follows.

We see that $e^{(i)}$ is a random variable distributed according to the normal distribution, so how can we write down a density for something that isn't $e^{(i)}$, namely $y^{(i)}$?

There is 1 answer below.


If $\theta$ and $x^{(i)}$ are fixed, then $y^{(i)}=\theta^\top x^{(i)}+e^{(i)}$ is just the random variable $e^{(i)}$ shifted by the constant $\theta^\top x^{(i)}$. Shifting changes the center of the distribution but not its scale, so $y^{(i)}$ follows the same normal distribution as $e^{(i)}$, recentered at $\theta^\top x^{(i)}$. Concretely, you evaluate the density of $e^{(i)}$ at $e^{(i)}=y^{(i)}-\theta^\top x^{(i)}$, which is exactly the formula in the notes. In probability notation, this is $$y^{(i)}\mid x^{(i)};\theta \sim \mathcal{N}(\theta^\top x^{(i)}, \sigma^2).$$
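You can check this shift-of-mean claim empirically with a quick simulation. The sketch below uses hypothetical values for $\theta$, $x^{(i)}$, and $\sigma$ (none of these numbers come from the notes): draw many realizations of $e^{(i)} \sim \mathcal{N}(0, \sigma^2)$, form $y^{(i)} = \theta^\top x^{(i)} + e^{(i)}$, and confirm the sample mean is near $\theta^\top x^{(i)}$ while the sample standard deviation stays near $\sigma$.

```python
import random
import statistics

# Hypothetical fixed values for illustration (not from Ng's notes).
random.seed(0)
theta = [2.0, -1.0]
x_i = [1.0, 3.0]          # one fixed example x^{(i)}
sigma = 0.5
mu = sum(t * x for t, x in zip(theta, x_i))   # theta^T x^{(i)} = -1.0

# Draw many realizations of e^{(i)} ~ N(0, sigma^2) and form y^{(i)}.
ys = [mu + random.gauss(0.0, sigma) for _ in range(200_000)]

# Empirically, y^{(i)} is centered at theta^T x^{(i)} with scale sigma:
# the shift moved the mean but left the spread unchanged.
print(statistics.fmean(ys))   # close to -1.0
print(statistics.stdev(ys))   # close to 0.5
```

The point of the simulation is just that adding a constant to a normal random variable relocates its center without touching $\sigma$, which is the entire content of the implication step in the notes.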