I am reading Andrew Ng's notes on linear regression, and in this section, he attempts to derive the formula for the least squares using a probability approach: http://cs229.stanford.edu/summer2020/cs229-notes1.pdf
We assume: $$y^{(i)}=\theta ^{T}x^{(i)}+e^{(i)}$$
where $e^{(i)}$ is distributed according to the normal distribution $\mathcal{N}(0, \sigma^2)$.
Thus, we can write the density of $e^{(i)}$ as $$p(e^{(i)})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(e^{(i)})^2}{2\sigma^2}\right)$$
However, in the next step, he says that this implies $$p(y^{(i)}|x^{(i)}; \theta)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y^{(i)} - \theta^\top x^{(i)})^2}{2\sigma^2}\right)$$
and I am not sure how this follows.
We see that $e^{(i)}$ is a random variable distributed according to the normal distribution, so how can we write down a density for something that isn't $e^{(i)}$?
If $\theta$ and $x^{(i)}$ are fixed, then $y^{(i)} = \theta^\top x^{(i)} + e^{(i)}$ is a constant plus the normally distributed $e^{(i)}$. Adding a constant shifts the center of the distribution but not its scale, so $y^{(i)}$ follows the same distribution as $e^{(i)}$, recentered at $\theta^\top x^{(i)}$. In probability notation, $$y^{(i)}\,|\,x^{(i)};\theta \sim \mathcal{N}(\theta^\top x^{(i)}, \sigma^2).$$ Equivalently, since $e^{(i)} = y^{(i)} - \theta^\top x^{(i)}$, substituting this into the density $p(e^{(i)})$ yields exactly the expression Ng writes for $p(y^{(i)}|x^{(i)};\theta)$.
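You can check the shift-but-not-scale claim numerically. The sketch below (with a made-up $\theta$, $x^{(i)}$, and $\sigma$ chosen purely for illustration) simulates many draws of $e^{(i)} \sim \mathcal{N}(0, \sigma^2)$, forms $y^{(i)} = \theta^\top x^{(i)} + e^{(i)}$, and confirms the sample mean moves to $\theta^\top x^{(i)}$ while the sample standard deviation stays at $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed parameters and one input vector (illustrative values only)
theta = np.array([2.0, -1.0])
x_i = np.array([1.0, 3.0])
sigma = 0.5
mu = theta @ x_i  # theta^T x^{(i)} = 2*1 + (-1)*3 = -1.0

# Simulate y^{(i)} = theta^T x^{(i)} + e^{(i)} with e^{(i)} ~ N(0, sigma^2)
e = rng.normal(loc=0.0, scale=sigma, size=100_000)
y = mu + e

# Adding the constant shifts the center but leaves the spread unchanged
print(y.mean())  # close to mu = -1.0
print(y.std())   # close to sigma = 0.5
```

The same reasoning is why the conditional density of $y^{(i)}$ is just the density of $e^{(i)}$ with $e^{(i)}$ replaced by $y^{(i)} - \theta^\top x^{(i)}$.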