Let us assume that the target variables and the inputs are related via the equation:
$y^{(i)}=w^Tx^{(i)} + e^{(i)}$
where $e^{(i)}$ is an error term that captures either unmodeled effects or random noise. Let us further assume that the $e^{(i)}$ are distributed IID according to a Gaussian distribution with mean zero and some variance $\sigma^2$. Thus:
$p(e^{(i)})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(e^{(i)})^2}{2\sigma^2}\right)$
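To make the setup concrete, here is a minimal simulation sketch of this data-generating process. The weight vector `w`, sample size `n`, and noise scale `sigma` below are arbitrary illustrative values, not anything specified in the question:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 1000, 3                        # sample size and number of features (arbitrary)
w = np.array([1.5, -2.0, 0.5])        # "true" weights, chosen purely for illustration
sigma = 0.8                           # noise standard deviation

X = rng.normal(size=(n, p))           # design matrix; rows play the role of x^{(i)}
e = rng.normal(0.0, sigma, size=n)    # IID Gaussian errors e^{(i)} ~ N(0, sigma^2)
y = X @ w + e                         # y^{(i)} = w^T x^{(i)} + e^{(i)}
```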
My question is: how does this imply that the probability of $y^{(i)}$, given $x^{(i)}$ and parameterized by $w$, is the following:
$p(y^{(i)}|x^{(i)}, w)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y^{(i)} - w^Tx^{(i)})^2}{2\sigma^2}\right)$
You are viewing the $w_j$, $j=1,\dots,p$, as unknown constants, and $\mathrm{x}_i$ is given, so denote $(X_1 = x_1,\dots,X_p = x_p) = X$. Then
$$ \mathbb{E}[y_i|X] = \mathbb{E}[w^T\mathrm{x}_i + e_i|X] = w^T\mathrm{x}_i + \mathbb{E}[e_i|X] = w^T\mathrm{x}_i. $$
The same goes for the variance:
$$ \operatorname{Var}[y_i|X] = \operatorname{Var}[w^T\mathrm{x}_i + e_i|X] = \operatorname{Var}[e_i|X] = \sigma^2. $$
Moreover, given $X$, $y_i$ is an affine transformation of the normal r.v. $e_i$ (a shift by the constant $w^T\mathrm{x}_i$), and a shifted normal r.v. is itself normal, thus
$$ y_i|X \sim \mathcal{N}(w^T\mathrm{x}_i, \sigma^2). $$
Writing out this normal density, i.e., substituting $e_i = y_i - w^T\mathrm{x}_i$ into the density of $e_i$ (a pure shift, so the change of variables has unit Jacobian), gives exactly the expression in your question:
$$ p(y_i|X) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y_i - w^T\mathrm{x}_i)^2}{2\sigma^2}\right). $$
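You can also verify this conclusion numerically with a quick Monte Carlo check: fix a single input vector, draw many realizations of $y = w^T x + e$, and compare the empirical distribution to $\mathcal{N}(w^T x, \sigma^2)$. The parameter values below are the same arbitrary illustrative choices as in the earlier sketch:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
w = np.array([1.5, -2.0, 0.5])        # illustrative weights (same as above)
sigma = 0.8                           # noise standard deviation
x = np.array([0.3, -1.0, 2.0])        # one fixed input vector (arbitrary)

m = 200_000
y_given_x = w @ x + rng.normal(0.0, sigma, size=m)   # draws of y | x

# Moments should match N(w^T x, sigma^2)
print(y_given_x.mean(), w @ x)            # empirical mean  vs. w^T x
print(y_given_x.var(ddof=1), sigma**2)    # empirical variance vs. sigma^2

# The empirical density should track the claimed Gaussian pdf
hist, edges = np.histogram(y_given_x, bins=60, density=True)
centers = (edges[:-1] + edges[1:]) / 2
max_gap = np.abs(hist - norm.pdf(centers, loc=w @ x, scale=sigma)).max()
print(max_gap)                            # small, and shrinks as m grows
```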