How does the probabilistic interpretation of least squares for linear regression work?


Let us assume that the target variables and the inputs are related via the equation:

$y^{(i)}=w^Tx^{(i)} + e^{(i)}$

where $e^{(i)}$ is an error term that captures either unmodeled effects or random noise. Let us further assume that the $e^{(i)}$ are distributed IID according to a Gaussian distribution with mean zero and some variance $\sigma^2$. Thus:

$p(e^{(i)})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(e^{(i)})^2}{2\sigma^2}\right)$

My question is: how does this imply that the conditional density of $y^{(i)}$ given $x^{(i)}$, parameterized by $w$, is the following:

$p(y^{(i)}|x^{(i)}, w)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y^{(i)} - w^Tx^{(i)})^2}{2\sigma^2}\right)$
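For concreteness, here is a minimal simulation of the assumed model (the values of $w$, $\sigma$, and the sample size are just illustrative placeholders): the residual $y^{(i)} - w^Tx^{(i)}$ is exactly the noise term $e^{(i)}$, so its empirical distribution should look like $\mathcal{N}(0, \sigma^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameters: true weights and noise standard deviation.
w_true = np.array([2.0, -1.0])
sigma = 0.5
n = 10_000

# Inputs x^(i) and IID Gaussian errors e^(i) ~ N(0, sigma^2).
X = rng.normal(size=(n, 2))
e = rng.normal(loc=0.0, scale=sigma, size=n)

# Targets generated exactly as in the model: y^(i) = w^T x^(i) + e^(i).
y = X @ w_true + e

# The residual y^(i) - w^T x^(i) is just e^(i), so it should look N(0, sigma^2).
residuals = y - X @ w_true
print(residuals.mean(), residuals.std())  # approximately 0 and sigma
```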


You are viewing the $w_j$, $j=1,\dots,p$, as unknown constants and $\mathrm{x}_i$ as given, so denote $(X_1 = x_1,\dots,X_p = x_p) = X$. Then
$$ \mathbb{E}[y_i|X] = \mathbb{E}[w^T\mathrm{x}_i + e_i|X]= w^T\mathrm{x}_i + \mathbb{E}[e_i|X] = w^T\mathrm{x}_i. $$
The same goes for the variance, i.e.,
$$ \operatorname{Var}[y_i|X] = \operatorname{Var}[w^T\mathrm{x}_i + e_i|X]= \operatorname{Var}[e_i|X] = \sigma^2. $$
Moreover, given $X$, $y_i$ is the constant $w^T\mathrm{x}_i$ plus the normal r.v. $e_i$, and a shift of a normal r.v. is still normal, thus
$$ y_i|X \sim \mathcal{N}(w^T\mathrm{x}_i, \sigma^2). $$
Writing out the density of this normal distribution gives exactly
$$ p(y_i|\mathrm{x}_i, w) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y_i - w^T\mathrm{x}_i)^2}{2\sigma^2}\right). $$
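As a sanity check, here is a small sketch (again with illustrative, assumed values for $w$ and $\sigma$) showing that maximizing this Gaussian likelihood over $w$ gives essentially the same estimate as ordinary least squares, which is the point of the probabilistic interpretation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Assumed generative model: y_i = w^T x_i + e_i with e_i ~ N(0, sigma^2).
w_true = np.array([2.0, -1.0])
sigma = 0.5
n = 500
X = rng.normal(size=(n, 2))
y = X @ w_true + rng.normal(scale=sigma, size=n)

def neg_log_likelihood(w):
    # Negative log of prod_i N(y_i; w^T x_i, sigma^2).
    r = y - X @ w
    return 0.5 * np.sum(r**2) / sigma**2 + n * np.log(np.sqrt(2 * np.pi) * sigma)

w_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_mle)  # maximum-likelihood estimate
print(w_ols)  # least-squares estimate, essentially identical
```

Note that $\sigma$ only rescales and shifts the negative log-likelihood, so the maximizer in $w$ does not depend on it; minimizing the negative log-likelihood in $w$ is the same as minimizing $\sum_i (y_i - w^T\mathrm{x}_i)^2$.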