In Andrew Ng's machine learning course (http://cs229.stanford.edu/notes/cs229-notes1.pdf), page 12, it says that $x$ and $y$ have a linear relationship $$y^{(i)} = \theta^Tx^{(i)}+\epsilon^{(i)}$$ where $$\epsilon^{(i)} \sim N(0,\sigma^2)$$ is assumed to be an IID Gaussian random variable. Therefore $$p(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(\epsilon^{(i)})^2}{2\sigma^2}\right)$$ and if we write $$\epsilon^{(i)} = y^{(i)} - \theta^Tx^{(i)}$$ we obtain a conditional probability: $$p(y^{(i)} \mid x^{(i)};\theta) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2}\right)$$
My question is: why is this a conditional probability rather than a 2D joint probability? That is, why can't I write it as $$p(y^{(i)}, x^{(i)};\theta) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2}\right)$$
One more question: in the notes, $y^{(i)}$ and $x^{(i)}$ are samples, not random variables. Was it OK to write $p(y^{(i)} \mid x^{(i)})$ instead of $p(Y \mid X)$?
Thank you, brilliant guys!
$p(\epsilon^{(i)})$ is the probability density function of the error term $\epsilon^{(i)}$. It expresses how the unmodelled effects influence the value of $y^{(i)}$; that is, it is the density of the influence on $y^{(i)}$ that is not produced by the random variable $x^{(i)}$ (or the parameter $\theta$).
In other words, for a fixed value of $x^{(i)}$, $p(\epsilon^{(i)})$ is the conditional density function of $y^{(i)}$ given that constraint:
$$p(y^{(i)}\mid x^{(i)};\theta)~=~p(\epsilon^{(i)})$$
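To spell out why this identity holds: with $x^{(i)}$ held fixed, the map $\epsilon^{(i)} \mapsto y^{(i)} = \theta^Tx^{(i)}+\epsilon^{(i)}$ is just a shift by the constant $\theta^Tx^{(i)}$, so the change-of-variables Jacobian is $1$ and the two densities coincide:

$$p(y^{(i)}\mid x^{(i)};\theta) ~=~ p_{\mathcal E}\!\left(y^{(i)}-\theta^Tx^{(i)}\right)\left|\frac{d\epsilon^{(i)}}{dy^{(i)}}\right| ~=~ \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y^{(i)} - \theta^Tx^{(i)})^2}{2\sigma^2}\right), \qquad \frac{d\epsilon^{(i)}}{dy^{(i)}} = 1.$$

A joint density $p(y^{(i)}, x^{(i)};\theta)$, by contrast, would also need a marginal density for $x^{(i)}$, which the model never specifies.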
Yes..ish. It is actually shorthand for $p_{Y\mid X}(y^{(i)}\mid x^{(i)})$, just as $p(\epsilon^{(i)})$ is actually $p_{\mathcal E}(\epsilon^{(i)})$. The subscripts are simply omitted when it is considered unambiguous which random variable the value refers to.
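As a quick numerical sanity check (a sketch with made-up values of $\theta$, $\sigma$, and $x^{(i)}$, not values from the notes): if we hold $x^{(i)}$ fixed and sample $\epsilon^{(i)}$, then $y^{(i)} = \theta^Tx^{(i)} + \epsilon^{(i)}$ is exactly the Gaussian $N(\theta^Tx^{(i)}, \sigma^2)$ that the conditional density describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters -- chosen for this sketch, not taken from the notes.
theta = np.array([2.0, -1.0])
sigma = 0.5
x_i = np.array([1.0, 3.0])   # one fixed sample x^{(i)}

# Simulate y^{(i)} = theta^T x^{(i)} + eps^{(i)}, with eps ~ N(0, sigma^2).
n = 100_000
eps = rng.normal(0.0, sigma, size=n)
y = theta @ x_i + eps

# With x^{(i)} held fixed, y^{(i)} is Gaussian with mean theta^T x^{(i)}
# and standard deviation sigma -- i.e. the conditional density p(y|x; theta).
print(y.mean())  # should be close to theta @ x_i
print(y.std())   # should be close to sigma
```

Note that nothing here says how $x^{(i)}$ itself is distributed, which is precisely why the model gives a conditional density rather than a joint one.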