Maximum likelihood estimation steps


If we have the linear model $$ y = \theta^Tx + \epsilon $$ and assume $\epsilon \sim N(0, \sigma^2)$, we can write \begin{align} p(\epsilon) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left({-\frac{\epsilon^2}{2\sigma^2}}\right) \end{align}

Since we know $\epsilon = y - \theta^Tx$, we can write

\begin{align} p(y - \theta^Tx) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left({-\frac{(y - \theta^Tx)^2}{2\sigma^2}}\right) \end{align}

and apparently the above is equivalent to writing \begin{align} p(y|x; \theta) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left({-\frac{(y - \theta^Tx)^2}{2\sigma^2}}\right) \end{align}
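As a quick numerical sanity check (a sketch with illustrative values of $\theta$, $x$, $y$, and $\sigma$, not taken from the derivation above), evaluating $p(\epsilon)$ at $\epsilon = y - \theta^Tx$ gives exactly the same number as evaluating $p(y\mid x;\theta)$ directly, since the substitution is just a shift:

```python
import numpy as np

# Illustrative values (assumed, not from the original post).
theta = np.array([2.0, -1.0])
x = np.array([0.5, 1.5])
sigma = 0.7
y = 1.3

def normal_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at z."""
    return np.exp(-(z - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

eps = y - theta @ x                            # residual implied by this (x, y)
p_eps = normal_pdf(eps, 0.0, sigma)            # density of epsilon at that residual
p_y_given_x = normal_pdf(y, theta @ x, sigma)  # conditional density of y given x

assert np.isclose(p_eps, p_y_given_x)
```

The two formulas agree term by term because the map $\epsilon \mapsto \epsilon + \theta^Tx$ (for fixed $x$) is a translation with unit Jacobian.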

I am confused about the step from the second equation to the last one. Why is it that we can turn the marginal probability distribution of $\epsilon$ into a conditional distribution of $y$ given $x$?

Why can't the left-hand side of the last equation be a joint probability, e.g., $p(y, x; \theta)$, or even $p(x \mid y; \theta)$?

Best Answer

$\epsilon \sim N(0,\sigma^2) \implies y \sim N(\theta^Tx,\sigma^2)$, since we generally assume $x$ is fixed in linear regression, so $E[y \mid x] = \theta^Tx$.
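This implication is easy to confirm by simulation (a sketch with assumed toy values of $\theta$, $x$, and $\sigma$): with $x$ held fixed, $y = \theta^Tx + \epsilon$ inherits the normal distribution of $\epsilon$, shifted to mean $\theta^Tx$ with the same variance $\sigma^2$.

```python
import numpy as np

# Monte Carlo check with illustrative (assumed) parameter values.
rng = np.random.default_rng(0)
theta = np.array([2.0, -1.0])
x = np.array([0.5, 1.5])
sigma = 0.7

eps = rng.normal(0.0, sigma, size=200_000)  # draws of epsilon ~ N(0, sigma^2)
y = theta @ x + eps                         # corresponding draws of y given x

assert abs(y.mean() - theta @ x) < 0.01     # sample mean ~= theta^T x
assert abs(y.std() - sigma) < 0.01          # sample sd   ~= sigma
```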

This means that $y$ is simply $\epsilon$ translated by the constant $\theta^Tx$, which makes it trivial to get the conditional distribution of $y$: the shift moves the mean but leaves the variance unchanged. Note that $x$ is generally considered fixed (not random), so a joint distribution $p(y, x; \theta)$ doesn't make sense, and neither does conditioning the other way to get $p(x \mid y; \theta)$.

In general, it's tempting to do "normal algebra" with random variables, but that doesn't always work the way you expect. You need to be very clear about what is random and what is not, and avoid treating them all on the same footing.
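To connect this back to the title: once you accept $p(y\mid x;\theta)$, the maximum likelihood estimate of $\theta$ is whatever minimizes $\sum_i (y_i - \theta^Tx_i)^2$, because $\sigma^2$ is a positive constant and the log-likelihood is, up to constants, minus that sum of squares. A small sketch on assumed toy data checks that the least-squares solution is indeed a likelihood maximizer:

```python
import numpy as np

# Toy data (assumed): fixed design matrix X, Gaussian noise.
rng = np.random.default_rng(1)
n, d = 500, 2
X = rng.normal(size=(n, d))           # treated as fixed, not random
theta_true = np.array([2.0, -1.0])
sigma = 0.7
y = X @ theta_true + rng.normal(0.0, sigma, size=n)

def neg_log_lik(theta):
    """Negative Gaussian log-likelihood in theta, dropping constants."""
    r = y - X @ theta
    return r @ r / (2 * sigma ** 2)

# Least-squares fit = MLE for theta under this model.
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# No small perturbation of the OLS solution should improve the likelihood.
for _ in range(100):
    perturbed = theta_ols + rng.normal(scale=0.05, size=d)
    assert neg_log_lik(theta_ols) <= neg_log_lik(perturbed)
```

The design choice here is to keep $\sigma$ fixed and known; if $\sigma$ is also unknown, its MLE is the mean squared residual at the fitted $\theta$.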