Weird Bayes Rule Application, Wanted Effect is Also a Condition

I'm a bit confused after seeing this show up in my textbook (the equation shown in the image is $p(w \mid X, y) \propto p(y \mid X, w)\, p(w)$).

I've never really seen an application of Bayes's rule like this one. Sorry for the lack of specificity in the question, but I don't really know what I should be asking. How do I interpret this? I'm more used to something like:

$$p(w \mid A, B) \propto p(A, B \mid w)\, p(w)$$

where $A$ and $B$ appear together on the same side of the conditioning bar.

EDIT: More context.

This is in the context of Bayesian linear regression. The author is explaining how we can arrive at the same optimal answer for linear regression from a Bayesian standpoint. Here $w$ is the vector of weights in linear regression, and $X$ is a design matrix of examples $\{x_1, x_2, \dots, x_m\}$.

We assume that the prior probability distribution over the weight vector $w$ is some Gaussian, and we also assume that $p(y \mid X, w)$ is Gaussian as well, with mean $\mu = Xw$ and covariance equal to the identity matrix $I$.
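For concreteness, those two generative assumptions can be sketched in NumPy (the dimensions and random seed here are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3                        # number of examples, number of features

# Prior assumption: weights drawn from the Gaussian prior p(w) = N(0, I)
w = rng.normal(size=n)

# Design matrix X whose rows are the examples x_1, ..., x_m
X = rng.normal(size=(m, n))

# Likelihood assumption: p(y | X, w) = N(Xw, I), i.e. unit-variance noise
y = X @ w + rng.normal(size=m)
```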

Textbook pages I am referring to are HERE, at pages 135-136.

Could someone explain how this is possible and why it makes sense? Thanks, A

There is 1 best solution below.

On BEST ANSWER

Thanks for adding the context! You're right that this isn't a typical form of Bayes's rule. It's true if we make a fairly natural independence assumption.

It's common to write Bayes's rule as $$p(B \mid A) \propto p(A \mid B) p(B)\text{.}$$

You can also write this conditioned on a third variable: $$p(B \mid A, C) \propto p(A \mid B, C) p(B \mid C)\text{.}$$
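As a sanity check (not part of the original answer), this conditional form of Bayes's rule can be verified numerically on an arbitrary joint distribution over three binary variables:

```python
import numpy as np

rng = np.random.default_rng(1)
# Arbitrary joint distribution p(a, b, c) over three binary variables,
# indexed as joint[a, b, c]
joint = rng.random((2, 2, 2))
joint /= joint.sum()

a, c = 0, 1  # condition on A = a and C = c

# Left side: p(B | A=a, C=c), computed directly from the joint
pB_given_ac = joint[a, :, c] / joint[a, :, c].sum()

# Right side: p(A=a | B, C=c) * p(B | C=c), normalized over B
pA_given_bc = joint[a, :, c] / joint[:, :, c].sum(axis=0)   # p(a | b, c) for each b
pB_given_c = joint[:, :, c].sum(axis=0) / joint[:, :, c].sum()
rhs = pA_given_bc * pB_given_c
rhs /= rhs.sum()
```

After normalization the two sides agree, confirming that the proportionality holds when everything is conditioned on the same third variable $C$.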

Using the variables from your regression model, we can write

$$p(w \mid y, X) \propto p(y \mid w, X) p(w \mid X)\text{.}$$

This is the equation from the textbook, except for the last term. We need $p(w \mid X)$ to equal $p(w)$ for the textbook equation to hold true. But we typically do make this independence assumption: our initial belief about the weights $w$ of the model doesn't depend on the covariates $X$.
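Under the stated assumptions (Gaussian prior $N(0, I)$ and likelihood $N(Xw, I)$, both unit-variance simplifications), the posterior over $w$ has a Gaussian closed form, and its mean coincides with ridge regression with $\lambda = 1$. That is the "same optimal answer" connection the textbook is drawing. A minimal sketch, with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3                          # made-up sizes for illustration
X = rng.normal(size=(m, n))           # design matrix
w_true = np.array([1.0, -2.0, 0.5])   # hypothetical true weights
y = X @ w_true + rng.normal(size=m)   # targets with unit-variance noise

# With prior p(w) = N(0, I) and likelihood p(y | X, w) = N(Xw, I),
# the posterior p(w | y, X) is Gaussian with
#   covariance S  = (X^T X + I)^{-1}
#   mean       mu = S X^T y
S = np.linalg.inv(X.T @ X + np.eye(n))
mu = S @ X.T @ y

# The posterior mean is exactly the ridge regression (regularized
# least-squares) solution with lambda = 1
w_ridge = np.linalg.solve(X.T @ X + np.eye(n), X.T @ y)
```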


For people looking this up later, the question is about equation (5.74) in Deep Learning by Goodfellow, Bengio, and Courville.