How can I derive the posterior distribution of theta given two random variables (X, y) using Bayes' rule?


In the book Machine Learning in Finance: From Theory to Practice (Matthew F. Dixon et al., 2020), there is a derivation of a Bayesian equation in Chapter 2:

$p(\theta|y, X) = {p(\theta, y | X) \over p(y|X)} = {p(\theta)p(y|\theta, X) \over p(y|X)}$

I have two questions about this:

  1. How can I interpret $p(\theta|y, X)$? Should I see it as $p(\theta|(y \cap X))$ or $p((\theta|y), X)$?

  2. How is the above equation derived? I do not see how $p(\theta|y, X)$ becomes ${p(\theta, y | X) \over p(y|X)}$ (why are both the numerator and the denominator conditioned on $X$, and why is $p(y|X)$ the denominator?), or how ${p(\theta, y | X) \over p(y|X)}$ then becomes ${p(\theta)p(y|\theta, X) \over p(y|X)}$ (how does $p(\theta)$ come out of $p(\theta, y | X)$?).

Best answer
  1. $p(\theta \mid y, X)$ denotes the conditional probability/density of $\theta$ given $y$ and $X$. $p((\theta \mid y), X)$ is not a meaningful expression. $p(\theta \mid (y \cap X))$ is in the right spirit, but I would not use the $\cap$ notation because $y$ and $X$ are presumably random variables, not events.
  2. You should be familiar with $$p(\theta \mid y) = \frac{p(\theta, y)}{p(y)} = \frac{p(\theta) p(y \mid \theta)}{p(y)}.$$ This is simply the definition of conditional probability applied twice (and is often called Bayes's rule). What you have is a similar application, where each of the five terms appearing in this equation is also conditioned on $X$, i.e. $$p(\theta \mid y, X) = \frac{p(\theta, y \mid X)}{p(y \mid X)} = \frac{p(\theta \mid X) p(y \mid \theta, X)}{p(y \mid X)}.$$ If this is unclear to you, I would encourage you to break every conditional probability down into unconditional probabilities and check that the equalities work out, e.g. $p(\theta \mid y, X) = \frac{p(\theta, y, X)}{p(y, X)}$, $p(y \mid X) = \frac{p(y, X)}{p(X)}$, and so on.
  3. Finally, $p(\theta \mid X)$ is probably replaced by $p(\theta)$ because of some context that you did not provide in the question. I presume that $y$ depends on both $\theta$ and $X$, but $\theta$ does not depend on $X$, in which case $p(\theta \mid X) = p(\theta)$.
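
For readers who prefer to check the equalities numerically rather than by hand, here is a minimal sketch in Python. Everything in it is made up for illustration: the small discrete supports, the random distributions, and the helper names (`random_pmf`, `marginal`, `max_gap`) are assumptions, not anything from the book. The joint distribution is built so that $\theta$ is independent of $X$, which is the assumption presumed in item 3.

```python
import itertools
import random

random.seed(0)

# Small hypothetical supports for theta, y and X (illustrative only).
THETAS, YS, XS = range(3), range(2), range(4)

def random_pmf(n):
    """Return n positive weights that sum to 1."""
    w = [random.random() + 0.1 for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

# Build the joint as p(theta) * p(X) * p(y | theta, X), so that theta is
# independent of X by construction (the assumption presumed in item 3).
p_theta = random_pmf(len(THETAS))
p_x = random_pmf(len(XS))
p_y_given_theta_x = {(t, x): random_pmf(len(YS)) for t in THETAS for x in XS}

joint = {
    (t, y, x): p_theta[t] * p_x[x] * p_y_given_theta_x[(t, x)][y]
    for t, y, x in itertools.product(THETAS, YS, XS)
}

def marginal(**fixed):
    """Sum the joint over every variable not fixed by keyword (t, y, x)."""
    return sum(
        p for (t, y, x), p in joint.items()
        if all({"t": t, "y": y, "x": x}[k] == v for k, v in fixed.items())
    )

def max_gap():
    """Largest absolute difference between the three forms of the posterior."""
    gap = 0.0
    for t, y, x in itertools.product(THETAS, YS, XS):
        # Definition: p(theta | y, X) = p(theta, y, X) / p(y, X)
        lhs = marginal(t=t, y=y, x=x) / marginal(y=y, x=x)
        # Middle form: p(theta, y | X) / p(y | X)
        p_ty_given_x = marginal(t=t, y=y, x=x) / marginal(x=x)
        p_y_given_x = marginal(y=y, x=x) / marginal(x=x)
        middle = p_ty_given_x / p_y_given_x
        # Right form: p(theta) * p(y | theta, X) / p(y | X)
        p_t = marginal(t=t)
        p_y_given_tx = marginal(t=t, y=y, x=x) / marginal(t=t, x=x)
        right = p_t * p_y_given_tx / p_y_given_x
        gap = max(gap, abs(lhs - middle), abs(lhs - right))
    return gap

print(max_gap())  # should be zero up to floating-point rounding
```

If the joint were built with a prior that depends on $X$ instead, the first equality would still hold, but the last form would generally need $p(\theta \mid X)$ in place of $p(\theta)$.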