In the book Machine Learning in Finance: From Theory to Practice (Matthew F. Dixon et al., 2020), Chapter 2 contains the following derivation based on Bayes' rule:
$p(\theta|y, X) = {p(\theta, y | X) \over p(y|X)} = {p(\theta)p(y|\theta, X) \over p(y|X)}$
I have two questions about this:
How should I interpret $p(\theta|y, X)$? Should I read it as $p(\theta|(y \cap X))$, i.e. conditioning on both $y$ and $X$, or as $p((\theta|y), X)$?
How is the above equation derived? I do not see how $p(\theta|y, X)$ becomes ${p(\theta, y | X) \over p(y|X)}$: why are both the numerator and the denominator conditioned on $X$, and why is $p(y|X)$ the denominator? I also do not see how ${p(\theta, y | X) \over p(y|X)}$ then turns into ${p(\theta)p(y|\theta, X) \over p(y|X)}$: how does $p(\theta)$ come out of $p(\theta, y | X)$?
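
To make the question concrete, here is a small numerical sketch that checks the three expressions against each other on a toy discrete model. It rests on two assumptions of mine that are not stated in the excerpt: that $p(\theta|y, X)$ means conditioning on both $y$ and $X$, and that the prior on $\theta$ does not depend on $X$ (so that $p(\theta|X) = p(\theta)$). All numbers and names (`p_theta`, `p_x`, `p_y1`, `lik`, `check`) are made up for illustration.

```python
from fractions import Fraction as F
from itertools import product

thetas, ys, xs = (0, 1), (0, 1), (0, 1)

# Hypothetical numbers, chosen only for illustration.
p_theta = {0: F(1, 4), 1: F(3, 4)}   # prior p(theta), ASSUMED independent of X
p_x = {0: F(2, 5), 1: F(3, 5)}       # p(X)
# p(y = 1 | theta, X), keyed by (theta, X); p(y = 0 | theta, X) is the complement.
p_y1 = {(0, 0): F(1, 10), (0, 1): F(1, 2), (1, 0): F(7, 10), (1, 1): F(9, 10)}


def lik(y, t, x):
    """Likelihood p(y | theta, X)."""
    return p_y1[(t, x)] if y == 1 else 1 - p_y1[(t, x)]


# Joint p(theta, y, X) = p(theta) * p(X) * p(y | theta, X).
joint = {
    (t, y, x): p_theta[t] * p_x[x] * lik(y, t, x)
    for t, y, x in product(thetas, ys, xs)
}


def check(t, y, x):
    """Return the three expressions from the equation for one (theta, y, X)."""
    p_yx = sum(joint[(s, y, x)] for s in thetas)    # p(y, X)
    left = joint[(t, y, x)] / p_yx                  # p(theta | y, X)
    p_ty_given_x = joint[(t, y, x)] / p_x[x]        # p(theta, y | X)
    p_y_given_x = p_yx / p_x[x]                     # p(y | X)
    middle = p_ty_given_x / p_y_given_x
    right = p_theta[t] * lik(y, t, x) / p_y_given_x
    return left, middle, right


results = {key: check(*key) for key in joint}
all_equal = all(left == mid == right for left, mid, right in results.values())
print(all_equal)
```

Exact fractions are used so the comparison does not depend on floating-point tolerance. If the prior were allowed to depend on $X$, the joint would have to be built from $p(\theta|X)$ instead, and the last expression would presumably need $p(\theta|X)$ in place of $p(\theta)$.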