I have a question on the definition of posterior probability as defined on Wiki:
a) $$ p(\theta|x) = \frac{p(x|\theta)}{p(x)}p(\theta) $$ where $p(x)$ is the normalizing constant and is calculated as
b) $$ p(x) = \int p(x|\theta)p(\theta)\,d\theta $$ for continuous $\theta$, or by summing $p(x|\theta)p(\theta)$ over all possible values of $\theta$ for discrete $\theta$.
Question: How is (b) derived from (a)? Rearranging (a), I get: $$ p(x) = \frac {p(x \cap \theta) p(\theta) p(x)} {p(\theta) p(x \cap \theta)} = p(x|\theta) \frac {p(\theta)}{p(\theta|x)} $$
I'm confused about how this equals the integral in (b).
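Equation (b) is the law of total probability for random variables. Note that $p(x|\theta)p(\theta)=p(x,\theta)$, the joint distribution. Integrating out $\theta$ leaves the marginal: $\int_\Omega p(x,\theta)\,d\theta=p(x)$ if $\theta$ is continuous and $\sum_{\theta\in\Omega} p(x,\theta)=p(x)$ if $\theta$ is discrete, where $\Omega$ is the parameter space. You can also get there from (a) directly: integrate both sides over $\theta$. The left side is $\int_\Omega p(\theta|x)\,d\theta=1$ because the posterior is a density in $\theta$, and $p(x)$ does not depend on $\theta$, so $1=\frac{1}{p(x)}\int_\Omega p(x|\theta)p(\theta)\,d\theta$, which is (b). Your rearrangement is a valid identity, but it still contains the unknown $p(\theta|x)$, so it cannot be used to compute $p(x)$.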
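If it helps to see the discrete case numerically, here is a minimal sketch. The three-valued prior over a coin's bias and the helper names `likelihood`, `evidence`, and `posterior` are made up for illustration; they are not from the Wikipedia article. It computes $p(x)$ by summing $p(x|\theta)p(\theta)$ over $\theta$ and checks that dividing by it makes the posterior sum to 1.

```python
from fractions import Fraction

# Hypothetical prior: theta (probability of heads) takes one of three
# values, each with prior probability 1/3.
prior = {
    Fraction(1, 4): Fraction(1, 3),
    Fraction(1, 2): Fraction(1, 3),
    Fraction(3, 4): Fraction(1, 3),
}

def likelihood(x, theta):
    """p(x | theta) for a single flip: x = 1 is heads, x = 0 is tails."""
    return theta if x == 1 else 1 - theta

def evidence(x):
    """p(x) = sum over theta of p(x | theta) p(theta), i.e. equation (b)."""
    return sum(likelihood(x, t) * p for t, p in prior.items())

def posterior(x):
    """p(theta | x) = p(x | theta) p(theta) / p(x), i.e. equation (a)."""
    px = evidence(x)
    return {t: likelihood(x, t) * p / px for t, p in prior.items()}

if __name__ == "__main__":
    print("p(x=1) =", evidence(1))
    for theta, prob in posterior(1).items():
        print(f"p(theta={theta} | x=1) = {prob}")
    print("posterior total =", sum(posterior(1).values()))
```

Exact fractions are used so the normalization check is not blurred by floating-point rounding. The continuous case works the same way with the sum replaced by an integral over $\theta$.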