Is there a reliable reference that shows that Bayes theorem holds for measures, densities, masses, or combinations of these?


I am looking for a book or other reliable reference that I can cite in a research paper or thesis, and that shows that Bayes' theorem/rule holds for probability measures (i.e. events), densities, mass functions, and combinations of densities and mass functions (depending on whether the random variables involved are continuous or discrete). Something like these notes (even more detailed or formal is fine, i.e. it's OK if the book is based on measure theory, provided it shows these things), but a reliable book or research paper that I can cite.

Please do not tell me to derive these things myself (my knowledge of measure theory is not very good, and I don't have time to study it now), or to cite the notes, because I am looking for a citable book or research paper. Most books I have looked at present Bayes' theorem only for measures and events.


There are 2 best solutions below


Bayes' rule is of course just an application of the definition of conditional probability/density. But if you'd like something to refer to, it is stated for general distributions (i.e., encompassing both "densities" and "mass functions") in the introduction (see p. 7 of the 2nd edition) of Bayesian Data Analysis by Gelman, Carlin, Stern, and Rubin.
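To see the "definition of conditional probability" point concretely, here is a minimal numeric check of Bayes' rule for events, using a made-up diagnostic-test example (all numbers are hypothetical):

```python
# Bayes' rule for events: P(A|B) = P(B|A) P(A) / P(B),
# with P(B) expanded by the law of total probability.
p_disease = 0.01                    # prior P(A)
p_pos_given_disease = 0.95          # likelihood P(B|A)
p_pos_given_healthy = 0.05          # false-positive rate P(B|not A)

# P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # → 0.161
```

The posterior (about 16%) is much smaller than the test's accuracy suggests, because the prior is small; this is exactly the interplay of prior and likelihood that the general theorem below formalizes.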


As announced in the comment, I'd cite Theorem 1.31 in this book. It says:

Suppose that $X$ has a parametric family $\mathcal{P}_0$ of distributions with parameter space $\Omega$. Suppose that $P_\theta\ll \nu$ for all $\theta\in\Omega$, and let $f_{X|\Theta}(x|\theta)$ be the conditional density (with respect to $\nu$) of $X$ given $\Theta = \theta$. Let $\mu_\Theta$ be the prior distribution of $\Theta$. Let $\mu_{\Theta|X}(\cdot|x)$ denote the conditional distribution of $\Theta$ given $X = x$. Then $\mu_{\Theta|X}\ll\mu_\Theta$, a.s. with respect to the marginal of $X$, and the Radon-Nikodym derivative is $$ \frac{d\mu_{\Theta|X}}{d\mu_\Theta}(\theta|x)=\frac{f_{X|\Theta}(x|\theta)}{\int_\Omega f_{X|\Theta}(x|t)\,d\mu_\Theta(t)} $$ for those $x$ such that the denominator is neither $0$ nor infinite. The prior predictive probability of the set of $x$ values such that the denominator is $0$ or infinite is $0$, hence the posterior can be defined arbitrarily for such $x$ values.

In your cases of interest, $\nu$ is the Lebesgue measure when $X$ has a density, the counting measure when $X$ has a mass function, and a product of the two in the mixed case; the theorem covers all of these at once, which is exactly what you asked for.
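As a numerical sanity check of the theorem's formula in a mixed case, here is a sketch (not from the book; the numbers and grid are my own choices) with a discrete likelihood, $X\mid\Theta=\theta \sim \mathrm{Binomial}(n,\theta)$ (density w.r.t. counting measure), and a continuous $\mathrm{Uniform}(0,1)$ prior (Lebesgue). The posterior density $f_{X|\Theta}(x|\theta)\,/\int_0^1 f_{X|\Theta}(x|t)\,dt$ should match the known conjugate answer, $\mathrm{Beta}(x+1,\,n-x+1)$:

```python
import numpy as np
from math import comb, gamma

n, x = 10, 3
theta = np.linspace(0.0, 1.0, 100001)

# likelihood f(x | theta) as a function of theta
likelihood = comb(n, x) * theta**x * (1 - theta)**(n - x)

# denominator of the theorem: integral of f(x|t) d mu_Theta(t);
# for a Uniform(0,1) prior on an even grid, the mean approximates the integral
evidence = likelihood.mean()
posterior = likelihood / evidence

# closed-form check: Beta(x+1, n-x+1) density at theta = 0.3
a, b = x + 1, n - x + 1
t0 = 0.3
beta_pdf = gamma(a + b) / (gamma(a) * gamma(b)) * t0**(a - 1) * (1 - t0)**(b - 1)
idx = np.argmin(np.abs(theta - t0))
print(abs(posterior[idx] - beta_pdf) < 1e-3)  # → True
```

The same code works unchanged if the prior is any density on $(0,1)$: multiply `likelihood` by the prior density on the grid before normalizing, which is precisely taking the Radon-Nikodym derivative with respect to $\mu_\Theta$.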