Consider two (possibly dependent) random variables $(u, y)\in X\times Y$ with sigma algebras $\mathcal X$ and $\mathcal Y$ on $X$ and $Y$, respectively. There are many sources for how to construct a conditional distribution from a joint distribution on the product sigma algebra, but I am interested in the reverse direction.
In short: given a probabilistic relation between a parameter $u$ and data $y$, and a prior probability on the parameter $u$, how do I rigorously define a joint probability distribution of $(u,y)$? I know that we can formally calculate $\mathbb P(u\in A, y\in B) = \int_A \mathbb P(y\in B|u) \, \mu_0(\mathrm d u)$, but how is this done "properly"?
Let's propose a "prior" measure $\mu_0$ on $(X,\mathcal X)$ and a regular version of a conditional probability distribution for $y|u$, i.e. a function $G: X\times \mathcal Y \to [0,1]$ such that
- for all fixed $u_0$, the map $B\mapsto G(u_0,B)$ on $\mathcal Y$ is a probability distribution,
- for all fixed $B\in \mathcal Y$, the map $u\mapsto G(u, B)$ is measurable on $(X, \mathcal X)$ and
- for all $A\in\mathcal X$ and $B\in \mathcal Y$, we have $\int_A G(u, B) \, \mu_0(\mathrm d u) = \mathbb P(u\in A, y\in B)$.
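In the finite, discrete case the three requirements above can be sanity-checked numerically. The following sketch (all values are made-up toy numbers) encodes $\mu_0$ as a vector and $G$ as a matrix whose row $u$ is the distribution of $y$ given $u$, and verifies the rectangle formula $\mathbb P(u\in A, y\in B) = \int_A G(u, B)\,\mu_0(\mathrm d u)$:

```python
import numpy as np

# Toy spaces: X = {0, 1, 2} carries the prior mu0, Y = {0, 1} the kernel G.
mu0 = np.array([0.5, 0.3, 0.2])        # prior on X, sums to 1
G = np.array([[0.9, 0.1],              # G[u, :] is the distribution of y given u
              [0.4, 0.6],
              [0.2, 0.8]])

# Kernel axioms: each row of G is a probability distribution; on a finite
# space, measurability of u -> G(u, B) is automatic.
assert np.allclose(G.sum(axis=1), 1.0)

# Candidate joint on rectangles: P(u in A, y in B) = sum_{u in A} G(u, B) mu0(u)
joint = mu0[:, None] * G               # joint[u, y] = mu0[u] * G[u, y]
assert np.isclose(joint.sum(), 1.0)    # total mass 1, so a probability measure

# Check the rectangle formula for, e.g., A = {0, 1}, B = {1}:
A, B = [0, 1], [1]
lhs = joint[np.ix_(A, B)].sum()                    # joint mass of A x B
rhs = sum(G[u, B].sum() * mu0[u] for u in A)       # integral of G(., B) over A
assert np.isclose(lhs, rhs)
```

Of course, on a finite space the rectangles determine everything by additivity; the measure-theoretic question below is precisely whether this extension step survives in the general case.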
Now the last point troubles me a bit: the identity seems to define a joint probability measure $\mathbb P$ on the product space $(X\times Y, \mathcal X\otimes \mathcal Y)$. But:
- How do I know that this is actually a proper probability distribution? (OK, this is presumably done by showing countable additivity, measure 0 on the empty set and measure 1 on the full space, BUT: I only have an expression for rectangles $A\times B$ with $A\in\mathcal X$ and $B\in\mathcal Y$; is that sufficient to define a joint probability on $\mathcal X\otimes\mathcal Y$?)
- Actually, the notion of conditional probabilities REQUIRES a joint probability distribution on both variables to start with, so this approach cannot really work because the snake bites its own tail: I am implicitly requiring a joint probability in order to define a conditional probability in order to define a joint probability measure?
So, how can it be done then? How do I start with a measure $\mu_0(\mathrm d u)$ on $(X,\mathcal X)$ and some notion of a conditional distribution of $y|u$ (but beware: the notion of a regular conditional distribution already assumes the existence of a joint probability distribution) in order to correctly define a joint probability measure on $(X\times Y, \mathcal X\otimes \mathcal Y)$ such that the construction of the regular conditional distribution returns the same one I started with?
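The consistency requirement in the last sentence can at least be illustrated in the finite case: build the joint from $(\mu_0, G)$, then disintegrate it again and check that the recovered kernel agrees with $G$ wherever $\mu_0$ puts mass. A minimal sketch with made-up numbers:

```python
import numpy as np

# Start from a prior mu0 on X = {0, 1, 2} and a kernel G into Y = {0, 1}.
mu0 = np.array([0.5, 0.3, 0.2])
G = np.array([[0.9, 0.1],
              [0.4, 0.6],
              [0.2, 0.8]])

# Joint built from the pair (mu0, G).
joint = mu0[:, None] * G                    # joint[u, y] = mu0[u] * G[u, y]

# Marginalizing out y should recover the prior mu0.
marginal_u = joint.sum(axis=1)
assert np.allclose(marginal_u, mu0)

# Disintegration: conditional of y given u, defined where mu0[u] > 0.
G_recovered = joint / marginal_u[:, None]
assert np.allclose(G_recovered, G)          # same kernel, mu0-a.e.
```

In the discrete setting the recovered conditional is unique wherever $\mu_0(\{u\}) > 0$; in general, regular conditional distributions are only determined $\mu_0$-almost everywhere, which is the best one can hope the construction to return.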