How can I show that any two versions of $P(A|X)$ differ in a set of probability zero?


Given these two definitions (Breiman, Probability):

Definition 4.7. The conditional probability $P(A|X=x)$ is defined as any $\mathcal{B}_1$-measurable function satisfying $$P(A,X\in B)=\int_B P(A|X=x)\, \hat{P}(dx)$$ all $B\in\mathcal{B}_1$.

Definition 4.8. The conditional probability of $A$ given $X(\omega)$ is defined as any random variable on $\Omega$, measurable $\mathcal{F}(X)$, and satisfying $$P(A,X\in B)=\int_{\{X\in B\}} P(A|X)\, P(dx)$$ all $B\in\mathcal{B}_1$.
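On a finite probability space both definitions reduce to elementary conditional probabilities, and the defining equations can be checked exhaustively. The Python sketch below assumes a hypothetical toy setup that is not from Breiman: a fair die $\Omega=\{1,\dots,6\}$, $X(\omega)=\omega \bmod 2$, and $A=\{4,5,6\}$. The function `cond_prob_given_value` plays the role of $x\mapsto P(A|X=x)$ from Definition 4.7 and `cond_prob_rv` the role of $P(A|X)$ from Definition 4.8; both names are illustrative. Since only $B\cap X(\Omega)$ affects either side of the equations, letting $B$ run over the subsets of the range of $X$ covers every $B\in\mathcal{B}_1$.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical toy space: a fair die, each outcome with probability 1/6.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}

def X(w):
    """Toy random variable: parity of the outcome (0 = even, 1 = odd)."""
    return w % 2

A = {4, 5, 6}  # toy event "outcome is at least 4"

def P(event):
    return sum((prob[w] for w in event), Fraction(0))

values = sorted({X(w) for w in omega})

# Definition 4.7 analogue: a function of x (every value has positive probability here).
def cond_prob_given_value(x):
    level = {w for w in omega if X(w) == x}
    return P(A & level) / P(level)

# Definition 4.8 analogue: a random variable depending on w only through X(w).
def cond_prob_rv(w):
    return cond_prob_given_value(X(w))

def subsets(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

for B in map(set, subsets(values)):
    event_B = {w for w in omega if X(w) in B}  # the event {X in B}
    lhs = P(A & event_B)                        # P(A, X in B)
    # Integral over B against the distribution of X: sum of P(A|X=x) * P(X=x).
    rhs_47 = sum((cond_prob_given_value(x) * P({w for w in omega if X(w) == x})
                  for x in B), Fraction(0))
    # Integral of P(A|X) over the event {X in B} with respect to P on Omega.
    rhs_48 = sum((cond_prob_rv(w) * prob[w] for w in event_B), Fraction(0))
    assert lhs == rhs_47 == rhs_48
```

Here $P(A|X=0)=2/3$ and $P(A|X=1)=1/3$, and every assertion in the loop passes.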

How can I show that any two versions of $P(A|X)$ differ in a set of probability zero?


There's a significant amount of context missing from this question. While it's possible to fill in most of it by making reasonable guesses for the meanings of all the undefined symbols, I found some aspects of the quoted definitions sufficiently puzzling to consult the reference cited. Comparing Breiman's definition $4.8$ with the version given here reveals a typo or misquotation in the latter. The equation in the definition Breiman gives is $$ P(A,X\in B)=\int_{\{X\in B\}}P(A|X)\,\color{red}{dP},\quad \text{all } B\in\mathcal{B}_1, $$ with $\ \color{red}{dP}\ $ rather than the $\ \color{red}{P(dx)}\ $ given in the question as the measure with respect to which the integral is taken. If the notation $\ P(dx)\ $ is not merely an inadvertent typo, its use suggests that you might be thinking of the integral as being taken with respect to a probability measure on the real numbers, which would be a fundamental misunderstanding of what's going on here: the integral is over the subset $\ \{X\in B\}\ $ of $\ \Omega\ $, with respect to the measure $\ P\ $ on $\ \Omega\ $.

To fill in some of the missing context, $\ \mathcal{B}_1\ $ is the $\ \sigma$-field of Borel sets of $\ \mathbb{R}\ $, $\ X\ $ is a real-valued random variable on some probability space $\ (\Omega,\mathcal{F},P)\ $—that is, $\ X:\Omega\rightarrow\mathbb{R}\ $ is measurable with respect to $\ \mathcal{B}_1\ $ on $\ \mathbb{R}\ $ and $\ \mathcal{F}\ $ on $\ \Omega\ $—and $\ \mathcal{F}(X)\ $ is the $\ \sigma$-subfield of $\ \mathcal{F}\ $ defined by $\ \mathcal{F}(X)=\big\{X^{-1}(B)\,\big|\,B\in\mathcal{B}_1\,\big\}\ $.
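When $\Omega$ is finite, $\mathcal{F}(X)$ can be listed directly: $X^{-1}(B)$ depends only on $B\cap X(\Omega)$, so $\mathcal{F}(X)$ consists of the unions of the level sets $\{X=x\}$. A minimal Python sketch under that finite-space assumption; the die space, the parity random variable and the helper name `sigma_field_generated` are hypothetical:

```python
from itertools import chain, combinations

def sigma_field_generated(omega, X):
    """Return F(X) = {X^{-1}(B)} on a finite set omega, as a set of frozensets.

    Only B intersected with the range of X matters, so it suffices to let B
    run over all subsets of the (finite) range of X.
    """
    values = sorted({X(w) for w in omega})
    subsets = chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))
    return {frozenset(w for w in omega if X(w) in B) for B in subsets}

omega = range(1, 7)  # hypothetical die outcomes
F_X = sigma_field_generated(omega, lambda w: w % 2)
print(sorted(sorted(F) for F in F_X))
# [[], [1, 2, 3, 4, 5, 6], [1, 3, 5], [2, 4, 6]]
```

The result is much coarser than the power set of $\Omega$: an $\mathcal{F}(X)$-measurable random variable must be constant on each level set of $X$.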

A slightly different way of writing the above equation, which makes the applicability of the Radon–Nikodym theorem more immediately apparent, is $$ P(A\,\cap\,F)=\int_FP(A|X)\,dP, \ \ \ \text{for all }\ F\in\mathcal{F}(X)\ . $$ For each $\ A\in\mathcal{F}\ $, define the measure $\ P_A\ $ on $\ \mathcal{F}(X)\ $ by $\ P_A(F)=P(A\,\cap\,F)\ $. Since $\ A\cap F\subseteq F\ $, we have $\ P(F)=0\Rightarrow P_A(F)=0\ $, so $\ P_A\ $ is absolutely continuous with respect to the restriction of $\ P\ $ to $\ \mathcal{F}(X)\ $. Both measures being finite, the Radon-Nikodym theorem applied on $\ \big(\Omega,\mathcal{F}(X)\big)\ $ therefore tells us that:

  • $\ P_A\ $ has a Radon-Nikodym derivative $\ \frac{dP_A}{dP}\ $ with respect to $\ P\ $ on $\ \mathcal{F}(X)\ $, which is measurable with respect to $\ \mathcal{F}(X)\ $ on $\ \Omega\ $ and $\ \mathcal{B}_1\ $ on $\ \mathbb{R}\ $, and satisfies the equation $$ P_A(F)=\int_F\frac{dP_A}{dP}\,dP\ $$ for all $\ F\in\mathcal{F}(X)\ $. A version of the conditional probability $\ P(A|X)\ $ is thus exactly such a Radon-Nikodym derivative.
  • If $\ \phi_1:\Omega\rightarrow\mathbb{R}\ $ and $\ \phi_2:\Omega\rightarrow\mathbb{R}\ $ are measurable with respect to $\ \mathcal{F}(X)\ $ on $\ \Omega\ $ and $\ \mathcal{B}_1\ $ on $\ \mathbb{R}\ $, and satisfy the equation $$ P_A(F)=\int_F\phi_i\,dP\ $$ for $\ i=1,2\ $ and all $\ F\in\mathcal{F}(X)\ $, then $$\ P\big(\phi_1\ne\phi_2\big)=0\ .$$
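The second bullet cannot be strengthened to $\phi_1=\phi_2$ everywhere, because a version may be altered arbitrarily on a null set in $\mathcal{F}(X)$. The Python sketch below assumes a hypothetical four-point space with $P(\{4\})=0$, $X(\omega)=\omega$ (so $\mathcal{F}(X)$ is the whole power set) and $A=\{2,4\}$. On each level set of positive probability, `phi1` is the ratio $P(A\cap\{X=x\})/P(X=x)$, which on a finite space is a version of the Radon-Nikodym derivative of $P_A$; `phi2` agrees with it except at the null outcome $4$. Both satisfy $P_A(F)=\int_F\phi_i\,dP$ for every $F\in\mathcal{F}(X)$, yet they differ, on a set of probability zero.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite space in which the outcome 4 has probability zero.
prob = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(0)}
omega = list(prob)
A = {2, 4}

def X(w):
    return w  # X is one-to-one here, so F(X) is every subset of omega

def P(event):
    return sum((prob[w] for w in event), Fraction(0))

def integral(phi, F):
    """Integral of phi over F with respect to P (a finite sum on a finite space)."""
    return sum((phi(w) * prob[w] for w in F), Fraction(0))

def phi1(w):
    # P(A and {X = X(w)}) / P(X = X(w)) where that denominator is positive, 0 otherwise.
    level = {v for v in omega if X(v) == X(w)}
    return P(A & level) / P(level) if P(level) > 0 else Fraction(0)

def phi2(w):
    # Same as phi1 except at the null outcome 4, where it takes an arbitrary value.
    return Fraction(7, 10) if w == 4 else phi1(w)

F_X = [set(c) for c in chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))]
for F in F_X:
    assert integral(phi1, F) == P(A & F) == integral(phi2, F)

print(P({w for w in omega if phi1(w) != phi2(w)}))  # 0
```

The loop confirms that both functions satisfy the defining equation on all $16$ sets in $\mathcal{F}(X)$, while the set where they disagree is $\{4\}$, of probability $0$.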

Your question is how to show that the second bullet point above is true. The most succinct way of doing so is simply to invoke the Radon-Nikodym theorem, one half of which is just the assertion of that fact. Presuming you're looking for a more informative demonstration, though, here's a fairly typical way of proving it.

For each positive integer $\ n\ $, let $\ U_n=\left\{\phi_1-\phi_2>\frac{1}{n}\right\}\ $ and $\ V_n=\left\{\phi_2-\phi_1>\frac{1}{n}\right\}\ $. Then $\ U_n,V_n\in\mathcal{F}(X)\ $, $$ \big\{\phi_1\ne\phi_2\big\}=\bigcup_{n=1}^\infty U_n\cup\bigcup_{n=1}^\infty V_n\ , $$ and \begin{align} P_A\big(U_n\big)&=\int_{U_n}\phi_1\,dP=\int_{U_n}\phi_2\,dP\ ,\\ P_A\big(V_n\big)&=\int_{V_n}\phi_1\,dP=\int_{V_n}\phi_2\,dP\ . \end{align}

All four integrals are finite, since each equals $\ P_A\big(U_n\big)\le1\ $ or $\ P_A\big(V_n\big)\le1\ $, so they may be subtracted. From these equations, it follows that \begin{align} 0&=\int_{U_n}\big(\phi_1-\phi_2\big)\,dP\ge\frac{P\big(U_n\big)}{n}\ \ \text{ and}\\ 0&=\int_{V_n}\big(\phi_2-\phi_1\big)\,dP\ge\frac{P\big(V_n\big)}{n}\ , \end{align} and then, since $\ P\big(U_n\big),P\big(V_n\big)\ge0\ $, that $\ P\big(U_n\big)=P\big(V_n\big)=0\ $ for all $\ n\ $.

Since a countable union of events of probability $\ 0\ $ also has probability $\ 0\ $, it follows that $$ P\big(\phi_1\ne\phi_2\big)=P\left(\bigcup_{n=1}^\infty U_n\cup\bigcup_{n=1}^\infty V_n\right)=0\ . $$
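Read contrapositively, the proof gives a way to expose a candidate that is not a version: if $P(\phi_1\ne\phi_2)>0$, then some $U_n$ or $V_n$ has positive probability, and the integrals of $\phi_1$ and $\phi_2$ over it differ by at least $P(U_n)/n$ (or $P(V_n)/n$). The Python sketch below searches for such an $n$ on a hypothetical four-point space with $X(\omega)=\omega$; the function `witness`, the search bound `max_n` and the candidate functions are illustrative assumptions, not part of the original argument.

```python
from fractions import Fraction

# Hypothetical finite space; X(w) = w, so every subset of omega lies in F(X).
prob = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(0)}
omega = list(prob)

def P(event):
    return sum((prob[w] for w in event), Fraction(0))

def integral(phi, F):
    return sum((phi(w) * prob[w] for w in F), Fraction(0))

def witness(phi1, phi2, max_n=1000):
    """Search for the sets U_n = {phi1 - phi2 > 1/n} and V_n = {phi2 - phi1 > 1/n}.

    Returns (n, S, gap) for the first n <= max_n such that S = U_n or S = V_n has
    positive probability, where gap is the absolute difference of the integrals
    of phi1 and phi2 over S; the proof's inequality gives gap >= P(S)/n.
    Returns None if no such n <= max_n exists.
    """
    for n in range(1, max_n + 1):
        U = {w for w in omega if phi1(w) - phi2(w) > Fraction(1, n)}
        V = {w for w in omega if phi2(w) - phi1(w) > Fraction(1, n)}
        for S in (U, V):
            if P(S) > 0:
                gap = abs(integral(phi1, S) - integral(phi2, S))
                assert gap >= P(S) / n  # the lower bound used in the proof
                return n, S, gap
    return None

phi1 = {1: Fraction(0), 2: Fraction(1), 3: Fraction(0), 4: Fraction(0)}.get
phi_null = {1: Fraction(0), 2: Fraction(1), 3: Fraction(0), 4: Fraction(7, 10)}.get
phi_bad = {1: Fraction(0), 2: Fraction(1), 3: Fraction(1, 5), 4: Fraction(0)}.get

print(witness(phi1, phi_null))  # None
print(witness(phi1, phi_bad))   # (6, {3}, Fraction(1, 20))
```

A candidate that differs from `phi1` only on the null outcome yields no witness, matching $P(\phi_1\ne\phi_2)=0$. The candidate `phi_bad` differs by $1/5$ on the outcome $3$ of probability $1/4$, so the first witness appears at $n=6$, with gap $1/20\ge(1/4)/6$.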