Heuristic interpretation of measure theoretic conditional probability and information


Billingsley provides a heuristic approach to the measure-theoretic definition of a conditional probability $P(A || \mathcal{G})$ in terms of partial information. We are to think of knowing $\mathcal{G}$ as an experiment in which, for each $G \in \mathcal{G}$ and each $\omega \in \Omega$, we learn the value of $\mathbb{1}_G(\omega)$. However, Billingsley then notes that this heuristic breaks down in certain pathological cases. Let $(\Omega, \mathcal{F}, P)$ be the unit interval $\Omega$ with Lebesgue measure $P$ on the Borel $\sigma$-algebra $\mathcal{F}$. Take $\mathcal{G}$ to be the sub-$\sigma$-algebra of countable or cocountable subsets. The heuristic approach says that, since $\mathcal{G}$ contains every singleton set, we know exactly which $\omega$ we pick when computing $P(A || \mathcal{G})(\omega)$, and so we know whether $\omega \in A$ or $\omega \notin A$, and so we should have $P(A || \mathcal{G})(\omega) = \mathbb{1}_A(\omega)$. However, this is incorrect since $\mathbb{1}_A$ is not necessarily $\mathcal{G}$-measurable, and the correct answer is $P(A || \mathcal{G}) (\omega) = P(A)$, $P$-a.s.
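
As a sanity check on the stated answer, here is a short sketch verifying that the constant $P(A)$ satisfies the defining property of $P(A || \mathcal{G})$. It uses only the fact that every $G \in \mathcal{G}$ is countable or cocountable, so that $P(G) \in \{0, 1\}$:

$$
\int_G P(A)\, dP = P(A)\,P(G) =
\begin{cases}
0 = P(A \cap G), & G \text{ countable},\\
P(A) = P(A \cap G), & G \text{ cocountable}.
\end{cases}
$$

A constant function is trivially $\mathcal{G}$-measurable, so $P(A)$ is indeed a version of $P(A || \mathcal{G})$.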

One possible idea that I had is that the "information" encoded in $\mathcal{G}$ should have some connection to the background measure $P$. In this case, knowing $\omega$ is not valuable information with regard to $P$. However, if $P$ is instead a point mass at, say, $\omega_0 = \frac{1}{2}$ (with the same $\mathcal{G}$), then knowing $\omega$ is potentially valuable with respect to $P$. We have two cases, either $\omega_0 \in A$ or $\omega_0 \notin A$:

  1. If $\omega_0 \in A$, then $P(A || \mathcal{G})(\omega) = \mathbf{1}_{\{\omega_0\}}(\omega)$, $P$-a.e.
  2. If $\omega_0 \notin A$, then $P(A || \mathcal{G})(\omega) = 0$, $P$-a.e.

In both cases, we have $P(A || \mathcal{G})(\omega) = P(A) \mathbf{1}_{\{\omega_0\}}(\omega)$, which agrees with the (updated) heuristic that knowing $\mathcal{G}$ lets you update your probabilities with respect to the $P$-valuable data in $\mathcal{G}$.
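
This formula can be checked against the defining property directly. A sketch, writing $P = \delta_{\omega_0}$ (my notation for the point mass, so that $P(B) = \mathbf{1}_B(\omega_0)$ for every Borel set $B$): for each $G \in \mathcal{G}$,

$$
\int_G P(A)\,\mathbf{1}_{\{\omega_0\}}\,dP
= P(A)\,\mathbf{1}_G(\omega_0)
= \mathbf{1}_A(\omega_0)\,\mathbf{1}_G(\omega_0)
= \mathbf{1}_{A \cap G}(\omega_0)
= P(A \cap G).
$$

Since $\{\omega_0\}$ is countable, $\mathbf{1}_{\{\omega_0\}}$ is $\mathcal{G}$-measurable, so this is a version of $P(A || \mathcal{G})$. Only the value at $\omega_0$ matters $P$-a.e., so the constant $P(A)$ is also a version in this case.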

My questions are: 1) whether this makes sense; 2) how to repair the intuition so that it holds in all cases; and 3) failing that, how to understand when we should expect Billingsley's heuristic to fail.