What does conditional probability $P(A|B)$ mean when $P(B)=0$?

Does anyone ever ascribe a value to $P(A|B)$ when $P(B)=0$? I realize that leaving it undefined makes sense, but I also feel like there should be a sensible definition.
From Wikipedia:
If $P(B) = 0$, then the simple definition of $P(A|B)$ is undefined.
And with regard to your question:
I also feel like there should be a sensible definition.
Being undefined is a sensible definition; in fact, it is the only sensible one. By analogy, consider $\frac{1}{0}$: it is undefined, and leaving it undefined is precisely the sensible thing to do.
Let $a,b$ be real-valued random variables taking values in $\Omega \subset \mathbb{R}$, whose joint distribution is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}^2$, with density $f$. Then the conditional probability is defined by:
$$ P(a \in A | b \in B) : = \frac{ \int_{A \times B} f(x,y) \, dx \, dy}{ \int_{\Omega \times B} f(x,y) \, dx \, dy} $$
If $B$ has Lebesgue measure zero, then intuitively we would like to define the same quantity by taking a sequence of non-negligible measurable sets $B_n \downarrow B$ and passing to the limit of $P(a \in A \mid b \in B_n)$. However, this limit depends on the choice of $B_n$ whenever $B$ contains more than one point: intuitively, if there are two points, then any weighted sum of Dirac masses in the $b$ direction supported at those two points can be obtained as the limit of a suitable sequence $B_n$. This shows that, in general, $P(A \mid B)$ is not well-defined when $P(B) = 0$.
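The dependence on the choice of $B_n$ can be made concrete numerically. The sketch below is my own illustration, not part of the original answer: it assumes a joint density on the unit square, $f(x,y) = 2x$ for $y < 1/2$ and $f(x,y) = 2(1-x)$ for $y \ge 1/2$, and conditions on $B = \{0.25, 0.75\}$ thickened into two small intervals. Shrinking the two intervals at different rates yields different limits.

```python
# Sketch: the limit of P(a <= 1/2 | b in B_n) depends on how B_n shrinks
# to B = {0.25, 0.75}. Assumed joint density on [0,1]^2:
#   f(x, y) = 2x       for y < 1/2
#   f(x, y) = 2(1-x)   for y >= 1/2
# Given b < 1/2 the conditional density of a is 2x, so P(a <= 1/2 | b < 1/2) = 1/4;
# given b >= 1/2 it is 2(1-x), so P(a <= 1/2 | b >= 1/2) = 3/4.
# The marginal of b is uniform, so both probabilities below are exact.

def cond_prob(w1, w2):
    """P(a <= 1/2 | b in [0.25-w1, 0.25+w1] U [0.75-w2, 0.75+w2])."""
    num = 2 * w1 * 0.25 + 2 * w2 * 0.75   # P(a <= 1/2 and b in B_n)
    den = 2 * w1 + 2 * w2                 # P(b in B_n)
    return num / den

# Equal-width shrinking gives one limit:
print(cond_prob(1e-3, 1e-3))   # 0.5
# Widths in ratio 1:3 give a different limit, (0.25 + 3 * 0.75) / 4:
print(cond_prob(1e-3, 3e-3))   # 0.625
```

Both sequences shrink down to the same two-point set $B$, yet the conditional probabilities converge to different values, which is exactly the ill-definedness described above.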
However, if $B = \{b_0\}$ is a singleton, then this limit is defined (intuitively, at the limit, we have a Dirac mass in the $b$ direction), and we obtain
$$ P(a \in A | b = b_0) : = \frac{ \int_{A} f(x,b_0) \, dx}{ \int_{\Omega} f(x,b_0) \, dx} $$
Thus, in this restricted setting, the conditional probability is defined.
Here is a concrete example: suppose $a,b$ are i.i.d. uniform random variables on $[0,1]$, and let $A = [1/4, 3/4]$ and $B = \{1/3\}$. Then $P(a \in A \mid b \in B) = 1/2$. To see this, note that by independence, conditioning on $b = 1/3$ leaves $a$ uniform on $[0,1]$. The event $\{a \in A\} \cap \{b \in B\}$ then restricts $a$ to $[1/4, 3/4]$, which is half the length of the full interval, so the conditional probability is $1/2$. Essentially, this example relies on the fact that sections of a larger sample space can "inherit" a measure in a well-defined way under suitable regularity conditions.
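This can be checked numerically by thickening the measure-zero event. The following Monte Carlo sketch is my own illustration (not part of the original answer): it conditions on $b \in [1/3-\varepsilon,\, 1/3+\varepsilon]$ for a small $\varepsilon$, which approximates conditioning on $b = 1/3$.

```python
import random

# Monte Carlo check of the uniform example: approximate
#   P(a in [1/4, 3/4] | b in [1/3 - eps, 1/3 + eps])
# for small eps; as eps -> 0 this approaches P(a in A | b = 1/3) = 1/2.
random.seed(0)
eps, n = 0.01, 1_000_000
hits = total = 0
for _ in range(n):
    a, b = random.random(), random.random()  # i.i.d. uniform on [0, 1]
    if abs(b - 1/3) < eps:                   # thickened conditioning event
        total += 1
        hits += 0.25 <= a <= 0.75            # event a in A
est = hits / total
print(est)  # close to 0.5
```

The thickening trick works here precisely because, by independence, the conditional law of $a$ does not change as the interval around $1/3$ shrinks.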
Henry's excellent example also falls under this same framework.
As an example of where this is meaningful, consider two independent standard normal random variables $X$ and $Y$, each with mean $0$ and variance $1$.
Suppose $A$ is the event $X\le 1$ and $B$ is the event $X+Y=2$.
Then $P(A)=\Phi(1) \approx 0.8413$, and $P(B)=0$, since $X+Y$ is a continuous random variable and takes any single value with probability zero. Nevertheless $P(A|B)=0.5$: given $X+Y=2$, the conditional distribution of $X$ is normal with mean $1$, so by symmetry $X\le 1$ holds with probability $1/2$.
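The same thickening idea used for the uniform example applies here. The sketch below is my own illustration of this answer's setup: it replaces the measure-zero event $\{X+Y=2\}$ with $\{|X+Y-2| < \varepsilon\}$ and estimates the conditional probability by simulation.

```python
import random

# Monte Carlo sketch: thicken the measure-zero event {X + Y = 2} to
# {|X + Y - 2| < eps} and estimate P(X <= 1 | X + Y near 2).
# By the symmetry argument, the estimate should be close to 1/2.
random.seed(1)
eps, n = 0.05, 1_000_000
hits = total = 0
for _ in range(n):
    x, y = random.gauss(0, 1), random.gauss(0, 1)  # independent N(0, 1)
    if abs(x + y - 2) < eps:                       # near the event B
        total += 1
        hits += x <= 1                             # event A
est = hits / total
print(est)  # close to 0.5
```

Note the contrast with the unconditional $P(A) \approx 0.8413$: conditioning on the probability-zero event genuinely changes the answer.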
Conditional probability can be viewed as a tool for constructing new probability spaces from a given one, much as new algebraic structures are constructed from known ones. For example, consider the Euclidean vector space $(\mathbb{R}^3,+,\cdot,\mathbb{R})$, which models physical space; a new space can be constructed from it by keeping only the first coordinate, obtaining $(\mathbb{R},+,\cdot,\mathbb{R})$, which represents an infinite straight line. In the probabilistic setting, given a probability space $(\Omega,\mathcal{A},P)$, the conditioning operator $\lvert$ gives rise to a new probability space whose probability function is defined by $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, the probability that the event $A$ occurs once the event $B$ has happened. That is why the conditioning operator is also called the dynamical operator.

What happens if $P(B)=0$? In that case you would obtain a trivial "probability space" in which every event is impossible, analogous to the zero subspace $(\{0\},+,\cdot,\mathbb{K})$ in linear algebra. But Kolmogorov's axiomatic definition requires the probability of the whole space to be $1$, so the natural definition does not give rise to a probability space. That is why the case $P(B) = 0$ is not studied.
Well, the actual product rule is $P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$, and Bayes' theorem is exactly that: a theorem derived from the product rule. So yes, $P(A|B)$ can be assigned some value even if $P(B) = 0$, but it is not a useful value, since it will never enter any computation. If $P(A)\neq 0$, then necessarily $P(B|A) = 0$ as well, and $P(A)$ is the only potentially useful number there (in other contexts, not together with $B$); and if $P(A)=0$, then both $P(B|A)$ and $P(A|B)$ can be assigned values, but those values would be meaningless.