Reasons for defining $P(A|B)$ on $\Omega$ space instead of $B$

65 Views Asked by At

The way the conditioned probability measure $P(\cdot|B)$ is usually defined (let's consider only discrete space for simplicity) is by letting $$P(A|B):=\frac{P(A\cap B)}{P(B)}\quad\text{for every}\ A\subseteq \Omega.$$

Now obviously all $\omega \in B^c$ get sent to $0$, so this probability measure is nonzero only on $B$, so we could just as well defined it only on this space. Why is it nonetheless customarily still defined on the whole of $\Omega$, what are the advantages of doing so?

(As a side note, I heard someone once say that the reason for doing so is to be able to consider conditioning on multiple events, though I'm not sure what he meant by that. The standard definition of $P(\cdot |B)$ in any case disagrees with the usual definition of a "trace" object -e.g., think trace topology or trace $\sigma$-algebra- which is usually defined on its own restricted space, not on the original space.)

1

There are 1 best solutions below

2
On

Idea 1: We want to be able to talk about $P(A|B)$ when $A$ is not necessarily a subset of $B$. Restricting the definition of conditional probability to $B$ limits this in some regards (we would have to define $P(A|B)$ to be $P(A\cap B|B)$ in the $B$-definition, which is nonetheless equivalent to the $\Omega$-definition).

Idea 2: Simplicity. The definition over $\Omega$ is somehow simpler to understand and use than the definition over $B$ (perhaps for the reason stated in Idea 1). Definitions are often formed so that they are very convenient to use. Here you do not need to define another probability measure on $B$ (which would require renormalization, among other possible problems) when using the $\Omega$-definition, so in this way its simpler and more convenient to use.

------------------------ EDIT -----------------------------

Idea 3: Consistency with other areas of mathematics doesn't really matter. When you are learning about conditional probability, you are probably not also worrying about these other trace definitions, at least at an introductory level. In some sense the $\Omega$-definition of conditional probability is the traditional one, and so it will continue to be taught. For these two reasons, it is often simpler just to use the definition everybody "already" knows, adding to the thought in Idea 2. Someone who is looking for consistency between these trace-like definitions (i.e. the OP) can make them, but that is usually something that is quickly derived/noted from the definition.

As for examples of Idea 2, see any well written math text. For example, definitions that yield simple proofs are especially common in foundational areas like set theory, where many different definitions get the job done, but some do it more quickly than others.