- Although I understand these diagrams, I do not understand how to visualise the following explanation:
$\color{green}{[P1.]}$ Suppose you were to grab the edges of $A$ and stretch it out so it covers all of $\Omega$. $B$ stretches out with it. Now we ask, what proportion of $\Omega$ is covered by the intersection of $A$ and $B$? The answer is simply the proportion of $A$ covered by $B$. The proportional stretching of things doesn't change the proportion of $A$ covered by $B$ so this value is still $P(B|A)$.
$\color{green}{[P2.]}$ But, $A$ isn't actually the whole universe. We have to shrink everything back down so it's all in the original size relative to $\Omega$. To do this, we shrink everything back down by a factor of $P(A)$ (the true size of $A$ within $\Omega$).
- P1 and P2 appear tautological: why are $A$ and $B$ stretched to cover all of $\Omega$ in P1, but then shrunk back to their original sizes in P2? What is the objective of the stretch and shrink?
Now when we ask, what is the size of the portion of $B$ that overlaps $A$ relative to the entire universe $\Omega$,
$\color{green}{[P3.]}$ the answer is $P(B|A)$ shrunk down by $P(A)$
which is $P(B|A)P(A)$.
- How can you visualise, and can anyone please depict, $\color{green}{[P3.]}$?
Draw a rectangle and label it $\Omega$. Draw a smaller rectangle inside it, labeled $A$. For best effect, draw $A$ so it is to scale. Likewise draw another rectangle inside $\Omega$ that overlaps $A$; label this $B$. Shade in the overlapping region and label it $A\cap B$.
Now for measures $\mathsf P(\,\cdot\,)$, such as $\mathsf P(A), \mathsf P(B), \mathsf P(A\cap B)$, we are comparing the probability measure of those regions to the entire space $\Omega$ (and thus $\mathsf P(\Omega)=1$).
For conditional measures, $\mathsf P(\,\cdot\mid A)$, we are comparing the probability measure of an area inside $A$ to the entire area of $A$. Because we are restricting our attention to $A$, $\mathsf P(B\mid A)$ is only concerned with the portion of $B$ that overlaps $A$, that is, the shaded region ($A\cap B$).
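The "stretch and shrink" is exactly this focus of attention. Because we are comparing measures to $A$, we "rescale" so that $\mathsf P(A\mid A)=1,~ \mathsf P(B\mid A)=\mathsf P(B\cap A)\big/\mathsf P(A)$, et cetera. Undoing that rescaling, that is, multiplying by $\mathsf P(A)$ again, returns the shaded region to its size relative to $\Omega$: $\mathsf P(B\mid A)\,\mathsf P(A)=\mathsf P(A\cap B)$, which is the statement in $\color{green}{[P3.]}$.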
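If it helps to see numbers, here is a minimal sketch of the rectangle picture in Python. The coordinates of $A$ and $B$ are made up for illustration (they are not from the question), $\Omega$ is assumed to be the unit square, and probability is assumed to be proportional to area. The helpers `area` and `intersect` are hypothetical names introduced only for this sketch.

```python
from fractions import Fraction as F

# Rectangles are (x0, y0, x1, y1). All coordinates are illustrative assumptions.
omega = (F(0), F(0), F(1), F(1))            # the whole space, area 1
A = (F(1, 10), F(2, 10), F(6, 10), F(8, 10))
B = (F(4, 10), F(5, 10), F(9, 10), F(1))

def area(rect):
    """Area of a rectangle; an empty (inverted) rectangle has area 0."""
    x0, y0, x1, y1 = rect
    return max(x1 - x0, 0) * max(y1 - y0, 0)

def intersect(r, s):
    """Overlap of two axis-aligned rectangles (possibly empty)."""
    return (max(r[0], s[0]), max(r[1], s[1]), min(r[2], s[2]), min(r[3], s[3]))

# Measures relative to the whole space Omega.
p_A = area(A) / area(omega)
p_AB = area(intersect(A, B)) / area(omega)

# "Stretch": measure the shaded region relative to A instead of Omega.
p_B_given_A = area(intersect(A, B)) / area(A)

# "Shrink": multiply by P(A) to get back to a size relative to Omega.
shrunk_back = p_B_given_A * p_A

print(p_A, p_AB, p_B_given_A, shrunk_back)  # 3/10 3/50 1/5 3/50
```

With these made-up rectangles the shaded overlap fills one fifth of $A$, and scaling that fifth by the size of $A$ (three tenths of $\Omega$) gives back the overlap's size within $\Omega$. Exact fractions are used only so the equality holds without floating-point rounding.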