I already read this, and so wish to intuit 3 without relying on (only rearranging) the definition of Conditional Probability.
I modified the following's source for concision.
$1.$ Now look at $\Pr(A \cap B)$. We know that if $A$ has happened, then $A \cap B$ happens with probability $\Pr(B\mid A)$.
$2.$ If we do NOT know that $A$ has happened, we must $\color{darkred}{SCALE \; \Pr(B\mid A) \text{ with } \Pr(A)}$.
$3.$ Thus, $ \Pr(A \cap B)= \Pr(B\mid A)\Pr(A) \text{.}$
I pursue only intuition; please do not answer with formal proofs.
I do not understand 2. How does $\color{darkred}{SCALING \; \Pr(B\mid A) \text{ with } \Pr(A)}$ translate into multiplying them both together? For example, why does 'scaling' not imply addition?
Suppose you were to grab the edges of $A$ and stretch it out so it covers all of $\Omega$. $B$ stretches out with it. Now we ask, what proportion of $\Omega$ is covered by the intersection of $A$ and $B$? The answer is simply the proportion of $A$ covered by $B$. Stretching everything proportionally doesn't change the proportion of $A$ covered by $B$, so this value is still $P(B|A)$.
But $A$ isn't actually the whole universe. We have to shrink everything back down so it's all in the original size relative to $\Omega$. To do this, we shrink everything back down by a factor of $P(A)$ (the true size of $A$ within $\Omega$).
Now when we ask what the size of the portion of $B$ that overlaps $A$ is, relative to the entire universe $\Omega$, the answer is $P(B|A)$ shrunk down by $P(A)$, which is $P(B|A)P(A)$.
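If it helps to see the stretch-and-shrink picture with numbers, here is a minimal sketch on a made-up example: one roll of a fair die, with $A$ = "the roll is even" and $B$ = "the roll is greater than 3". The sets and the helper `proportion` are my own illustrative choices, not anything from the quoted source.

```python
from fractions import Fraction

# Hypothetical example (an assumption for illustration): one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is greater than 3


def proportion(part, whole):
    """Fraction of `whole` that is covered by `part` (equally likely outcomes)."""
    return Fraction(len(part & whole), len(whole))


p_A = proportion(A, omega)            # true size of A within omega
p_B_given_A = proportion(B, A)        # proportion of A covered by B (A "stretched" to be the universe)
p_A_and_B = proportion(A & B, omega)  # size of the overlap within omega

print(p_A, p_B_given_A, p_A_and_B)    # 1/2 2/3 1/3

# Shrinking P(B|A) back down by the factor P(A) gives the size of the overlap.
assert p_B_given_A * p_A == p_A_and_B
```

Here $B$ covers $2/3$ of $A$, and $A$ is only $1/2$ of $\Omega$, so the overlap is $2/3$ of $1/2$, i.e. $1/3$ of $\Omega$. Adding instead would give $2/3 + 1/2 > 1$, which cannot be a probability; taking "a proportion of a proportion" is what makes the scaling a multiplication.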