This is a follow-up question to "Is the definition of conditional probability misleading?".
After defining conditional probability this way, Rice immediately introduces the multiplication law:

P(A ∩ B) = P(A | B) P(B)
But isn't this law totally circular (unless we prove something extra)? To compute P(A ∩ B) from this law we need to know P(A | B), but if we follow the definition rather than the interpretation, P(A | B) can only be computed from P(A ∩ B).
Here is an example Rice uses that could illustrate my point:

The way he computes P(A | B) is: "if a red ball has been removed on the first trial, there are two red balls and one blue ball left. Therefore, P(R2 | R1) = 2/3."
But here he is using the interpretation of conditional probability rather than the actual definition. Nothing in the definition says you may compute P(A | B) by assuming B is true while keeping all other background information the same. Now here is the subtle point: in almost all of his elementary examples, the probability measure P is actually unique, so following the axioms you arrive at the same P. In this example, adding "R1 is true" to the background information, you can derive a unique measure that happens to equal the P(· | B) obtained from the definition of conditional probability.
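As a sanity check that the two routes agree in this example, here is a small enumeration. The urn contents (three red balls, one blue ball) are my assumption, inferred from the counts in the quote, since removing one red ball must leave two red and one blue:

```python
from fractions import Fraction
from itertools import permutations

# Assumed urn matching the quoted counts: 3 red + 1 blue,
# so that removing one red leaves two red and one blue.
balls = ['R', 'R', 'R', 'B']

# All equally likely ordered draws of two balls without replacement
# (balls of the same color are treated as distinct, so outcomes are uniform).
outcomes = list(permutations(balls, 2))
total = len(outcomes)

p_r1 = Fraction(sum(1 for a, b in outcomes if a == 'R'), total)
p_r1_and_r2 = Fraction(sum(1 for a, b in outcomes if a == 'R' and b == 'R'), total)

# The definition: P(R2 | R1) = P(R1 ∩ R2) / P(R1).
p_r2_given_r1 = p_r1_and_r2 / p_r1
print(p_r2_given_r1)  # Fraction(2, 3), matching Rice's "2 red out of 3 left"
```

The ratio from the definition coincides with the 2/3 that Rice reads off the restricted urn, which is exactly the agreement the question is about.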
But in general, for this to work, you must prove that the two measures are the same: the one obtained by incorporating "B is true" into the background knowledge, and the one given by the definition of conditional probability.
Is my thinking here correct? If so, how can the interpretation be stated precisely, and how can it be proved from the definition of conditional probability?
OK, after looking at the Wikipedia page on Conditional probability, I found an answer that fits my needs.
Quote from its formal definition section: "
Formally, P(A|B) is defined as the probability of A according to a new probability function on the sample space, such that outcomes not in B have probability 0 and that it is consistent with all original probability measures.[7][8]
Let Ω be a sample space with elementary events {ω}. Suppose we are told the event B ⊆ Ω has occurred. A new probability distribution (denoted by the conditional notation) is to be assigned on {ω} to reflect this. For events in B, it is reasonable to assume that the relative magnitudes of the probabilities will be preserved. For some constant scale factor α, the new distribution will therefore satisfy:
" (end quote)
Now go through this in the reverse order: take that conclusion as the definition; then we can also derive the three axioms easily.
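For a concrete finite case, the reverse direction can be checked mechanically: build the rescaled measure P(· | B) and verify that it satisfies the three Kolmogorov axioms and agrees with the ratio definition on every event. The toy space and its probabilities below are hypothetical, chosen only for illustration:

```python
from fractions import Fraction
from itertools import chain, combinations

# A toy discrete probability space (hypothetical numbers).
p = {'a': Fraction(1, 2), 'b': Fraction(1, 4), 'c': Fraction(1, 8), 'd': Fraction(1, 8)}
omega = set(p)
B = {'a', 'c'}  # the conditioning event; note P(B) > 0

# The rescaled measure: outcomes outside B get 0, outcomes in B keep
# their relative magnitudes, scaled by alpha = 1 / P(B).
p_B = sum(p[w] for w in B)
cond = {w: (p[w] / p_B if w in B else Fraction(0)) for w in omega}

def prob(event):
    """Probability of an event under the conditional measure."""
    return sum(cond[w] for w in event)

# All events (subsets of the sample space).
events = [set(s) for s in
          chain.from_iterable(combinations(sorted(omega), r) for r in range(5))]

# Axiom 1: non-negativity.
assert all(prob(e) >= 0 for e in events)
# Axiom 2: the whole sample space has probability 1.
assert prob(omega) == 1
# Axiom 3: additivity over disjoint events.
assert all(prob(e1 | e2) == prob(e1) + prob(e2)
           for e1 in events for e2 in events if not (e1 & e2))
# And the measure agrees with the ratio definition P(A|B) = P(A ∩ B)/P(B):
assert all(prob(A) == sum(p[w] for w in A & B) / p_B for A in events)
print("all checks passed")
```

Of course this is just a finite sanity check, not a proof; the general argument is the derivation sketched above, read in reverse.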