From Wikipedia (law of total probability):
$$P(A)=\sum_n P(A\cap B_n)\tag{1}$$
$$P(A)=\sum_n P(A\mid B_n)P(B_n)\tag{2}$$
I understand completely how $(2)$ follows from $(1)$, since this is really just the definition of conditional probability, $$P(A\cap B_n)=P(A\mid B_n)P(B_n).$$ Now, since $$P(A_m\cap B_n)=P(B_n\cap A_m),$$ I can swap the roles of the two families of events in $(2)$ (so that the $A_m$ now form the partition) and write $$P(B)=\sum_m P(B\mid A_m)P(A_m),\tag{3}$$ where $n, m \in \Bbb{N}$. So the probability $P(B)$ that event $B$ occurs can be found by looking at this tree diagram for conditional probabilities from Wikipedia:
For the case of two events, $A_1=A$ and $A_2=\bar{A}$, equation $(3)$ gives $$P(B)=P(B\mid A)P(A)+P(B\mid \bar{A})P(\bar{A}).\tag{4}$$
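A minimal Python sketch checking $(2)$, $(3)$ and $(4)$ with exact fractions. The joint table `joint[(m, n)]` $=P(A_m\cap B_n)$ is made up purely for illustration; with $A_1=A$, $A_2=\bar A$ and $B_1=B$, the $n=1$ case of $(3)$ is exactly $(4)$.

```python
from fractions import Fraction as F

# Hypothetical joint table: joint[(m, n)] = P(A_m ∩ B_n) for m, n in {1, 2}.
# The A_m partition the sample space, and so do the B_n.
joint = {
    (1, 1): F(1, 10), (1, 2): F(2, 10),
    (2, 1): F(3, 10), (2, 2): F(4, 10),
}
assert sum(joint.values()) == 1

def P_A(m):
    # Marginal P(A_m): sum the joint table over n, which is equation (1).
    return sum(p for (mm, n), p in joint.items() if mm == m)

def P_B(n):
    # Marginal P(B_n): sum the joint table over m.
    return sum(p for (m, nn), p in joint.items() if nn == n)

def P_A_given_B(m, n):
    # Definition of conditional probability: P(A_m | B_n) = P(A_m ∩ B_n) / P(B_n).
    return joint[(m, n)] / P_B(n)

def P_B_given_A(n, m):
    # P(B_n | A_m) = P(A_m ∩ B_n) / P(A_m).
    return joint[(m, n)] / P_A(m)

# Equation (2): P(A_m) = sum_n P(A_m | B_n) P(B_n).
for m in (1, 2):
    assert P_A(m) == sum(P_A_given_B(m, n) * P_B(n) for n in (1, 2))

# Equation (3): the same identity with the roles of A and B swapped.
# For n = 1 (B_1 = B, A_1 = A, A_2 = not-A) this is equation (4).
for n in (1, 2):
    assert P_B(n) == sum(P_B_given_A(n, m) * P_A(m) for m in (1, 2))

print(P_A(1), P_B(1))  # 3/10 2/5
```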
Now here is the problem. Wikipedia then gives the analogous formulas with a conditional probability on the left-hand side, but does not explain where they come from:
$$P(A \mid C) = \sum_n P(A \mid C \cap B_n) P(B_n \mid C)\tag{5}$$
$$P(A \mid C) = \sum_n P(A \cap B_n \mid C)\tag{6}$$
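A brute-force check of $(5)$ and of the sum $\sum_n P(A\cap B_n\mid C)$ (the left-hand side of $(7)$), as a minimal Python sketch. The three-coordinate sample space and its weights are made up; events are represented as predicates on outcomes, and every conditional probability is computed directly from the definition.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical joint distribution on outcomes (a, n, c) with a, n, c in {0, 1}:
# A = "a == 1", C = "c == 1", and B_n = "middle coordinate == n", so {B_0, B_1}
# is a partition. The weights are arbitrary positive numbers.
outcomes = list(product((0, 1), repeat=3))
weights = [1, 2, 3, 4, 5, 6, 7, 8]
prob = {w: F(x, sum(weights)) for w, x in zip(outcomes, weights)}

def P(event):
    # Probability of an event given as a predicate on outcomes.
    return sum(p for w, p in prob.items() if event(w))

def inter(X, Y):
    # Intersection of two events.
    return lambda w: X(w) and Y(w)

def cond(X, Y):
    # Definition: P(X | Y) = P(X ∩ Y) / P(Y), assuming P(Y) > 0.
    return P(inter(X, Y)) / P(Y)

def A(w):
    return w[0] == 1

def C(w):
    return w[2] == 1

def B(n):
    return lambda w: w[1] == n

lhs = cond(A, C)
# Right-hand side of (5): sum_n P(A | C ∩ B_n) P(B_n | C)
rhs5 = sum(cond(A, inter(C, B(n))) * cond(B(n), C) for n in (0, 1))
# Sum of P(A ∩ B_n | C) over the partition {B_0, B_1}
rhs_sum = sum(cond(inter(A, B(n)), C) for n in (0, 1))
assert lhs == rhs5 == rhs_sum
print(lhs)  # 7/10
```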
In a course on the foundations of quantum mechanics, I have seen how these formulas ($(5)$ and $(6)$) are derived (suppressing unnecessary detail):
However, I still cannot see how formula $(1.18)$ is obtained from $(1.17)$.
In particular, I don't understand why $P(A \mid C)$ is given by summing $(1.17)$ over $j$: $$\sum_jP(A_i \cap B_j\mid C_k ) = \sum_j P(A_i \mid B_j \cap C_k) P(B_j \mid C_k)\tag{7}$$ To try to understand this, I put indices on all the events and, for simplicity, take $i,j,k\in \{1,2\}$. The corresponding tree diagram is:
Let's say I wanted to compute $P(A_2)$. Following the same logic as in the simpler case earlier (equation $(4)$ above), this would be $$P(A_2)\stackrel{?}{=}P(B_1\mid C_2)P(A_2\mid B_1)+P(B_2\mid C_2)P(A_2\mid B_2)\tag{?}$$ Of course, I know that the LHS of $(?)$ is actually $P(A_2\mid C_2)$. What do the sums over probabilities of intersections in $(7)$, $$\sum_jP(\color{red}{A_i \cap B_j}\mid C_k ),$$ correspond to in a tree diagram? Specifically, the part marked in red is what is causing the difficulty: I don't know how to interpret it on the tree diagram above.
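One way to read the red term on a tree, sketched in Python with made-up edge probabilities: put $P(C_k)$ on the first level of edges, $P(B_j\mid C_k)$ on the second, and $P(A_i\mid B_j\cap C_k)$ on the third. The product along a full root-to-leaf path is $P(A_i\cap B_j\cap C_k)$, while $P(A_i\cap B_j\mid C_k)$ is the product along the same path starting from node $C_k$, i.e. the last two edges only. This edge labelling, and the helper names `leaf` and `red`, are assumptions of the sketch; note in particular that the third level is conditioned on $B_j\cap C_k$, not just on $B_j$.

```python
from fractions import Fraction as F

# Hypothetical edge probabilities for the tree C_k -> B_j -> A_i, with i, j, k in {1, 2}.
p_C = {1: F(2, 5), 2: F(3, 5)}                      # P(C_k)
p_B_given_C = {(1, 1): F(1, 2), (2, 1): F(1, 2),    # key (j, k): P(B_j | C_k)
               (1, 2): F(1, 3), (2, 2): F(2, 3)}
p_A1_given_BC = {(1, 1): F(1, 4), (2, 1): F(3, 4),  # key (j, k): P(A_1 | B_j ∩ C_k)
                 (1, 2): F(1, 5), (2, 2): F(1, 2)}

def p_A_given_BC(i, j, k):
    # A_1 and A_2 are complementary, so the two A-edges leaving a B-node sum to 1.
    q = p_A1_given_BC[(j, k)]
    return q if i == 1 else 1 - q

def leaf(i, j, k):
    # Full path root -> C_k -> B_j -> A_i: product of its three edges,
    # which equals P(A_i ∩ B_j ∩ C_k).
    return p_C[k] * p_B_given_C[(j, k)] * p_A_given_BC(i, j, k)

def red(i, j, k):
    # The "red" term P(A_i ∩ B_j | C_k): the same path measured from node C_k,
    # i.e. the product of the two lower edges only (equivalently leaf / P(C_k)).
    return p_B_given_C[(j, k)] * p_A_given_BC(i, j, k)

# P(A_2 | C_2): add up the A_2-leaves inside the C_2 subtree, measured from node C_2.
p_A2_given_C2 = sum(red(2, j, 2) for j in (1, 2))
assert p_A2_given_C2 == sum(leaf(2, j, 2) for j in (1, 2)) / p_C[2]

# P(A_2) itself needs the A_2-leaves of both subtrees, C_1 and C_2.
p_A2 = sum(leaf(2, j, k) for j in (1, 2) for k in (1, 2))
print(p_A2_given_C2, p_A2)  # 3/5 14/25
```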
Remark: I'm sorry if what I'm asking here is unclear; I am still working on a way to word it better. I have also read this related question, but the proof in one of its answers does not address the conditional case (which is what I am asking about).




Regarding (1.18): this is actually equation (1) in disguise.
Assuming $P(C)>0$ and that the $B_j$ partition the sample space, $$P(A \mid C) P(C) = P(A \cap C) = \sum_j P((A \cap C) \cap B_j) = \sum_j P(A \cap B_j \cap C) = \sum_j P(A \cap B_j \mid C) P(C).$$ Dividing by $P(C)$ yields the first equality in (1.18).
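The other equality in $(7)$ comes from the same definition, applied term by term (a short worked step, assuming $P(B_j\cap C)>0$ for each $j$):
$$P(A\cap B_j\mid C)=\frac{P(A\cap B_j\cap C)}{P(C)}=\frac{P(A\cap B_j\cap C)}{P(B_j\cap C)}\cdot\frac{P(B_j\cap C)}{P(C)}=P(A\mid B_j\cap C)\,P(B_j\mid C).$$
Summing over $j$ and combining with the first equality gives $P(A\mid C)=\sum_j P(A\mid B_j\cap C)\,P(B_j\mid C)$, which is $(5)$.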
Your equation (?) is not correct, and I'm not sure what you are going for there. If you are really computing $P(A_2)$, then you can't neglect $C_1$, since $P(A_2)=P(A_2\cap C_1)+P(A_2\cap C_2)$. One way to correct the equation is $$P(A_2 \cap C_2) = P(A_2 \cap C_2 \cap B_1) +P(A_2 \cap C_2 \cap B_2),$$ which you get by looking at the $C_2$ subtree. If you want to write these in terms of conditional probabilities, you can write \begin{align} P(A_2 \cap C_2) &= P(A_2 \mid C_2) P(C_2) \\ P(A_2 \cap C_2 \cap B_1) &= P(A_2 \mid B_1 \cap C_2)P(B_1 \mid C_2) P(C_2) \\ P(A_2 \cap C_2 \cap B_2) &= P(A_2 \mid B_2 \cap C_2)P(B_2 \mid C_2) P(C_2) \end{align} Substituting these and dividing by $P(C_2)$ gives $$P(A_2\mid C_2)=P(A_2\mid B_1\cap C_2)P(B_1\mid C_2)+P(A_2\mid B_2\cap C_2)P(B_2\mid C_2),$$ so the probabilities on the lower edges must also be conditioned on $C_2$, not just on $B_j$.
Personally, I don't think the trees will be very helpful in understanding conditional probabilities. (They can be helpful when considering intersections of events, since each node of the tree corresponds to the intersection of the events along the path from the root of the tree to that node.) I find it simplest to reduce conditional probabilities to the definition, e.g. $P(A \mid B) = P(A \cap B) / P(B)$; everything should follow pretty simply if you apply this definition.
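A minimal Python sketch of that definition-only approach, on a made-up finite sample space whose outcomes $(i,j,k)$ record which $A_i$, $B_j$ and $C_k$ occur. Events are sets of outcomes, and `given` (a hypothetical helper name) is nothing more than $P(X\cap Y)/P(Y)$. It checks the identities above and shows that the right-hand side of (?), which uses $P(A_2\mid B_j)$ instead of $P(A_2\mid B_j\cap C_2)$, generally differs from $P(A_2\mid C_2)$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical sample space: outcome (i, j, k) means "A_i, B_j and C_k occur".
# The weights are made up; only the definition P(X | Y) = P(X ∩ Y) / P(Y) is used.
weights = dict(zip(product((1, 2), repeat=3), [3, 1, 4, 1, 5, 9, 2, 6]))
total = sum(weights.values())

def P(event):
    # Probability of a set of outcomes.
    return F(sum(weights[w] for w in event), total)

def given(X, Y):
    # Conditional probability straight from the definition (requires P(Y) > 0).
    return P(X & Y) / P(Y)

omega = set(weights)
A = {i: {w for w in omega if w[0] == i} for i in (1, 2)}
B = {j: {w for w in omega if w[1] == j} for j in (1, 2)}
C = {k: {w for w in omega if w[2] == k} for k in (1, 2)}

# The identities from the answer, checked one by one.
assert P(A[2] & C[2]) == P(A[2] & C[2] & B[1]) + P(A[2] & C[2] & B[2])
for j in (1, 2):
    assert P(A[2] & C[2] & B[j]) == given(A[2], B[j] & C[2]) * given(B[j], C[2]) * P(C[2])

# Corrected form of (?): condition the lower edges on B_j ∩ C_2.
correct = sum(given(A[2], B[j] & C[2]) * given(B[j], C[2]) for j in (1, 2))
assert correct == given(A[2], C[2])

# Right-hand side of the original (?), which uses P(A_2 | B_j) and ignores C_2.
original = sum(given(B[j], C[2]) * given(A[2], B[j]) for j in (1, 2))
print(correct, original)  # 15/17 1414/1989
```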