I've asked this question on Udacity, no response.
It goes like this. There's a network with A at the root, and three X variables dependent on A. The probability of each given A is equal, and they are conditionally independent. The problem is to get $P(X3|X1)$, given $P(A), P(Xi|A)$, and $P(Xi|!A)$.
I understand why the solution uses the total probability, but what I don't get is why it uses $P(A|Xi)$ instead of $P(Xi|A)$. It feels like a leap, because I haven't learned whatever logic is behind that.
Any links to helpful (comprehensive and easy to read) resources are super appreciated.
There are a variety of sources on Bayes' theorem, which I will not delve into - a quick internet search will result in the Wikipedia page, which is useful.
However, I will try to explain Bayes' theorem in a nutshell, since (according to your Udacity posting) you had not encountered it before, though you are somewhat more familiar with it after watching a video.
For two events $A$ and $B$, start from the definition of conditional probability, from which Bayes' Theorem follows:-
$\large P(A|B)=\frac{P(A,B)}{P(B)}$ (Eq. $1$)
The term $P(A|B)$ is the conditional probability that event $A$ will occur given that event $B$ has occurred, and is equal to the probability that both event $A$ and $B$ occur ($P(A,B)$) divided by the probability that event $B$ occurs, $P(B)$.
A concrete example would be $A$ being the event of a serious injury in a car accident, and $B$ being the event that a seat belt was worn - one would expect $P(A|B)$ to be much lower than $P(A|B^c)$, which is the conditional probability of injury in a car accident given that a seat belt was not worn.
We can rearrange (Eq. $1$), so that
$P(A,B)=P(A|B)P(B)$ (Eq. $2$)
Note that if event $A$ and $B$ are independent, $P(A,B)=P(A)P(B)$, resulting in $P(A|B)=P(A)$. This means that the probability of $A$ occurring given $B$ has occurred is simply the probability of $A$ occurring - as $B$ is independent of $A$, it will have no bearing on the probability of $A$.
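As a quick sanity check of (Eq. $1$) and the independence remark (this sketch is my own, not part of the original problem), we can enumerate two fair coin flips, where $A$ = "first flip is heads" and $B$ = "second flip is heads" are independent:

```python
from itertools import product

# Enumerate two fair coin flips; each of the 4 outcomes has probability 1/4.
outcomes = list(product(["H", "T"], repeat=2))

def prob(event):
    """Probability of an event (a predicate over outcomes) under the uniform measure."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

A = lambda o: o[0] == "H"          # first flip is heads
B = lambda o: o[1] == "H"          # second flip is heads
AB = lambda o: A(o) and B(o)       # both occur

p_A_given_B = prob(AB) / prob(B)   # Eq. 1: P(A|B) = P(A,B) / P(B)
print(p_A_given_B, prob(A))        # equal, since A and B are independent
```

Both printed values are $0.5$: conditioning on an independent event leaves the probability unchanged, exactly as $P(A|B)=P(A)$ predicts.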
Equation $2$ can be regarded as the simplest example of the Chain Rule.
Examining the denominator of (Eq. $1$),
$P(B)=P(B,A)+P(B,A^c)=P(B|A)P(A)+P(B|A^c)P(A^c)$
where $A^c$ is the complement to event $A$, so that $P(A\cup A^c)=1$.
We can thus rewrite (Eq. $1$) as follows:-
$\large P(A|B)=\frac{P(A,B)}{P(B|A)P(A)+P(B|A^c)P(A^c)}$ (Eq. $3$)
As $P(A,B)=P(B,A)$ we have
$ P(A|B)P(B)=P(B|A)P(A)$
So we can express $P(B|A)$ in terms of $P(A|B)$ as follows
$\large P(B|A)=\frac{P(A|B)P(B)}{P(A)}$
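To make this inversion concrete, here is a small numeric sketch (the numbers are made up for illustration, not from any real data): take $B$ = "has a condition" and $A$ = "test is positive", compute $P(A)$ by total probability, then invert with Bayes' theorem:

```python
# Illustrative numbers (invented for this sketch):
p_B = 0.01                 # prior P(B): has the condition
p_A_given_B = 0.95         # P(A|B): test positive given condition
p_A_given_not_B = 0.05     # P(A|B^c): false-positive rate

# Law of total probability for the denominator:
# P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_not_B * (1 - p_B)

# Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)
```

Despite the test being accurate, $P(B|A)\approx 0.16$ here, because the prior $P(B)$ is small: the direction of conditioning matters, which is precisely what Bayes' theorem lets you switch.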
You can extend the conditional probability for conditioning on multiple events, as follows:-
$\large P(A|B,C)=\frac{P(A,B,C)}{P(B,C)}$
Based on conditioning for multiple events, we can extend the Chain Rule in (Eq. $2$), for multiple events $A,B,C,D$ as follows
$P(A,B,C,D)=P(A|B,C,D)P(B|C,D)P(C|D)P(D)$
As $P(A,B,C,D)=P(D,C,B,A)$, an equally valid expression of the Chain Rule would be
$P(A,B,C,D)=P(D|C,B,A)P(C|B,A)P(B|A)P(A)$
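The Chain Rule is an identity that holds for any joint distribution, which we can check numerically on an arbitrary joint over four binary events (again, a sketch of my own for illustration):

```python
import random

random.seed(0)
# Build an arbitrary joint distribution over four binary events A, B, C, D,
# encoded as the 4 bits of an integer 0..15.
weights = {k: random.random() for k in range(16)}
total = sum(weights.values())
joint = {k: v / total for k, v in weights.items()}

def p(**fixed):
    """Marginal probability that the named events take the given 0/1 values."""
    bit = {"A": 3, "B": 2, "C": 1, "D": 0}
    return sum(pr for k, pr in joint.items()
               if all((k >> bit[n]) & 1 == v for n, v in fixed.items()))

# Chain rule: P(A,B,C,D) = P(A|B,C,D) P(B|C,D) P(C|D) P(D),
# with each conditional computed from Eq. 1 as a ratio of marginals.
lhs = p(A=1, B=1, C=1, D=1)
rhs = (p(A=1, B=1, C=1, D=1) / p(B=1, C=1, D=1)
       * p(B=1, C=1, D=1) / p(C=1, D=1)
       * p(C=1, D=1) / p(D=1)
       * p(D=1))
print(abs(lhs - rhs) < 1e-12)  # True: the factorisation telescopes
```

Written out this way you can see why the rule always holds: the ratios telescope, so the product collapses back to the joint probability.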
Now, let us go back to the question you posed on Udacity (and I will adhere to your notation for the sake of consistency). Using the definition of conditional probability and marginalising over $A$, we have
$\large P(X3|X1)=\frac{P(X3,X1)}{P(X1)}=\frac{P(X3,X1,A)+P(X3,X1,!A)}{P(X1)}$ (Eq. $4$)
The denominator is $P(X1)$, as we have conditioned on this event for the conditional probability. Thus we will expect a probability conditioned on the $X1$ term, i.e. $P(\cdot|X1)$, rather than $P(\cdot|A)$, which is conditioned on $A$.
Using the Chain rule for the first term in the numerator of (Eq. $4$), we have
$P(X3,X1,A)=P(X3|A,X1)P(A|X1)P(X1)$
and for the second term in the numerator, we have
$P(X3,X1,!A)=P(X3|!A,X1)P(!A|X1)P(X1)$
This will result in (Eq. $4$) simplifying to the following (note that the $P(X1)$ terms cancel out)
$P(X3|X1)=P(X3|A,X1)P(A|X1) + P(X3|!A,X1)P(!A|X1)$

Finally, since the $Xi$ are conditionally independent given $A$, we have $P(X3|A,X1)=P(X3|A)$ and $P(X3|!A,X1)=P(X3|!A)$, so

$P(X3|X1)=P(X3|A)P(A|X1) + P(X3|!A)P(!A|X1)$

This is why $P(A|Xi)$ appears rather than $P(Xi|A)$: each $P(X3|\cdot)$ term must be weighted by a probability conditioned on $X1$, and that weight is obtained from the given quantities via Bayes' theorem, $P(A|X1)=\frac{P(X1|A)P(A)}{P(X1)}$.
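Putting the whole derivation together, here is a numeric sketch of the computation (the probabilities below are invented for illustration; substitute the values from your Udacity problem):

```python
# Illustrative numbers (not from the Udacity problem). Per the question,
# the X_i are identically distributed and conditionally independent given A.
p_A = 0.5            # P(A)
p_X_given_A = 0.2    # P(Xi | A), same for i = 1, 2, 3
p_X_given_nA = 0.6   # P(Xi | !A)

# Step 1: P(X1) by the law of total probability.
p_X1 = p_X_given_A * p_A + p_X_given_nA * (1 - p_A)

# Step 2: invert with Bayes' theorem to get P(A | X1), then P(!A | X1).
p_A_given_X1 = p_X_given_A * p_A / p_X1
p_nA_given_X1 = 1 - p_A_given_X1

# Step 3: conditional independence gives P(X3 | A, X1) = P(X3 | A), so
# P(X3 | X1) = P(X3 | A) P(A | X1) + P(X3 | !A) P(!A | X1).
p_X3_given_X1 = p_X_given_A * p_A_given_X1 + p_X_given_nA * p_nA_given_X1
print(p_X3_given_X1)
```

Observing $X1$ shifts belief about $A$ (here $P(A|X1)=0.25$, down from the prior $0.5$), and that updated belief is what propagates to $X3$.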