In the Machine Learning Book by Tom Mitchell https://www.cs.ubbcluj.ro/~gabis/ml/ml-books/McGrawHill%20-%20Machine%20Learning%20-Tom%20Mitchell.pdf, Equation (6.8) is given as (pages 168-169) $$ P(D \mid h)=\prod_{i=1}^{m} P(x_i,d_i \mid h) = \prod_{i=1}^{m} P(d_i \mid h,x_i) P(x_i) $$
What is the correct interpretation of the LHS and the RHS of the second equality, and how can one prove it?
For the LHS we have:
Interpretation 1:
$P(x_i,d_i \mid h)$ is the joint probability of $x_i$ and of $d_i \mid h$, i.e. $P(x_i \; \textbf{and} \; (d_i \mid h))$. That is, we have a joint probability of two terms, where the first is $x_i$ on its own and the second is the conditional probability of $d_i$ given $h$.
Interpretation 2:
$P(x_i,d_i \mid h)$ is the joint probability of $x_i$ and $d_i$, both conditional on $h$, i.e. $P((x_i \; \textbf{and} \; d_i) \mid h)$.
The same question applies to the RHS of the second equality: what is the correct way of interpreting $P(d_i \mid h,x_i)$? Should it be read as $P(d_i \mid (h \; \textbf{and} \; x_i))$ or as $P((d_i \mid h)\; \textbf{and} \; x_i)$?
I would appreciate a detailed clarification of the above points as well as a proof of the correct interpretation.
Your “interpretation 1” is incorrect.
$p(x_i,d_i|h)$ denotes the joint probability of $x_i$ and $d_i$ given $h$. It is defined as:
$p(x_i,d_i|h) = \frac{p(x_i,d_i,h)}{p(h)}$.
Therefore, $h$ conditions both $x_i$ and $d_i$.
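To make this concrete, here is a minimal numeric sketch with a hypothetical toy joint distribution $p(x, d, h)$ over three binary variables (the numbers are arbitrary): it checks that conditioning on $h$ applies to the pair $(x, d)$, so $p(x, d \mid h) = p(x, d, h)/p(h)$ is itself a probability distribution over $(x, d)$ for each fixed $h$.

```python
import itertools

# Hypothetical joint probabilities p(x, d, h); values are arbitrary but sum to 1.
p_xdh = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.10, (1, 0, 1): 0.10,
    (1, 1, 0): 0.05, (1, 1, 1): 0.25,
}
assert abs(sum(p_xdh.values()) - 1.0) < 1e-12

def p_h(h):
    # Marginal p(h) = sum over x, d of p(x, d, h).
    return sum(p_xdh[(x, d, h)] for x, d in itertools.product([0, 1], [0, 1]))

def p_xd_given_h(x, d, h):
    # Definition: p(x, d | h) = p(x, d, h) / p(h) -- h conditions the PAIR (x, d).
    return p_xdh[(x, d, h)] / p_h(h)

# For each fixed h, the conditional distribution over (x, d) sums to 1,
# which is what "joint probability of x and d given h" means.
for h in (0, 1):
    total = sum(p_xd_given_h(x, d, h)
                for x, d in itertools.product([0, 1], [0, 1]))
    assert abs(total - 1.0) < 1e-12
```

Note that nothing in this construction requires $x$ and $d$ to be independent; the conditioning on $h$ is applied to the pair as a whole.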
For the second equality, consider the definition of $p(d_i|x_i,h)$:
$p(d_i|x_i,h) = \frac{p(x_i,d_i|h)}{p(x_i|h)}$
If you know or assume that $x_i$ and $h$ are independent (which the author of the book does), then $p(x_i|h)=p(x_i)$, and substituting this into the definition above yields the second equality.
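The following sketch verifies this numerically. It builds a joint distribution from hypothetical factors $p(h)$, $p(x)$, and $p(d \mid x, h)$, so that $x$ is independent of $h$ by construction, and then checks the chain rule $p(x, d \mid h) = p(d \mid x, h)\, p(x \mid h) = p(d \mid x, h)\, p(x)$ for every assignment.

```python
import itertools

# Hypothetical factors; x is independent of h by construction.
p_h = {0: 0.4, 1: 0.6}          # prior over hypotheses h
p_x = {0: 0.3, 1: 0.7}          # instance distribution, does not depend on h
p_d1_given_xh = {               # p(d = 1 | x, h); arbitrary illustrative values
    (0, 0): 0.2, (0, 1): 0.9,
    (1, 0): 0.6, (1, 1): 0.1,
}

def p_d_given_xh(d, x, h):
    # p(d | x, h) for a binary d.
    pd1 = p_d1_given_xh[(x, h)]
    return pd1 if d == 1 else 1.0 - pd1

def p_xdh(x, d, h):
    # Factorization p(x, d, h) = p(h) p(x) p(d | x, h) encodes x independent of h.
    return p_h[h] * p_x[x] * p_d_given_xh(d, x, h)

def p_xd_given_h(x, d, h):
    # Definition of the conditional joint: p(x, d | h) = p(x, d, h) / p(h).
    return p_xdh(x, d, h) / p_h[h]

# Chain rule plus the independence assumption p(x | h) = p(x):
#   p(x, d | h) = p(d | x, h) * p(x)
for x, d, h in itertools.product([0, 1], [0, 1], [0, 1]):
    assert abs(p_xd_given_h(x, d, h) - p_d_given_xh(d, x, h) * p_x[x]) < 1e-12
```

If $x$ were *not* independent of $h$, the factor on the right would have to remain $p(x \mid h)$, and Mitchell's Equation (6.8) would not simplify as shown.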
If you are interested in a deeper understanding of these kinds of models, you should consider studying probabilistic graphical models and, in particular, Bayesian networks. There is a great book by Daphne Koller and an excellent course on Coursera about that subject.