In the Machine Learning Book by Tom Mitchell https://www.cs.ubbcluj.ro/~gabis/ml/ml-books/McGrawHill%20-%20Machine%20Learning%20-Tom%20Mitchell.pdf, Equation (6.8) is given as (pages 168-169) $$ P(D \mid h)=\prod_{i=1}^{m} P(x_i,d_i \mid h) = \prod_{i=1}^{m} P(d_i \mid h,x_i) P(x_i) $$
What is the correct interpretation of the LHS and the RHS of the second equality, and how can one prove it?
For the LHS we have:
Interpretation 1:
$P(x_i,d_i \mid h)$ is the joint probability of $x_i$ and of $d_i \mid h$, i.e. $P(x_i \; \textbf{and} \; (d_i \mid h))$. That is, we have a joint probability of two terms, where the first is $x_i$ on its own and the second is the conditional probability of $d_i$ given $h$.
Interpretation 2:
$P(x_i,d_i \mid h)$ is the joint probability of $x_i$ and $d_i$, both conditional on $h$, i.e. $P((x_i \; \textbf{and} \; d_i) \mid h)$.
The same question applies to the RHS of the second equality: what is the correct way of interpreting $P(d_i \mid h,x_i)$? Should it be read as $P(d_i \mid (h \; \textbf{and} \; x_i))$ or as $P((d_i \mid h)\; \textbf{and} \; x_i)$?
I would appreciate a detailed clarification of the above points as well as a proof of the correct interpretation.
Your “interpretation 1” is incorrect.
$p(x_i,d_i|h)$ denotes the joint probability of $x_i$ and $d_i$ given $h$. It is defined as:
$p(x_i,d_i|h) = \frac{p(x_i,d_i,h)}{p(h)}$.
Therefore, $h$ conditions both $x_i$ and $d_i$.
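To make this concrete, here is a minimal numeric sketch with a hypothetical toy joint distribution $p(x, d, h)$ over three binary variables (the numbers are arbitrary): it checks that conditioning on $h$ applies to the pair $(x, d)$, so $p(x, d \mid h) = p(x, d, h)/p(h)$ is itself a probability distribution over $(x, d)$ for each fixed $h$.

```python
import itertools

# Hypothetical joint probabilities p(x, d, h); values are arbitrary but sum to 1.
p_xdh = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.10, (1, 0, 1): 0.10,
    (1, 1, 0): 0.05, (1, 1, 1): 0.25,
}
assert abs(sum(p_xdh.values()) - 1.0) < 1e-12

def p_h(h):
    # Marginal p(h) = sum over x, d of p(x, d, h).
    return sum(p_xdh[(x, d, h)] for x, d in itertools.product([0, 1], [0, 1]))

def p_xd_given_h(x, d, h):
    # Definition: p(x, d | h) = p(x, d, h) / p(h) -- h conditions the PAIR (x, d).
    return p_xdh[(x, d, h)] / p_h(h)

# For each fixed h, the conditional distribution over (x, d) sums to 1,
# which is what "joint probability of x and d given h" means.
for h in (0, 1):
    total = sum(p_xd_given_h(x, d, h)
                for x, d in itertools.product([0, 1], [0, 1]))
    assert abs(total - 1.0) < 1e-12
```

Note that nothing in this construction requires $x$ and $d$ to be independent; the conditioning on $h$ is applied to the pair as a whole.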
For the second equality, consider the definition of $p(d_i|x_i,h)$:
$p(d_i|x_i,h) = \frac{p(x_i,d_i|h)}{p(x_i|h)}$
If you know or assume that $x_i$ and $h$ are independent (which the author of the book does), then $p(x_i|h)=p(x_i)$, and substituting this into the definition above yields the second equality.
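The following sketch verifies this numerically. It builds a joint distribution from hypothetical factors $p(h)$, $p(x)$, and $p(d \mid x, h)$, so that $x$ is independent of $h$ by construction, and then checks the chain rule $p(x, d \mid h) = p(d \mid x, h)\, p(x \mid h) = p(d \mid x, h)\, p(x)$ for every assignment.

```python
import itertools

# Hypothetical factors; x is independent of h by construction.
p_h = {0: 0.4, 1: 0.6}          # prior over hypotheses h
p_x = {0: 0.3, 1: 0.7}          # instance distribution, does not depend on h
p_d1_given_xh = {               # p(d = 1 | x, h); arbitrary illustrative values
    (0, 0): 0.2, (0, 1): 0.9,
    (1, 0): 0.6, (1, 1): 0.1,
}

def p_d_given_xh(d, x, h):
    # p(d | x, h) for a binary d.
    pd1 = p_d1_given_xh[(x, h)]
    return pd1 if d == 1 else 1.0 - pd1

def p_xdh(x, d, h):
    # Factorization p(x, d, h) = p(h) p(x) p(d | x, h) encodes x independent of h.
    return p_h[h] * p_x[x] * p_d_given_xh(d, x, h)

def p_xd_given_h(x, d, h):
    # Definition of the conditional joint: p(x, d | h) = p(x, d, h) / p(h).
    return p_xdh(x, d, h) / p_h[h]

# Chain rule plus the independence assumption p(x | h) = p(x):
#   p(x, d | h) = p(d | x, h) * p(x)
for x, d, h in itertools.product([0, 1], [0, 1], [0, 1]):
    assert abs(p_xd_given_h(x, d, h) - p_d_given_xh(d, x, h) * p_x[x]) < 1e-12
```

If $x$ were *not* independent of $h$, the factor on the right would have to remain $p(x \mid h)$, and Mitchell's Equation (6.8) would not simplify as shown.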
If you are interested in a deeper understanding of these kinds of models, you should consider studying probabilistic graphical models and, in particular, Bayesian networks. There is a great book by Daphne Koller and an excellent course on Coursera about that subject.