Reopened: Exercise 2.8 in MacKay's Information Theory, Inference, and Learning Algorithms


This is from Exercise 2.8 in MacKay's Information Theory, Inference, and Learning Algorithms, where @tbjohnston gave an answer with the following predictive distribution:

$P(h \mid n_H, N) = \int \mathrm{d}f_H \, P(h \mid f_H)\, P(f_H \mid n_H, N)$

Intuitively I understand that, given $f_H$, the probability of getting a head on one toss is conditionally independent of the number of previous heads $n_H$ and the number of tosses $N$, but I want to know: under what general circumstances does $P(A \mid B) \times P(B \mid C, D) = P(A, B \mid C, D)$, as the integral suggests?
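For what it's worth, here is a quick numerical check of that identity (my own toy example, not from the book, collapsing $C, D$ into a single variable $C$ for simplicity): the equality $P(A \mid B)\, P(B \mid C) = P(A, B \mid C)$ holds exactly when $A$ is conditionally independent of $C$ given $B$, i.e. $P(A \mid B, C) = P(A \mid B)$, and fails for a generic joint distribution without that independence:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Case 1: a joint built so that A is conditionally independent of C given B ---
p_c = rng.dirichlet(np.ones(2))                   # p(c)
p_b_given_c = rng.dirichlet(np.ones(2), size=2)   # p(b|c), rows indexed by c
p_a_given_b = rng.dirichlet(np.ones(2), size=2)   # p(a|b), rows indexed by b

# joint[a, b, c] = p(a|b) p(b|c) p(c): A ⟂ C | B holds by construction
joint = np.einsum('ba,cb,c->abc', p_a_given_b, p_b_given_c, p_c)

# p(a, b | c) computed from the joint
p_ab_given_c = joint / joint.sum(axis=(0, 1), keepdims=True)

# the identity p(a, b | c) = p(a|b) p(b|c)
rhs = np.einsum('ba,cb->abc', p_a_given_b, p_b_given_c)
print(np.allclose(p_ab_given_c, rhs))  # True

# --- Case 2: a generic joint p(a, b, c) without the conditional independence ---
generic = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)

p_ab = generic.sum(axis=2)                               # p(a, b)
p_a_given_b_g = p_ab / p_ab.sum(axis=0, keepdims=True)   # p(a|b)
p_bc = generic.sum(axis=0)                               # p(b, c)
p_b_given_c_g = p_bc / p_bc.sum(axis=0, keepdims=True)   # p(b|c)

lhs_g = np.einsum('ab,bc->abc', p_a_given_b_g, p_b_given_c_g)
p_ab_given_c_g = generic / generic.sum(axis=(0, 1), keepdims=True)
# expected: False (the identity fails without the conditional independence)
print(np.allclose(lhs_g, p_ab_given_c_g))
```

So the identity is not generally valid; it is exactly the statement $P(A \mid B, C, D) = P(A \mid B)$, combined with the product rule.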

I am assuming it can be explained as follows, using the product rule and conditional independence:

$p(h, f_H \mid n_H, N) = \frac{p(h, f_H, n_H, N)}{p(n_H, N)} = \frac{p(h \mid f_H, n_H, N)\, p(f_H \mid n_H, N)\, p(n_H, N)}{p(n_H, N)} = p(h \mid f_H)\, p(f_H \mid n_H, N)$

Here the conditional independence of a future head and $(n_H, N)$ given $f_H$ justifies replacing $p(h \mid f_H, n_H, N)$ with $p(h \mid f_H)$, and the $p(n_H, N)$ factors in the numerator and denominator cancel. Is this correct?
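As a numerical sanity check of the predictive integral itself (a sketch under the exercise's assumption of a uniform prior on $f_H$, so the posterior is $\mathrm{Beta}(n_H + 1,\, N - n_H + 1)$ and the predictive probability reduces to Laplace's rule of succession, $(n_H + 1)/(N + 2)$; the counts below are arbitrary illustrative values):

```python
import numpy as np

# Numerical check of P(h | n_H, N) = ∫ df_H  P(h | f_H) P(f_H | n_H, N),
# assuming a uniform prior on f_H, so the posterior density is
# proportional to f_H^{n_H} (1 - f_H)^{N - n_H}.

n_H, N = 3, 10  # example counts (arbitrary illustrative values)

M = 1_000_000
f = (np.arange(M) + 0.5) / M         # midpoints of a fine grid on [0, 1]
post = f**n_H * (1 - f)**(N - n_H)   # unnormalised posterior density
post /= post.sum()                   # normalise (grid spacing cancels)

# P(h | f_H) = f_H, so the integral is the posterior mean of f_H
pred = (f * post).sum()

print(pred)                  # ≈ 0.3333
print((n_H + 1) / (N + 2))   # Laplace's rule of succession: 4/12 ≈ 0.3333
```

The numerical integral matches the closed form, which is consistent with the cancellation argument above.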