Deriving the Bayes Optimal Classifier (Mitchell, Machine Learning)


I am trying to recreate the Bayes optimal classifier result given in the Machine Learning textbook by Mitchell. Below, I've included the desired result from the text and my own work.

I think I've taken the right approach but the final equality has a difference in the conditional. Is my approach incorrect or is there an intuitive rationalization for why the conditionals are really the same?

DESIRED RESULT

$$P(v_j|D) = \sum_{h_i \in H} P(v_j|h_i)P(h_i|D)$$

MY DERIVATION

[Derivation image: the same expansion, but ending with $P(v_j|h_i, D)$ in the sum rather than $P(v_j|h_i)$]

BEST ANSWER

We have

$$P(v_j|D) = \sum_{h_i}P(v_j|h_i,D)P(h_i|D)$$ by the law of total probability.

I think the key to understanding the equality is that $P(v_j|h_i, D)=P(v_j|h_i)$: given the hypothesis $h_i$, the probability that $v$ takes the value $v_j$ is fully determined, so conditioning on $D$ adds nothing further. In other words, $v_j$ is conditionally independent of $D$ given $h_i$, because the training data influences the prediction only through the posterior over hypotheses.
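The identity can be checked numerically. Below is a minimal sketch using the classic three-hypothesis illustration from Mitchell's chapter (the posterior values 0.4, 0.3, 0.3 are the book's illustrative numbers; the dictionary names are mine): each hypothesis deterministically predicts a label, so $P(v_j|h_i)$ is 0 or 1, and $P(v_j|D)$ is just the total posterior mass of the hypotheses predicting $v_j$.

```python
# P(h_i | D): posteriors over three hypotheses (illustrative numbers)
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}

# P(v_j | h_i): h1 predicts "+", h2 and h3 predict "-" (deterministic)
likelihoods = {
    "h1": {"+": 1.0, "-": 0.0},
    "h2": {"+": 0.0, "-": 1.0},
    "h3": {"+": 0.0, "-": 1.0},
}

def p_v_given_D(v):
    """P(v | D) = sum_i P(v | h_i) P(h_i | D), by total probability."""
    return sum(likelihoods[h][v] * posteriors[h] for h in posteriors)

probs = {v: p_v_given_D(v) for v in ("+", "-")}
best = max(probs, key=probs.get)
print(probs)  # {'+': 0.4, '-': 0.6}
print(best)   # '-'
```

Note that the Bayes optimal classification ("-", with probability 0.6) disagrees with the MAP hypothesis h1 (which predicts "+"): the sum over all hypotheses, weighted by their posteriors, is exactly what the formula above prescribes.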