I'm trying to solve the following problem. A dataset is given.
\begin{array}{|c|c|c|c|} \hline Measurement Number & Time of the Day & Weather Description & Play \\ \hline 1 & morning & sunny & True \\ \hline 2 & morning & overcast & False \\ \hline 3 & afternoon & sunny & True \\ \hline 4 & morning & rainy & False \\ \hline 5 & evening & rainy & False \\ \hline 6 & afternoon & overcast & False \\ \hline 7 & morning & sunny & False \\ \hline 8 & evening & overcast & False \\ \hline 9 & afternoon & sunny & True \\ \hline 10 & evening & rainy & False \\ \hline 11 & evening & sunny & False \\ \hline 12 & morning & sunny & True \\ \hline 13 & morning & overcast & False \\ \hline 14 & afternoon & sunny & True \\ \hline 15 & morning & rainy & False \\ \hline 16 & afternoon & overcast & True \\ \hline 17 & afternoon & sunny & True \\ \hline 18 & afternoon & rainy & True \\ \hline 19 & afternoon & sunny & True \\ \hline 20 & evening & overcast & False \\ \hline 21 & afternoon & sunny & False \\ \hline \end{array}
I'm trying to estimate, using Naive Bayes, $P(\textrm{True} \mid <\textrm{afternoon}, \textrm{sunny}>)$. The Naive Bayes approach relies on the assumption that the input variables (Time of the Day and Weather Description) are independent.
What I did: $P(\textrm{True} \mid <\textrm{afternoon}, \textrm{sunny}>) = \frac{P(\textrm{True}) \cdot P(\textrm{afternoon} \mid \textrm{True} ) \cdot P(\textrm{sunny} \mid \textrm{True}) }{P(\textrm{afternoon}) \cdot P(\textrm{sunny})} = \\ \frac{9/21 \cdot 7/9 \cdot 7/9}{9/21 \cdot 10/21} \approx 1.27$
I think the correct answer should be 5/6.
Can someone help?
No. Naive Bayes does not assume that the input variables are independent; it assumes that they are conditionally independent given the category (in this case 'Play').
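The appropriate formula for the Naive Bayes estimate is $$\def\P{\mathsf P}\P(\textrm{True} \mid \langle \textrm{afternoon}, \textrm{sunny}\rangle) = \tfrac{\P(\textrm{True}) \cdotp \P(\textrm{afternoon} \mid \textrm{True} ) \cdotp \P(\textrm{sunny} \mid \textrm{True}) }{\P(\textrm{True}) \cdotp \P(\textrm{afternoon} \mid \textrm{True} ) \cdotp \P(\textrm{sunny} \mid \textrm{True})+\P(\textrm{False}) \cdotp \P(\textrm{afternoon} \mid \textrm{False} ) \cdotp \P(\textrm{sunny} \mid \textrm{False})} $$ The denominator is $\P(\textrm{afternoon}, \textrm{sunny})$ expanded by the law of total probability under that assumption. It is not the product $\P(\textrm{afternoon}) \cdotp \P(\textrm{sunny})$, which is why your result came out greater than 1.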
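To make the arithmetic concrete, here is a minimal Python sketch that plugs the counts from the question's table into the formula above. The names `DATA` and `naive_bayes_posterior` are illustrative only, and the probabilities are plain relative frequencies with no smoothing, which is an assumption on my part.

```python
from fractions import Fraction

# (time, weather, play) rows transcribed from the table in the question.
DATA = [
    ("morning", "sunny", True), ("morning", "overcast", False),
    ("afternoon", "sunny", True), ("morning", "rainy", False),
    ("evening", "rainy", False), ("afternoon", "overcast", False),
    ("morning", "sunny", False), ("evening", "overcast", False),
    ("afternoon", "sunny", True), ("evening", "rainy", False),
    ("evening", "sunny", False), ("morning", "sunny", True),
    ("morning", "overcast", False), ("afternoon", "sunny", True),
    ("morning", "rainy", False), ("afternoon", "overcast", True),
    ("afternoon", "sunny", True), ("afternoon", "rainy", True),
    ("afternoon", "sunny", True), ("evening", "overcast", False),
    ("afternoon", "sunny", False),
]


def naive_bayes_posterior(data, time, weather):
    """P(True | time, weather) assuming conditional independence given Play."""
    scores = {}
    for label in (True, False):
        rows = [r for r in data if r[2] == label]
        prior = Fraction(len(rows), len(data))                    # P(label)
        p_time = Fraction(sum(r[0] == time for r in rows), len(rows))       # P(time | label)
        p_weather = Fraction(sum(r[1] == weather for r in rows), len(rows))  # P(weather | label)
        scores[label] = prior * p_time * p_weather
    # Normalise over both classes so the result is a proper probability.
    return scores[True] / (scores[True] + scores[False])


posterior = naive_bayes_posterior(DATA, "afternoon", "sunny")
print(posterior, float(posterior))  # 98/107, roughly 0.916
```

With these counts the numerator is $\tfrac{9}{21}\cdot\tfrac79\cdot\tfrac79$ and the second term of the denominator is $\tfrac{12}{21}\cdot\tfrac{2}{12}\cdot\tfrac{3}{12}$, so the Naive Bayes estimate is $98/107$, which differs from the $5/6$ you expected.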
That said, given the table you can (and should) test the assumption of conditional independence before relying on it (heads up: it fails), or avoid the assumption altogether and estimate the probability directly from the joint counts: $$\P(\textrm{True} \mid \langle \textrm{afternoon}, \textrm{sunny}\rangle) = \tfrac{\P(\textrm{True, afternoon, sunny}) }{\P(\textrm{True, afternoon, sunny})+\P(\textrm{False, afternoon, sunny})} = \tfrac {5/21}{(5+1)/21} = \tfrac 56$$
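Both claims can be checked by counting. The sketch below (again with illustrative names, `DATA` and `cond_prob`, that are not from any library) compares the joint conditional frequency with the product of the marginal conditional frequencies for the single cell (afternoon, sunny), which is enough to show the assumption fails, and then computes the direct estimate.

```python
from fractions import Fraction

# (time, weather, play) rows transcribed from the table in the question.
DATA = [
    ("morning", "sunny", True), ("morning", "overcast", False),
    ("afternoon", "sunny", True), ("morning", "rainy", False),
    ("evening", "rainy", False), ("afternoon", "overcast", False),
    ("morning", "sunny", False), ("evening", "overcast", False),
    ("afternoon", "sunny", True), ("evening", "rainy", False),
    ("evening", "sunny", False), ("morning", "sunny", True),
    ("morning", "overcast", False), ("afternoon", "sunny", True),
    ("morning", "rainy", False), ("afternoon", "overcast", True),
    ("afternoon", "sunny", True), ("afternoon", "rainy", True),
    ("afternoon", "sunny", True), ("evening", "overcast", False),
    ("afternoon", "sunny", False),
]


def cond_prob(data, label, time=None, weather=None):
    """Empirical P(time, weather | Play == label); None means 'any value'."""
    rows = [r for r in data if r[2] == label]
    hits = [r for r in rows
            if (time is None or r[0] == time)
            and (weather is None or r[1] == weather)]
    return Fraction(len(hits), len(rows))


# Conditional independence would require joint == product for each class.
for label in (True, False):
    joint = cond_prob(DATA, label, time="afternoon", weather="sunny")
    product = (cond_prob(DATA, label, time="afternoon")
               * cond_prob(DATA, label, weather="sunny"))
    print(label, joint, product, joint == product)

# Direct estimate: fraction of (afternoon, sunny) rows where Play is True.
matching = [r for r in DATA if r[0] == "afternoon" and r[1] == "sunny"]
direct = Fraction(sum(r[2] for r in matching), len(matching))
print(direct)
```

Given Play = True the joint frequency is $5/9$ against a product of $49/81$, and given Play = False it is $1/12$ against $1/24$, so the two sides disagree in both classes.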