conditional probability, Bayesian, discrete


I'm trying to solve the following problem. The following dataset is given:

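$$
\begin{array}{|c|c|c|c|} \hline
\textrm{Measurement Number} & \textrm{Time of the Day} & \textrm{Weather Description} & \textrm{Play} \\ \hline
1 & \textrm{morning} & \textrm{sunny} & \textrm{True} \\ \hline
2 & \textrm{morning} & \textrm{overcast} & \textrm{False} \\ \hline
3 & \textrm{afternoon} & \textrm{sunny} & \textrm{True} \\ \hline
4 & \textrm{morning} & \textrm{rainy} & \textrm{False} \\ \hline
5 & \textrm{evening} & \textrm{rainy} & \textrm{False} \\ \hline
6 & \textrm{afternoon} & \textrm{overcast} & \textrm{False} \\ \hline
7 & \textrm{morning} & \textrm{sunny} & \textrm{False} \\ \hline
8 & \textrm{evening} & \textrm{overcast} & \textrm{False} \\ \hline
9 & \textrm{afternoon} & \textrm{sunny} & \textrm{True} \\ \hline
10 & \textrm{evening} & \textrm{rainy} & \textrm{False} \\ \hline
11 & \textrm{evening} & \textrm{sunny} & \textrm{False} \\ \hline
12 & \textrm{morning} & \textrm{sunny} & \textrm{True} \\ \hline
13 & \textrm{morning} & \textrm{overcast} & \textrm{False} \\ \hline
14 & \textrm{afternoon} & \textrm{sunny} & \textrm{True} \\ \hline
15 & \textrm{morning} & \textrm{rainy} & \textrm{False} \\ \hline
16 & \textrm{afternoon} & \textrm{overcast} & \textrm{True} \\ \hline
17 & \textrm{afternoon} & \textrm{sunny} & \textrm{True} \\ \hline
18 & \textrm{afternoon} & \textrm{rainy} & \textrm{True} \\ \hline
19 & \textrm{afternoon} & \textrm{sunny} & \textrm{True} \\ \hline
20 & \textrm{evening} & \textrm{overcast} & \textrm{False} \\ \hline
21 & \textrm{afternoon} & \textrm{sunny} & \textrm{False} \\ \hline
\end{array}
$$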

I'm trying to estimate $P(\textrm{True} \mid \langle \textrm{afternoon}, \textrm{sunny} \rangle)$ using naive Bayes. The naive Bayes approach relies on the assumption that the input variables (Time of the Day and Weather Description) are independent.

What I did:
$$P(\textrm{True} \mid \langle \textrm{afternoon}, \textrm{sunny} \rangle) = \frac{P(\textrm{True}) \cdot P(\textrm{afternoon} \mid \textrm{True}) \cdot P(\textrm{sunny} \mid \textrm{True})}{P(\textrm{afternoon}) \cdot P(\textrm{sunny})} = \frac{9/21 \cdot 7/9 \cdot 7/9}{9/21 \cdot 10/21} \approx 1.27$$
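The counts in this expression can be reproduced from the table. The sketch below uses Python with only the standard library. `DATA` and `count` are illustrative names that do not appear in the post. It confirms each factor and shows that the ratio is exactly $343/270 \approx 1.27$, which cannot be a probability.

```python
from fractions import Fraction

# The 21 rows of the table as (time of day, weather, play).
DATA = [
    ("morning", "sunny", True), ("morning", "overcast", False),
    ("afternoon", "sunny", True), ("morning", "rainy", False),
    ("evening", "rainy", False), ("afternoon", "overcast", False),
    ("morning", "sunny", False), ("evening", "overcast", False),
    ("afternoon", "sunny", True), ("evening", "rainy", False),
    ("evening", "sunny", False), ("morning", "sunny", True),
    ("morning", "overcast", False), ("afternoon", "sunny", True),
    ("morning", "rainy", False), ("afternoon", "overcast", True),
    ("afternoon", "sunny", True), ("afternoon", "rainy", True),
    ("afternoon", "sunny", True), ("evening", "overcast", False),
    ("afternoon", "sunny", False),
]
N = len(DATA)


def count(time=None, weather=None, play=None):
    """Number of rows that match every field that is not None."""
    return sum(
        1
        for t, w, p in DATA
        if (time is None or t == time)
        and (weather is None or w == weather)
        and (play is None or p == play)
    )


# Estimates used in the question (Fraction reduces them, e.g. 9/21 -> 3/7).
p_true = Fraction(count(play=True), N)                                              # 9/21
p_aft_given_true = Fraction(count(time="afternoon", play=True), count(play=True))  # 7/9
p_sun_given_true = Fraction(count(weather="sunny", play=True), count(play=True))   # 7/9
p_aft = Fraction(count(time="afternoon"), N)                                        # 9/21
p_sun = Fraction(count(weather="sunny"), N)                                         # 10/21

# The numerator uses conditional independence, the denominator unconditional
# independence; nothing ties the two together, so the ratio can exceed 1.
ratio = p_true * p_aft_given_true * p_sun_given_true / (p_aft * p_sun)
print(ratio, round(float(ratio), 2))  # 343/270 1.27
```

The individual factors are right. Only the denominator is inconsistent with the numerator.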

I think the correct answer should be 5/6.

Can someone help?

Best answer

> I'm trying to estimate $P(\textrm{True} \mid \langle \textrm{afternoon}, \textrm{sunny} \rangle)$ using naive Bayes. The naive Bayes approach relies on the assumption that the input variables (Time of the Day and Weather Description) are independent.

No. It treats the input variables as conditionally independent given the category (in this case 'Play'), which is not the same as assuming they are independent.

> Now the "naive" conditional independence assumptions come into play: assume that each feature $F_i$ is conditionally independent of every other feature $F_j$ for $j \neq i$, given the category $C$.
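Spelled out for this problem (a short sketch in the answer's notation, with $C$ ranging over the two values of Play), the assumption factorizes the likelihood within each class, and the evidence then follows from the law of total probability:

$$\begin{aligned}
\mathsf P(\textrm{afternoon}, \textrm{sunny} \mid C) &= \mathsf P(\textrm{afternoon} \mid C)\,\mathsf P(\textrm{sunny} \mid C), \qquad C \in \{\textrm{True}, \textrm{False}\},\\
\mathsf P(\textrm{afternoon}, \textrm{sunny}) &= \sum_{C \in \{\textrm{True}, \textrm{False}\}} \mathsf P(C)\,\mathsf P(\textrm{afternoon} \mid C)\,\mathsf P(\textrm{sunny} \mid C).
\end{aligned}$$

The question's denominator $\mathsf P(\textrm{afternoon})\,\mathsf P(\textrm{sunny})$ would instead require the two features to be independent unconditionally. The numerator does not make that assumption, and mixing the two is why the ratio can exceed 1.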


> What I did:
> $$P(\textrm{True} \mid \langle \textrm{afternoon}, \textrm{sunny} \rangle) = \frac{P(\textrm{True}) \cdot P(\textrm{afternoon} \mid \textrm{True}) \cdot P(\textrm{sunny} \mid \textrm{True})}{P(\textrm{afternoon}) \cdot P(\textrm{sunny})} = \frac{9/21 \cdot 7/9 \cdot 7/9}{9/21 \cdot 10/21} \approx 1.27$$

The appropriate formula for naïve Bayes normalizes over both classes, so the denominator is the same product summed over Play = True and Play = False: $$\def\P{\mathsf P}\P(\textrm{True} \mid \langle \textrm{afternoon}, \textrm{sunny}\rangle) = \tfrac{\P(\textrm{True}) \cdotp \P(\textrm{afternoon} \mid \textrm{True} ) \cdotp \P(\textrm{sunny} \mid \textrm{True}) }{\P(\textrm{True}) \cdotp \P(\textrm{afternoon} \mid \textrm{True} ) \cdotp \P(\textrm{sunny} \mid \textrm{True})+\P(\textrm{False}) \cdotp \P(\textrm{afternoon} \mid \textrm{False} ) \cdotp \P(\textrm{sunny} \mid \textrm{False})} $$
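Evaluated on the table, this formula gives roughly 0.916. Below is a minimal sketch in Python that uses only the standard library and keeps every estimate as an exact fraction. The helper names `count` and `naive_score` are illustrative and do not appear in the post.

```python
from fractions import Fraction

# The 21 rows of the table as (time of day, weather, play).
DATA = [
    ("morning", "sunny", True), ("morning", "overcast", False),
    ("afternoon", "sunny", True), ("morning", "rainy", False),
    ("evening", "rainy", False), ("afternoon", "overcast", False),
    ("morning", "sunny", False), ("evening", "overcast", False),
    ("afternoon", "sunny", True), ("evening", "rainy", False),
    ("evening", "sunny", False), ("morning", "sunny", True),
    ("morning", "overcast", False), ("afternoon", "sunny", True),
    ("morning", "rainy", False), ("afternoon", "overcast", True),
    ("afternoon", "sunny", True), ("afternoon", "rainy", True),
    ("afternoon", "sunny", True), ("evening", "overcast", False),
    ("afternoon", "sunny", False),
]
N = len(DATA)


def count(time=None, weather=None, play=None):
    """Number of rows that match every field that is not None."""
    return sum(
        1
        for t, w, p in DATA
        if (time is None or t == time)
        and (weather is None or w == weather)
        and (play is None or p == play)
    )


def naive_score(play, time="afternoon", weather="sunny"):
    """Unnormalized score P(C) * P(time | C) * P(weather | C) for class C = play."""
    n_c = count(play=play)
    prior = Fraction(n_c, N)
    p_time = Fraction(count(time=time, play=play), n_c)
    p_weather = Fraction(count(weather=weather, play=play), n_c)
    return prior * p_time * p_weather


score_true = naive_score(True)    # 9/21 * 7/9 * 7/9   = 7/27
score_false = naive_score(False)  # 12/21 * 2/12 * 3/12 = 1/42
# Normalize by the sum over both classes, as in the formula above.
posterior = score_true / (score_true + score_false)
print(posterior, round(float(posterior), 3))  # 98/107 0.916
```

Even the correctly normalized naive Bayes estimate is therefore $98/107 \approx 0.916$, not $5/6$. The remaining gap comes from the conditional-independence assumption itself.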

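That said, given the table you can (and should) test the assumption of conditional independence before using it. Heads up: it fails. For example, $\P(\textrm{afternoon}, \textrm{sunny} \mid \textrm{True}) = 5/9$, whereas $\P(\textrm{afternoon} \mid \textrm{True})\,\P(\textrm{sunny} \mid \textrm{True}) = 49/81$. Alternatively, avoid making the assumption and use the joint counts directly: $$\P(\textrm{True} \mid \langle \textrm{afternoon}, \textrm{sunny}\rangle) = \tfrac{\P(\textrm{True, afternoon, sunny}) }{\P(\textrm{True, afternoon, sunny})+\P(\textrm{False, afternoon, sunny})} = \tfrac {5/21}{(5+1)/21} = \tfrac{5}{6}$$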
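Both suggestions can be checked mechanically. The sketch below uses Python with only the standard library. `DATA`, `count` and `checks` are illustrative names. For each class, it compares $\mathsf P(\textrm{afternoon}, \textrm{sunny} \mid C)$ with $\mathsf P(\textrm{afternoon} \mid C)\,\mathsf P(\textrm{sunny} \mid C)$, then computes the direct estimate. It compares the table's relative frequencies exactly rather than running a statistical test. That is enough to show the factorization does not hold in this sample.

```python
from fractions import Fraction

# The 21 rows of the table as (time of day, weather, play).
DATA = [
    ("morning", "sunny", True), ("morning", "overcast", False),
    ("afternoon", "sunny", True), ("morning", "rainy", False),
    ("evening", "rainy", False), ("afternoon", "overcast", False),
    ("morning", "sunny", False), ("evening", "overcast", False),
    ("afternoon", "sunny", True), ("evening", "rainy", False),
    ("evening", "sunny", False), ("morning", "sunny", True),
    ("morning", "overcast", False), ("afternoon", "sunny", True),
    ("morning", "rainy", False), ("afternoon", "overcast", True),
    ("afternoon", "sunny", True), ("afternoon", "rainy", True),
    ("afternoon", "sunny", True), ("evening", "overcast", False),
    ("afternoon", "sunny", False),
]


def count(time=None, weather=None, play=None):
    """Number of rows that match every field that is not None."""
    return sum(
        1
        for t, w, p in DATA
        if (time is None or t == time)
        and (weather is None or w == weather)
        and (play is None or p == play)
    )


# Check P(afternoon, sunny | C) == P(afternoon | C) * P(sunny | C) for each class.
checks = {}
for play in (True, False):
    n_c = count(play=play)
    joint = Fraction(count(time="afternoon", weather="sunny", play=play), n_c)
    product = (Fraction(count(time="afternoon", play=play), n_c)
               * Fraction(count(weather="sunny", play=play), n_c))
    checks[play] = (joint, product)
    print(play, joint, product, joint == product)
# True 5/9 49/81 False
# False 1/12 1/24 False

# Direct estimate without the assumption: share of afternoon-and-sunny rows with Play = True.
direct = Fraction(count(time="afternoon", weather="sunny", play=True),
                  count(time="afternoon", weather="sunny"))
print(direct)  # 5/6
```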