Let's assume I have a supermarket and I track the purchase history of my customers with two attributes of each customer: Gender (M/F) and Smiling (Y/N).
Assume this is historical data of purchases:
| | Total | Male | Smile |
|---|---|---|---|
| Beer | 25 | 20 | 22 |
Total articles sold including beer: 40
Total number of customers: 60
I need to find this conditional probability:
Probability that a given person who is male and is smiling will buy a beer.
I need ---> P(Beer | Male and Smiling)
Bayes' Theorem:
P(A|B) = P(B|A) * P(A) / P(B)
Applying this to my problem:
P(B|M,S) = P(M,S|B) * P(B) / P(M,S)
= {P(M|B) * P(S|B)} * P(B) / {P(M) * P(S)}
= (20/25 * 22/25) * 25/40 / {(20/60) * (22/60)}
= 0.70 * 0.625 / 0.122
= 3.58
I am clearly doing something very, very wrong. Need some guidance.
Assumptions I have made:
- Male and Smiling are independent - I guess this is where the issue lies
- P(M,S|B) - This component's calculation & formula - are they right?
- Is it an issue with the data?
Edit in response to Guillame's answer:
Let me define those for you:
Assuming that Male & Smiling = 15
- Smiling Males bought beer: 10
- Smiling males did not buy beer: 5
Now given this information, where exactly am I going wrong?
Your current data is insufficient to answer, and this is why you run into errors.
You have a discrete distribution over (Beer, noBeer) x (Male, Female) x (Smile, noSmile), so we can think of the joint probability distribution as a 2*2*2 array (or a tensor, in math-speak).
Right now, the data you are giving specifies only the joint distribution of beer and sex, and of beer and smile. You need the joint distribution of all three variables in order to apply Bayes' formula.
So, what you need to look at is:
- how many smiling males bought beer
- how many smiling males didn't buy beer

Then normalize: divide the number of smiling males who bought beer by the total number of smiling males, and you will have your answer.
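
As a minimal sketch of that normalization, assuming the counts given in the question's edit (10 smiling males bought beer, 5 did not), the conditional probability can be computed directly from the two counts. The function and variable names here are made up for illustration.

```python
from fractions import Fraction

# Counts taken from the question's edit; they cover smiling males only.
smiling_males_beer = 10     # smiling males who bought beer
smiling_males_no_beer = 5   # smiling males who did not buy beer


def conditional_probability(event_count, other_count):
    """P(event | condition), given the two counts that split the condition group."""
    total = event_count + other_count
    if total == 0:
        raise ValueError("no observations satisfy the condition")
    return Fraction(event_count, total)


# P(Beer | Male, Smiling) = 10 / (10 + 5)
p_beer_given_male_smiling = conditional_probability(
    smiling_males_beer, smiling_males_no_beer
)
print(p_beer_given_male_smiling, float(p_beer_given_male_smiling))
```

With these counts the result is 2/3, which stays between 0 and 1 because it is a ratio of a subgroup to the group that contains it; no independence assumption is needed.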