I am trying to understand the mathematical basis of combining independent probabilities, as described here:
http://www.paulgraham.com/naivebayes.html
Suppose that being over 7 feet tall indicates with 60% probability that someone is a basketball player, and carrying a basketball indicates this with 72% probability. If you see someone who is over 7 feet tall and carrying a basketball, what is the probability that they're a basketball player?
with the formula given by:
$$\frac{a\,b\cdots N}{a\,b\cdots N + (1-a)(1-b)\cdots(1-N)}$$
(My actual problem is different, but very similar to assigning an "aggregated" spam indicator to an email based on the combination of individual, independent indicators.)
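As a sanity check on the formula, here is a minimal Python sketch (the function name `combine` is my own, not from either article). It evaluates the same expression in log-odds form, which is algebraically identical but does not underflow when many indicators are multiplied together, as in the spam case. It assumes every input probability is strictly between 0 and 1.

```python
import math
from typing import Iterable


def combine(probs: Iterable[float]) -> float:
    """Return prod(p) / (prod(p) + prod(1 - p)) for the given probabilities.

    Evaluated as a sum of log-odds, log(p / (1 - p)), which is algebraically
    the same expression but does not underflow when many indicators are
    multiplied together.
    """
    log_odds = 0.0
    for p in probs:
        if not 0.0 < p < 1.0:
            raise ValueError("each probability must be strictly between 0 and 1")
        log_odds += math.log(p) - math.log1p(-p)
    # Numerically stable logistic function: 1 / (1 + exp(-log_odds)).
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


print(round(combine([0.60, 0.72]), 3))  # 0.794
```

For the basketball example, 0.60 and 0.72 combine to exactly 27/34 ≈ 0.794; two opposite indicators such as 0.9 and 0.1 cancel to 0.5.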
I've looked at the link given in that article:
http://www.mathpages.com/home/kmath267.htm
but cannot follow the jump from Y/N predictions to the assumptions of symmetrical probability. Can someone point me to a discussion/proof that would help?
[Apologies if this is off-topic, too vague, or a duplicate: I have looked but not found anything covering this topic, although this came closest: Probability from a collection of independent predictions.]
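For reference, here is one standard derivation (a sketch in my own notation, not taken from either linked page). Let $S$ be "is a basketball player", $\bar S$ its complement, $A$ "over 7 feet tall" and $B$ "carrying a basketball", and write $a = P(S\mid A)$, $b = P(S\mid B)$. It rests on two assumptions: (i) $A$ and $B$ are independent *given* $S$ and *given* $\bar S$, and (ii) the prior is $P(S) = P(\bar S) = \tfrac12$. Reading (ii) as the "symmetry" step is my interpretation. Bayes' rule with (i) gives

$$
P(S\mid A,B) = \frac{P(A\mid S)\,P(B\mid S)\,P(S)}{P(A\mid S)\,P(B\mid S)\,P(S) + P(A\mid \bar S)\,P(B\mid \bar S)\,P(\bar S)}
$$

Bayes' rule again gives $P(A\mid S) = a\,P(A)/P(S)$ and $P(A\mid \bar S) = (1-a)\,P(A)/P(\bar S)$, and likewise for $B$. Substituting and cancelling the common factor $P(A)\,P(B)$:

$$
P(S\mid A,B) = \frac{ab / P(S)}{ab / P(S) + (1-a)(1-b) / P(\bar S)}
$$

With $P(S) = P(\bar S) = \tfrac12$ the priors cancel, leaving $\frac{ab}{ab + (1-a)(1-b)}$; further indicators contribute further factors in the same way. If either assumption fails, the formula is no longer guaranteed to give the correct probability.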
By the very fact that tallness and possession of a basketball are both (fairly) good indicators of a basketball player, the assumption that they are independent is undermined. Consider it this way: you are asked whether a person you cannot see is over 7 feet tall. You can only answer "How should I know?" Then you are told that the person is carrying a basketball. Now you can infer that he is probably a basketball player, and basketball players are typically tall. While you still cannot know for sure, this piece of information should make you think it is now more likely that the person is tall. This is because tallness and ball possession are correlated (through the subpopulation of basketball players, in which both properties are common).
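To make the "correlated through the subpopulation" point concrete, here is a small Python sketch with invented numbers (none of them come from the question). Within players, and within non-players, tallness and ball-carrying are independent by construction, yet across the whole population learning about the ball raises the probability of tallness.

```python
from fractions import Fraction as F

# Hypothetical population model (numbers invented for illustration):
# given the class, tallness and ball-carrying are independent.
p_player = F(1, 10)
p_tall = {True: F(8, 10), False: F(5, 100)}  # P(tall | player?)
p_ball = {True: F(9, 10), False: F(1, 10)}   # P(ball | player?)


def prior(is_player: bool) -> F:
    return p_player if is_player else 1 - p_player


# Marginal probabilities, summing over the hidden class.
P_tall = sum(prior(c) * p_tall[c] for c in (True, False))
P_ball = sum(prior(c) * p_ball[c] for c in (True, False))
P_tall_and_ball = sum(prior(c) * p_tall[c] * p_ball[c] for c in (True, False))

print("P(tall)        =", float(P_tall))                    # 0.125
print("P(tall | ball) =", float(P_tall_and_ball / P_ball))  # 0.425
```

Seeing the ball more than triples the probability of tallness, even though the two traits are unrelated within each class.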
Here are a few extreme cases:
In your town there are 300 people, namely: 180 basketball players with a ball; 70 non-players with a ball; 50 non-players without a ball. And everybody is over 7 feet tall. This matches the problem description (180 of the 300 tall people and 180 of the 250 ball carriers are players, i.e. 60% and 72%), and the probability of a tall person with a ball being a player is still 72%, because tallness tells you nothing.
In your town there are 29 people, namely: one tall non-player with a ball; one tall non-player without a ball; six small non-players with a ball; 18 small players with a ball; and three tall players without a ball. This matches the problem description (3 of the 5 tall people and 18 of the 25 ball carriers are players), and the probability of a tall person with a ball being a player is $0$.
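In your town there are 37 people, namely: 7 small non-players with a ball; 12 tall non-players without a ball; and 18 basketball players, all tall and all carrying a ball. This again matches the problem description (18 of the 30 tall people and 18 of the 25 ball carriers are players), but now the probability is $1$.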
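The 29-person and 37-person towns can be checked mechanically. The following Python sketch (the group encoding and the helper `p_player` are my own) counts players among the relevant people with exact fractions and also evaluates the combining formula from the question for comparison.

```python
from fractions import Fraction

# Each town is a list of (count, tall, player, has_ball) groups, transcribed
# from the 29-person and 37-person towns ("small" = not over 7 feet tall).
TOWNS = {
    "29 people": [
        (1, True, False, True),
        (1, True, False, False),
        (6, False, False, True),
        (18, False, True, True),
        (3, True, True, False),
    ],
    "37 people": [
        (7, False, False, True),
        (12, True, False, False),
        (18, True, True, True),
    ],
}


def p_player(groups, tall=None, ball=None):
    """Share of players among people matching the filters (None = any)."""
    matching = [(n, player) for n, t, player, b in groups
                if (tall is None or t == tall) and (ball is None or b == ball)]
    total = sum(n for n, _ in matching)
    players = sum(n for n, player in matching if player)
    return Fraction(players, total)  # raises ZeroDivisionError if nobody matches


for name, groups in TOWNS.items():
    a = p_player(groups, tall=True)              # 3/5 in both towns
    b = p_player(groups, ball=True)              # 18/25 in both towns
    naive = a * b / (a * b + (1 - a) * (1 - b))  # 27/34 in both towns
    actual = p_player(groups, tall=True, ball=True)
    print(name, a, b, naive, actual)
```

The formula predicts 27/34 ≈ 0.794 for both towns, while the actual answers are 0 and 1: neither town satisfies the independence assumption behind the formula.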