Sheldon Ross' A First Course in Probability, Section 3.3, Example 3f: Further questions on conditional probability.


Here's the statement of the problem.

At a certain stage of a criminal investigation, the inspector in charge is 60 percent convinced of the guilt of a certain suspect. Suppose, however, that a new piece of evidence which shows that the criminal has a certain characteristic (such as left-handedness, baldness, or brown hair) is uncovered. If 20 percent of the population possesses this characteristic, how certain of the guilt of the suspect should the inspector now be if it turns out that the suspect has the characteristic?

I read the solution and had the same question as the OP of this post. This answer clarifies the situation somewhat; however, I have two other questions:

  1. Why is $P(C | G^c) = 0.2$? It makes sense that $P (C | G) = 1$, as the problem statement explicitly says so. But I don't see why it makes sense to apply information about the population to the suspect if he is not guilty.
  2. In the answer above, the author says that "$P(C)$ is the measure of the inspector's belief that the suspect would possess the characteristic prior to examination." How can we tell that just from reading the problem statement (especially the "prior to examination" bit)? The book's author defines $G$ as the event that the suspect is guilty and $C$ the event that he possesses the characteristic of the criminal, so I read $P(C)$ as "the probability that the suspect possesses the characteristic of the criminal", and thought that $P(C) = 1$.

(Perhaps this is a language barrier thing, but sometimes I have trouble interpreting the problem statement, or telling which detail is given and which one is not. An example is the Monty Hall problem. When computing the probability that the host would point to door 2, it is necessary to eliminate the cases where he points to door 1, as the contestant has already picked it. However, when I considered the probability $P(\text{host points to door 2})$, I was thinking that I had to consider all possibilities, including the cases where the host points to door 1. The same sort of confusion arose when I had to consider whether the event "the suspect has the characteristic" is given (i.e., $P(C) = 1$) or not.)


There are 3 best solutions below


Why is $P(C|G^C)=0.2$? It makes sense that $P(C|G)=1$, as the problem statement explicitly says so. But I don't see why it makes sense to apply information about the population to the suspect if he is not guilty.

Think of the event $G$ as a singleton set consisting of one and only one person, that is, the person who committed the crime, among all people in the population. Then $G^C$ is an event consisting of everyone in the population except for that one person who committed the crime. Hence, we interpret $P(C|G^C)$ as the probability that our suspect has the characteristic in question given that he is a member of $G^C$. In other words, $P(C|G^C)$ is the probability that our suspect has the characteristic if he is just one of many people in the population (minus the one guilty person). Since $20\%$ of the population has the characteristic and our suspect is presumed to be one of the many, we have $P(C|G^C) = 0.2$.

In the answer above, the author says that "$P(C)$ is the measure of the inspector's belief that the suspect would possess the characteristic prior to examination." How can we tell that just from reading the problem statement (especially the "prior to examination" bit)? The book's author defines $G$ as the event that the suspect is guilty and $C$ the event that he possesses the characteristic of the criminal, so I read $P(C)$ as "the probability that the suspect possesses the characteristic of the criminal", and thought that $P(C)=1$.

I think $P(C)$ is best interpreted as the probability that the suspect has the characteristic in question. Nowhere in the problem description does it say the suspect actually possesses the characteristic. Thus, $P(C) \neq 1$ because we simply do not know for certain that the suspect has the characteristic. The likelihood with which he has the characteristic depends on whether he is guilty, and the law of total probability shows that

$$ P(C) = P(G)P(C|G) + P(G^C)P(C|G^C) $$

is actually a weighted average of conditional probabilities $P(C|G), P(C|G^C)$ where the weights $P(G), P(G^C)$ are the levels of belief regarding the suspect's guilt.

  1. Why is $P(C\mid G^c)=0.2$? It makes sense that $P(C\mid G)=1$, as the problem statement explicitly says so. But I don't see why it makes sense to apply information about the population to the suspect if he is not guilty.

$0.20$ is the proportion of people in the population with the characteristic. Since only one person may be guilty, and assuming the population is large, then the proportion of non-guilty with the characteristic is close enough to that. This may then be taken as the probability that the suspect will have the characteristic if actually not guilty.

Thus $\mathsf P(C\mid G^c) = 0.20$ is a reasonable approximation.
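To get a feel for how good this approximation is, here is a minimal sketch. It assumes a hypothetical finite population of size `n` in which exactly 20% of people have the characteristic and the one criminal is among them; the function name and the population sizes are illustrative choices, not part of the problem.

```python
from fractions import Fraction

def innocent_share_with_characteristic(n: int) -> Fraction:
    """Share of the n - 1 non-criminals who have the characteristic.

    Assumption: exactly n/5 people (20%) have the characteristic,
    and the single criminal is one of them.
    """
    with_characteristic = Fraction(n, 5)  # 20% of the whole population
    # Remove the criminal from both the numerator and the denominator.
    return (with_characteristic - 1) / (n - 1)

for n in (10, 1_000, 1_000_000):
    print(n, float(innocent_share_with_characteristic(n)))
```

For a tiny population the share is noticeably below $0.20$ (it is $1/9$ when `n = 10`), but it approaches $0.20$ quickly as `n` grows, which is why the large-population assumption matters.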

  2. How can we tell that just from reading the problem statement (especially the "prior to examination" bit)? ... so I read $P(C)$ as "the probability that the suspect possesses the characteristic of the criminal", and thought that $P(C)=1$.

The suspect may not be the criminal, and the characteristic is new evidence, for which the suspect shall need to be examined.

The inspector was $60\%$ confident that the suspect is guilty, prior to this examination. Thus $\mathsf P(G)=0.60$ is the prior probability for guilt.

So, taking into account that the suspect will certainly have the characteristic if guilty, and will have it with probability $0.20$ if not guilty, the inspector's confidence that the suspect will have the characteristic when examined is found by the Law of Total Probability:

$$\begin{align}\mathsf P(C) &=\mathsf P(G)\mathsf P(C\mid G)+\mathsf P(G^c)\mathsf P(C\mid G^c)\\ &= 0.60\cdot 1.00+0.40\cdot0.20\\ &=0.68\end{align}$$

This is the prior probability for the suspect having the characteristic of the new evidence.
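Once the examination shows that the suspect does have the characteristic, Bayes' rule turns this prior into the updated belief in guilt, $\mathsf P(G\mid C)=\mathsf P(G)\,\mathsf P(C\mid G)/\mathsf P(C)$. A minimal sketch of the arithmetic with exact fractions (the variable names are my own):

```python
from fractions import Fraction

p_g = Fraction(60, 100)              # prior P(G): inspector's belief in guilt
p_c_given_g = Fraction(1)            # P(C | G): a guilty suspect has the characteristic
p_c_given_not_g = Fraction(20, 100)  # P(C | G^c): taken to be the population rate

# Law of Total Probability: prior probability that the suspect has the characteristic.
p_c = p_g * p_c_given_g + (1 - p_g) * p_c_given_not_g

# Bayes' rule: belief in guilt after learning the suspect has the characteristic.
p_g_given_c = p_g * p_c_given_g / p_c

print(p_c)          # 17/25, i.e. 0.68
print(p_g_given_c)  # 15/17, roughly 0.882
```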


Bayes' law makes a lot more sense if you discretize it and put it in terms of odds. Say there are initially 100 scenarios: in the first 60, your guy did it; in the remaining 40, some random person did it.

The guy who did it is left-handed, so in your first 60 scenarios your guy is always left-handed. In the other 40, your guy is just a random member of the population, so he is left-handed in only 20% of them, that is, in 8 out of 40.

Originally, you only know there are 60 guilty scenarios among the possible 100, so 60/100. However, when you learn he’s left handed, you can eliminate all the right handed scenarios (imagine you’re playing Guess Who). Then, there are 60 guilty scenarios amongst a total of $60+8$ scenarios giving a posterior of $60/68=15/17$.
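The counting argument can be written out literally. This is only a sketch of the bookkeeping; the tuple layout `(suspect_is_guilty, suspect_is_left_handed)` is my own choice.

```python
from fractions import Fraction

# 100 equally weighted scenarios: (suspect_is_guilty, suspect_is_left_handed)
scenarios = (
    [(True, True)] * 60      # guilty, so he shares the criminal's left-handedness
    + [(False, True)] * 8    # innocent and left-handed (20% of the 40)
    + [(False, False)] * 32  # innocent and right-handed
)

# Learning that the suspect is left-handed eliminates the right-handed scenarios.
consistent = [s for s in scenarios if s[1]]
guilty = [s for s in consistent if s[0]]

posterior = Fraction(len(guilty), len(consistent))
print(len(guilty), len(consistent), posterior)  # 60 68 15/17
```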

Once you build intuition for thinking like this, you can see that your initial odds are 60:40, or 3:2, and that left-handedness is 5 times more likely for a guilty suspect than for an innocent one (100:20, or 5:1; this is called the likelihood ratio). Multiplying the odds by the likelihood ratio gives the new odds of 15:2, which is the same as a probability of $15/17$. Then every new piece of independent information just multiplies the odds by its own likelihood ratio.
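Here is a small sketch of the odds form of the update. The helper names `update_odds` and `odds_to_probability` are made up for illustration.

```python
from fractions import Fraction

def update_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Posterior odds = prior odds times the likelihood ratio of the evidence."""
    return prior_odds * likelihood_ratio

def odds_to_probability(odds: Fraction) -> Fraction:
    """Convert odds a:b (stored as a/b) to the probability a/(a+b)."""
    return odds / (1 + odds)

prior_odds = Fraction(60, 40)         # 3:2 in favour of guilt
likelihood_ratio = Fraction(100, 20)  # left-handedness: 5:1, guilty vs. innocent

posterior_odds = update_odds(prior_odds, likelihood_ratio)
print(posterior_odds)                       # 15/2, i.e. odds of 15:2
print(odds_to_probability(posterior_odds))  # 15/17
```

Further independent evidence would be handled by calling `update_odds` again with that evidence's likelihood ratio.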