Confusion of the inverse problem (probability)


The following is copied from the Wikipedia page on confusion of the inverse, which lists these as examples of common fallacies. Help me understand why the second one is wrong.

° Hard drug users tend to use marijuana; therefore, marijuana users tend to use hard drugs (the first probability is marijuana use given hard drug use, the second is hard drug use given marijuana use).[5]

° Most accidents occur within 25 miles from home; therefore, you are safest when you are far from home.[5]

° Terrorists tend to have an engineering background; so, engineers have a tendency towards terrorism.[6]

Edit 1: Thank you for the answer, but I still fail to visualize it mathematically. Can you help me visualize it the way I have for the marijuana/hard drug example? (I'm fairly new to conditional probability and Bayes' theorem, so it might be wrong.)

couldn't upload the pic for some reason so here's the link


There are 2 best solutions below


The second one is wrong because most activity happens within 25 miles from home. It's not being close to home that causes the spike in accident rates; it's the fact that you're close to home most of the time.

Here's a related fallacy: all human deaths occur on the Earth's surface. Therefore, the safest place for a human is outer space.


In response to your edit: The parallelism is the same as the marijuana one you cite. Specifically, $\mathbb P(\text{close to home} \mid \text{accident}) > \mathbb P(\text{far from home} \mid \text{accident})$, but it would be wrong to say $\mathbb P(\text{accident} \mid \text{close to home}) > \mathbb P(\text{no accident} \mid \text{close to home})$.

Second attempt at clarification: for the sake of discussion, let's imagine that the events "having an accident" ($A$) and "being close to home" ($H$) are independent of one another, with $\mathbb P(A) = 0.1$ and $\mathbb P(H) = 0.9$. The claim "most accidents occur close to home" is justified, because $$\mathbb P(A \cap H) = 0.09 > 0.01 = \mathbb P(A \cap H^c).$$ Really, this is just a reflection of the fact that $\mathbb P(H) > \mathbb P(H^c)$, since the events are independent. However, it would not be fair to say that you are safest when far from home, because $\mathbb P(A \mid H) = \mathbb P(A \mid H^c) = 0.1$ by independence.
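If it helps to see the arithmetic, here is a minimal Python sketch of the independent case, using the same numbers as above. The variable names are my own, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

# Numbers from the example above; A and H are assumed independent.
p_A = Fraction(1, 10)   # P(A): having an accident
p_H = Fraction(9, 10)   # P(H): being close to home

# Joint probabilities under independence: P(A and H) = P(A) * P(H).
p_A_and_H = p_A * p_H
p_A_and_Hc = p_A * (1 - p_H)

# Share of accidents that happen close to home: P(H | A).
p_H_given_A = p_A_and_H / p_A

# Accident risk in each location: P(A | H) and P(A | H^c).
p_A_given_H = p_A_and_H / p_H
p_A_given_Hc = p_A_and_Hc / (1 - p_H)

print("P(A and H)  =", p_A_and_H)
print("P(A and H^c) =", p_A_and_Hc)
print("P(H | A)    =", p_H_given_A)
print("P(A | H)    =", p_A_given_H)
print("P(A | H^c)  =", p_A_given_Hc)
```

Nine out of ten accidents happen close to home, yet the risk is one in ten in both places: the first number only reflects where you spend your time.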

0
On

The Wikipedia article doesn’t do a very good job of explaining the fallacy. It doesn’t mention the grain of truth in the fallacy, and it’s hard to understand why a fallacy is a fallacy if you don’t understand what makes it attractive.

Statistical (in)dependence is, in a rigorous sense, a symmetric relationship, both qualitatively and quantitatively. Qualitatively, event $A$ is (in)dependent of event $B$ exactly if event $B$ is (in)dependent of event $A$. Quantitatively, learning that $A$ has happened makes it more or less likely that $B$ has happened by the same factor as vice versa; in probabilities:

$$ \frac{P(A\mid B)}{P(A)}=\frac{P(B\mid A)}{P(B)}\;. $$

That is, learning that someone was away from home changes the probability that they had an accident by the same factor as learning that they had an accident changes the probability that they were away from home. Likewise, learning that someone is a terrorist makes it more likely that they’re an engineer by the same factor as learning that they’re an engineer makes it more likely that they’re a terrorist.
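In case the symmetry isn’t obvious, it follows in one line from the definition of conditional probability, assuming $P(A)\gt0$ and $P(B)\gt0$ so that everything is defined:

$$ \frac{P(A\mid B)}{P(A)}=\frac{P(A\cap B)}{P(A)\,P(B)}=\frac{P(B\mid A)}{P(B)}\;. $$

The middle expression is symmetric in $A$ and $B$, which is why both ratios have to agree.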

It’s a small step from this true fact to the fallacy of thinking that “most accidents occur within $25$ miles from home; therefore, you are safest when you are far from home”. In fact, the step is so small that the Wikipedia article doesn’t really differentiate clearly in its formulations between the fallacy and the true statement: “Hard drug users tend to use marijuana; therefore, marijuana users tend to use hard drugs” could reasonably be interpreted as a true statement in the sense explained above.

In the fallacy, this true statement is taken to mean other things that are not in fact implied. The article focuses on the misunderstanding of thinking that, instead of the ratios $\frac{P(A\mid B)}{P(A)}$ and $\frac{P(B\mid A)}{P(B)}$ being equal, the probabilities $P(A\mid B)$ and $P(B\mid A)$ themselves are equal or similar. For instance, in the example of the medical test, it’s true that if being infected makes it ten times more likely that you test positive, then testing positive also makes it ten times more likely to be infected – but that doesn’t mean that it makes it likely to be infected. If the overall rate of infection is very low, you could have a test with very low false positive rate that makes it $100$ times more likely to be infected when you test positive, and still it could be extremely unlikely that you’re infected if you test positive.
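To put numbers on the medical test, here is a small Python sketch. The prevalence, sensitivity and false positive rate are hypothetical values I picked for illustration; they don’t come from the article.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
prevalence = Fraction(1, 10_000)        # P(infected)
sensitivity = Fraction(99, 100)         # P(positive | infected)
false_positive_rate = Fraction(1, 100)  # P(positive | not infected)

# Law of total probability: overall chance of testing positive.
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' theorem: chance of being infected given a positive test.
p_infected_given_positive = sensitivity * prevalence / p_positive

# The two "how much more likely" factors, which must coincide.
lift_of_test = sensitivity / p_positive
lift_of_infection = p_infected_given_positive / prevalence

print("P(infected | positive) =", float(p_infected_given_positive))
print("factor for the test     =", float(lift_of_test))
print("factor for infection    =", float(lift_of_infection))
```

With these made-up inputs, a positive test makes infection roughly $98$ times more likely than the base rate, and the probability of actually being infected is still only $\frac1{102}$, a little under one percent.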

Let’s go through the examples in light of all of the above (the true statement $\frac{P(A\mid B)}{P(A)}=\frac{P(B\mid A)}{P(B)}$ and the misunderstanding $P(A\mid B)\approx P(B\mid A)$). In my view, only one of them (the second one, which you asked about) is a clear example of the above misunderstanding; in the other two, especially the third one, the problem seems to be at least partly an imprecise use of language.

“Hard drug users tend to use marijuana; therefore, marijuana users tend to use hard drugs”

It depends on what you mean by “tend to”. If you mean “are more likely than other people to”, then the sentence is true (assuming the premise in the first half of the sentence is true; I haven’t checked that). In that case it just says

$$P(\text{marijuana}\mid\text{hard drugs})\gt P(\text{marijuana})\implies P(\text{hard drugs}\mid\text{marijuana})\gt P(\text{hard drugs})\;,$$

which is correct. But the sentence is false if you mean “marijuana users are likely to use hard drugs” and thus interpret it as something like

$$P(\text{marijuana}\mid\text{hard drugs})\gt\frac12\implies P(\text{hard drugs}\mid\text{marijuana})\gt\frac12\;.$$
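Here is a toy population in Python that separates the two readings. The counts are invented purely for illustration and say nothing about real drug use.

```python
from fractions import Fraction

# Invented counts for a toy population (not real data).
population = 1000
marijuana_users = 200
hard_drug_users = 20
users_of_both = 18

p_marijuana = Fraction(marijuana_users, population)
p_hard = Fraction(hard_drug_users, population)
p_marijuana_given_hard = Fraction(users_of_both, hard_drug_users)
p_hard_given_marijuana = Fraction(users_of_both, marijuana_users)

# Reading 1: "more likely than other people" (conditional vs. unconditional).
reading1_premise = p_marijuana_given_hard > p_marijuana
reading1_conclusion = p_hard_given_marijuana > p_hard

# Reading 2: "likely" in absolute terms (conditional vs. 1/2).
reading2_premise = p_marijuana_given_hard > Fraction(1, 2)
reading2_conclusion = p_hard_given_marijuana > Fraction(1, 2)

print("reading 1:", reading1_premise, "->", reading1_conclusion)
print("reading 2:", reading2_premise, "->", reading2_conclusion)
```

In this toy population $90\%$ of hard drug users use marijuana, but only $9\%$ of marijuana users use hard drugs. Under the first reading both premise and conclusion hold; under the second the premise holds and the conclusion fails.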

“Most accidents occur within 25 miles from home; therefore, you are safest when you are far from home.”

Here the language of the premise is quite clear. “Most accidents occur within $25$ miles from home” is a statement about the proportion of accidents that occur close to home; it’s not a comparison of a conditional probability with an unconditional probability. It says

$$P(\text{close to home}\mid\text{accident})\gt\frac12\;,$$

it doesn’t say

$$P(\text{close to home}\mid\text{accident})\gt P(\text{close to home})\;,\tag1\label1$$

i.e. it doesn’t say that you’re more likely to be close to home than usual when you have an accident; you may just always be likely to be close to home. But the conclusion that it tries to draw is a comparison of a conditional probability with an unconditional probability (or with the complementary conditional probabilities, which amounts to the same thing); the conclusion is

$$P(\text{accident}\mid\text{far from home})\lt P(\text{accident}\mid\text{close to home})\;,$$

which is equivalent to

$$P(\text{accident}\mid\text{close to home})\gt P(\text{accident})\;,$$

which is not a valid conclusion to draw from the premise. But it would be a valid conclusion if the premise were $\eqref1$.
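A concrete counterexample may make this easier to see. The following Python sketch uses hypothetical probabilities of my own choosing in which the premise holds, $\eqref1$ fails, and being far from home is actually twice as risky.

```python
from fractions import Fraction

# Hypothetical numbers: far from home is assumed to be MORE dangerous.
p_close = Fraction(9, 10)            # P(close to home)
p_acc_given_close = Fraction(1, 10)  # P(accident | close to home)
p_acc_given_far = Fraction(1, 5)     # P(accident | far from home)

# Law of total probability: overall accident probability.
p_acc = p_acc_given_close * p_close + p_acc_given_far * (1 - p_close)

# Bayes' theorem: share of accidents that happen close to home.
p_close_given_acc = p_acc_given_close * p_close / p_acc

# The premise as stated: most accidents occur close to home.
premise_holds = p_close_given_acc > Fraction(1, 2)

# The stronger premise (1): accidents are over-represented close to home.
stronger_premise_holds = p_close_given_acc > p_close

# The conclusion: you are safest when far from home.
conclusion_holds = p_acc_given_far < p_acc_given_close

print("P(close | accident) =", p_close_given_acc)
print("premise:", premise_holds)
print("premise (1):", stronger_premise_holds)
print("conclusion:", conclusion_holds)
```

Here $\frac9{11}$ of all accidents happen close to home, so the premise is satisfied, but that is still less than the $\frac9{10}$ of the time spent close to home, and the conclusion is false.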

“Terrorists tend to have an engineering background; so, engineers have a tendency towards terrorism.”

This is more or less like the drug use example (except the conclusion says “have a tendency towards” instead of “tend to”). Here, again, if we take this to mean

$$P(\text{engineer}\mid\text{terrorist})\gt P(\text{engineer})\implies P(\text{terrorist}\mid\text{engineer})\gt P(\text{terrorist})\;,$$

it’s correct, but if we take it to mean

$$P(\text{engineer}\mid\text{terrorist})\gt\frac12\implies P(\text{terrorist}\mid\text{engineer})\gt\frac12\;,$$

(or perhaps $\not\ll1$ instead of $\gt\frac12$), then it’s clearly false: the premise is probably already false, the conclusion is certainly false, and in any case the inference is invalid.

In this case, the imprecision in the language, which allows the statement to be read as something in between a correct and a false claim, seems key to me. I doubt that anyone would give any credence to the statement if it were formulated in precise language that exhibits the confusion of the inverse:

“A large proportion of terrorists are engineers; therefore a large proportion of engineers are terrorists.”

If you put it in these plain terms, it doesn’t require any training in statistics to recognize it as a fallacy. By contrast, in the example with accidents close to home, the fallacy is spelled out in rather clear terms, and yet it isn’t that easy to spot if you’re not familiar with how probabilities work.