Question about Bayesian probabilities. Where did I go wrong?


I was solving a probability question for an acquaintance. The original question goes like this:

Of all threats in a year, $12\%$ are Tier 1 and the remaining $88\%$ are Tier 2. If the probability that a reported Tier 1 threat is actually a Tier 2 is $21\%$ and the probability that a reported Tier 2 threat is actually a Tier 1 is $33\%$, then what is the probability that a reported Tier 1 is actually a Tier 1?


Here is my attempt at a solution:

Let probability that an event is Tier 1$=P(T_1)=12\%$

Let probability that an event is Tier 2$=P(T_2)=88\%$

Let probability that a reported Tier 1 was actually Tier 2$=P(M_{1\to2})=21\%$

Let probability that a reported Tier 2 was actually Tier 1$=P(M_{2\to1})=33\%$

Let there be $10000$ events.

...

Then,

$\#T_1=1200$

$\#T_2=8800$

...

Let the number of reported Tier 1 events be $O$.

Let the number of reported Tier 2 events be $T$.

...

Then,

Number of actual Tier 1 Events from number of reported Tier 1 events $={{(100-21)}\over100}\times O=O_1$

Number of actual Tier 2 Events from number of reported Tier 1 events $={{21}\over100}\times O=T_1$

Number of actual Tier 1 Events from number of reported Tier 2 events $={{33}\over100}\times T=O_2$

Number of actual Tier 2 Events from number of reported Tier 2 events $={{(100-33)}\over100}\times T=T_2$

...

Since,

Number of Tier 1 threats $=1200$

$O_1+O_2=1200$

So $\left({{(100-21)}\over100}\times O\right)+\left({{33}\over100}\times T\right)=1200$

Since,

Number of Tier 2 threats $=8800$

$T_1+T_2=8800$

So $\left({{21}\over100}\times O\right)+\left({{(100-33)}\over100}\times T\right)=8800$

From here I get two equations in $O$ and $T$,

$\left({{79}\over100}\times O\right)+\left({{33}\over100}\times T\right)=1200$

and

$\left({{21}\over100}\times O\right)+\left({{67}\over100}\times T\right)=8800$

Wolfram|Alpha reports that $O$ is negative.

How is this possible? Where did I go wrong?
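As a cross-check that does not depend on Wolfram|Alpha, here is a minimal sketch that solves the same two equations exactly using Cramer's rule and rational arithmetic. The variable names are only illustrative; `O` and `T` are the reported Tier 1 and Tier 2 counts defined above.

```python
from fractions import Fraction

# Coefficients and right-hand sides of the two equations in O and T:
#   (79/100) O + (33/100) T = 1200
#   (21/100) O + (67/100) T = 8800
a11, a12, b1 = Fraction(79, 100), Fraction(33, 100), Fraction(1200)
a21, a22, b2 = Fraction(21, 100), Fraction(67, 100), Fraction(8800)

# Cramer's rule for a 2x2 linear system.
det = a11 * a22 - a12 * a21
O = (b1 * a22 - a12 * b2) / det
T = (a11 * b2 - b1 * a21) / det

print("O =", O, "≈", float(O))
print("T =", T, "≈", float(T))
print("O + T =", O + T)
```

The two equations add up to $O+T=10000$, so the only way to satisfy both is with $O<0$ and $T>10000$, which matches what Wolfram|Alpha reports.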


The actual question is near the middle here.

The answer should be explainable in a text-only environment.


There are 3 best solutions below


Using conditional probability expressions in a Bayesian probability question will lead to much less confusion about the topic.

Use $T_1,T_2$ as the mutually exclusive and exhaustive events that a threat is tier 1 or tier 2 respectively, and $R_1,R_2$ as the m.e.e. events that a threat is reported as such.

Of all threats in a year, $12\%$ are Tier 1 and the remaining $88\%$ are Tier 2. If the probability that a reported Tier 1 threat is actually a Tier 2 is $21\%$ and the probability that a reported Tier 2 threat is actually a Tier 1 is $33\%$, then what is the probability that a reported Tier 1 is actually a Tier 1?

So, you are told:

  • $\mathsf P(T_1)=0.12$ the probability that an event is actually tier 1
  • $\mathsf P(T_2)=0.88$ the probability that an event is actually tier 2
  • $\mathsf P(T_2\mid R_1)=0.21$ the probability that an event is tier 2 given that it is reported as tier 1.
  • $\mathsf P(T_1\mid R_2)=0.33$ the probability that an event is tier 1 given that it is reported as tier 2.

You seek $\mathsf P(T_1\mid R_1)$, the probability that a threat is tier 1 given that it is reported as tier 1.

Since $T_1$ and $T_2$ are mutually exclusive and exhaustive, $\mathsf P(T_1\mid R_1)=1-\mathsf P(T_2\mid R_1)=1-0.21=0.79$.


Assumptions:

  • All actual threats are reported threats.
  • All reported threats are actual threats.

In other words, all actual (assessed) threats are first reported. Also, there are no false alarms. If a threat is reported at any tier level, then it is later assessed as an actual threat at some tier level.

Let the sample space be the set of pairs $(i,j)$ of reported threats, where $i \in \{1,2\}$ is the tier level reported, and $j \in \{1,2\}$ is the assessed (actual) tier level.

Let $p(i,j)$ be the probability of the event $\{(i,j)\}$.

Let $a=p(1,1),\;b=p(1,2),\;c=p(2,1),\;d=p(2,2)$.

Let $R_i$ be the event that tier level $i$ is reported.

Let $T_j$ be the event that tier level $j$ is assessed.

The goal is to find $P(T_1|R_1) = {\displaystyle{\frac{a}{a+b}}}$.

From the given information we get the following equations
\begin{align*}
P(T_1) &= a + c = \frac{12}{100}\\[4pt]
P(T_2) &= b + d = \frac{88}{100}\\[4pt]
P(T_2|R_1) &= \frac{b}{a+b} = \frac{21}{100}\\[4pt]
P(T_1|R_2) &= \frac{c}{c+d} = \frac{33}{100}
\end{align*}
After clearing denominators on the last two equations, we have a system of $4$ linear equations in $4$ unknowns, which yields the solutions
$$ a = -\frac{1659}{4600} ,\;\; b = -\frac{441}{4600} ,\;\; c = \frac{2211}{4600} ,\;\; d = \frac{4489}{4600} $$
If we temporarily ignore the obvious issue (negative values of $a,b$), we get
$$P(T_1|R_1) = \frac{a}{a+b} = \frac{79}{100}$$
but of course, we can't have $a,b < 0$, so unless I made a logical or algebraic error, the problem is wrong.
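The solution of the $4\times 4$ system can be double-checked with exact rational arithmetic. The following is a minimal sketch, not a general-purpose solver: the helper `solve` is a hypothetical name for a plain Gauss-Jordan elimination, and the last two rows are the cleared-denominator forms $-21a+79b=0$ and $67c-33d=0$.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Gauss-Jordan elimination over exact fractions (assumes a unique solution)."""
    n = len(matrix)
    # Build the augmented matrix [matrix | rhs] with Fraction entries.
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot entry becomes 1.
        pivot_value = m[col][col]
        m[col] = [x / pivot_value for x in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Unknowns in order: a = p(1,1), b = p(1,2), c = p(2,1), d = p(2,2).
coefficients = [
    [1, 0, 1, 0],      # a + c = 12/100
    [0, 1, 0, 1],      # b + d = 88/100
    [-21, 79, 0, 0],   # b/(a+b) = 21/100  ->  -21a + 79b = 0
    [0, 0, 67, -33],   # c/(c+d) = 33/100  ->   67c - 33d = 0
]
constants = [Fraction(12, 100), Fraction(88, 100), 0, 0]

a, b, c, d = solve(coefficients, constants)
print(a, b, c, d)
print("P(R_1) = a + b =", a + b)
print("a / (a + b) =", a / (a + b))
```

Here $a+b=P(R_1)$ comes out negative, which is another way of seeing that the given numbers cannot describe a valid probability distribution.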


12% of the population is Tier 1 and will be reported as such 79% of the time, so true Tier 1 reports will be (0.12)(0.79) * population. 88% of the population is Tier 2 but will be reported as Tier 1 33% of the time, so false Tier 1 reports will be (0.88)(0.33) * population. The probability of a true Tier 1 report is (true Tier 1 reports)/(all Tier 1 reports): (0.12)(0.79)/((0.12)(0.79)+(0.88)(0.33)) = 0.246106

This is probably clearer using the 10000 threats and reports example from the question:
0.12*10000=1200 actual Tier 1
0.79*1200=948 actual Tier 1s reported as 1s:TRUE Tier 1 reports
0.21*1200=252 actual 1s reported as 2s: FALSE Tier 2 reports
0.88*10000=8800 actual Tier 2
0.67*8800=5896 actual 2s reported as 2s: TRUE Tier 2 reports
0.33*8800=2904 actual 2s reported as 1s: FALSE Tier 1 reports

All Tier 1 reports = TRUE + FALSE Tier 1 reports: 948+2904=3852

Probability that reported Tier 1 is actual Tier 1
(TRUE 1 reports)/(All 1 reports) =948/3852=0.246
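The tally above can be reproduced with a short script. This is only a sketch of the arithmetic in this answer, and it assumes this answer's reading of the data: that 21% and 33% are the rates at which actual Tier 1 and actual Tier 2 threats are misreported. The variable names are illustrative.

```python
from fractions import Fraction

total = 10_000

# Actual tier counts: 12% Tier 1, the rest Tier 2.
actual_t1 = total * 12 // 100          # 1200
actual_t2 = total - actual_t1          # 8800

# Assumed reading: 79% of actual Tier 1s are reported as Tier 1,
# and 33% of actual Tier 2s are reported as Tier 1.
true_t1_reports = actual_t1 * 79 // 100    # 948
false_t1_reports = actual_t2 * 33 // 100   # 2904

all_t1_reports = true_t1_reports + false_t1_reports   # 3852

# Fraction of Tier 1 reports that are actual Tier 1 threats.
posterior = Fraction(true_t1_reports, all_t1_reports)
print(posterior, "≈", round(float(posterior), 6))
```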