I wake up in a random class and hear 6 biology-related words. How certain should I be that I'm in Biology class?

Suppose I'm sleeping in some class. I wake up and hear 6 topic-specific words that seem related to biology. I'm asked to guess whether I'm in Biology class. How confident should I be? I think this can be represented by the following Bayesian network, with one parent node (the class) and 6 child nodes (the words). [image: Bayesian network diagram]

Suppose that $$P(word_1|biology)=0.6$$$$P(word_2|biology)=0.6$$$$P(word_3|biology)=0.7$$$$P(word_4|biology)=0.7$$$$P(word_5|biology)=0.8$$$$P(word_6|biology)=0.8$$

Suppose that I think there's some chance I could hear these words in some other class, such as chemistry. Hence, let $P(word_i|\neg biology)$ be $P(word_i|biology)-0.1$:

$$P(word_1|\neg biology)=0.5$$$$P(word_2|\neg biology)=0.5$$$$P(word_3|\neg biology)=0.6$$$$P(word_4|\neg biology)=0.6$$$$P(word_5|\neg biology)=0.7$$$$P(word_6|\neg biology)=0.7$$

My prior credence of being in biology class is $0.1$. How do I update to form a posterior after hearing these 6 words?


Upon hearing word 1, I update using Bayes' rule as follows:

$$P(class=bio|word_1)=\frac{p(word_1|bio)*p(bio)}{p(word_1|bio)*p(bio)+p(word_1|\neg bio)*p(\neg bio)}=\frac{0.6*0.1}{(0.1*0.6)+(0.5*0.9)} \approx 0.1176$$

Do I keep updating like this sequentially for each word, plugging in the previous posterior as the next prior? Such as,

$$P(class=bio|word_1,word_2)=\frac{p(word_2|bio)*p(bio|word_1)}{p(word_2|bio)*p(bio|word_1)+p(word_2|\neg bio)*p(\neg bio|word_1)}=\frac{0.6*0.1176}{(0.1176*0.6)+(0.5*0.8824)} \approx 0.1379$$

And so on... Is that correct?


Best answer

Yes, your reasoning is correct: the posterior probability from each update becomes the prior probability for the next. (This is one of the nice things about the Bayesian approach.) Writing $P$ for your current credence, $p_w=P(w|bio)$, and $q_w=P(w|\neg bio)$, each update can be written as $$ P' = \frac{p_w P}{p_w P + q_w (1-P)}=\frac{p_w P}{q_w + (p_w - q_w)P}=\frac{P}{\alpha_w +(1-\alpha_w)P}, $$ where $\alpha_w=q_w/p_w=P(w|\neg bio)/P(w|bio)$ is $5/6$, $6/7$, or $7/8$ for your words. It's easy to check that the result after all six words comes out to $P\approx 0.221453$, and that this is independent of the order in which you do the updates.
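To make the sequential procedure concrete, here is a minimal Python sketch of it. The function names `update` and `sequential_posterior` are made up for illustration; the numbers are the ones given in the question. It runs the six updates in the original order and in reverse order to check the order-independence claim.

```python
def update(prior, p_w, q_w):
    """One Bayes update, with p_w = P(word | bio) and q_w = P(word | not bio)."""
    return p_w * prior / (p_w * prior + q_w * (1.0 - prior))


def sequential_posterior(prior, likelihood_pairs):
    """Feed each posterior back in as the prior for the next word."""
    credence = prior
    for p_w, q_w in likelihood_pairs:
        credence = update(credence, p_w, q_w)
    return credence


# Likelihoods from the question; P(word | not bio) is 0.1 lower for each word.
p_bio = [0.6, 0.6, 0.7, 0.7, 0.8, 0.8]
p_not_bio = [0.5, 0.5, 0.6, 0.6, 0.7, 0.7]
pairs = list(zip(p_bio, p_not_bio))

forward = sequential_posterior(0.1, pairs)             # words 1..6
backward = sequential_posterior(0.1, reversed(pairs))  # words 6..1

print(round(forward, 6), round(backward, 6))  # both round to 0.221453
```

Because `update` only needs the current credence and one word's likelihoods, it can be called as each word is heard, without storing the earlier words.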

In light of the other answer, it's worth noting that this is the same as the result from a single update with $\alpha=\prod_w \alpha_w=25/64$... that is, it's the same as treating the words as independent. This is exactly what the diagram says: the six words are independent, given the class. The advantage of the first approach, though, is that you can update your credence in an online fashion as you hear the words... allowing you to, say, take out your textbook as soon as you're sufficiently confident you're in the right class.
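One way to see why the sequential and single-shot updates must agree is to rewrite the update in odds form. The odds $O=P/(1-P)$ are notation introduced here for illustration; $p_w=P(w|bio)$, $q_w=P(w|\neg bio)$, and $\alpha_w=q_w/p_w$ as above. A single update gives

$$ O' = \frac{P'}{1-P'} = \frac{p_w P}{q_w (1-P)} = \frac{O}{\alpha_w}, $$

so each word just divides the odds by its own $\alpha_w$. After all six words,

$$ O_6 = \frac{O_0}{\prod_w \alpha_w} = \frac{1/9}{25/64} = \frac{64}{225}, \qquad P_6 = \frac{O_6}{1+O_6} = \frac{64}{289} \approx 0.221453. $$

Since multiplication commutes, the order of the words cannot matter, and applying the six factors one at a time is the same as applying their product once.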

Second answer

First, if the events are dependent (e.g. word1 and word2 always appear together, in any class), then there is not much we can say without specifying the exact dependence.

So let's assume they are independent given the class. I would model this as $W =$ the event of hearing all 6 words. So $P(W|bio) = 0.6^2 \cdot 0.7^2 \cdot 0.8^2 \approx 0.113$ and $P(W|\neg bio) = 0.5^2 \cdot 0.6^2 \cdot 0.7^2 \approx 0.044$, and

$$ \begin{aligned} P(bio|W)&=\frac{P(W|bio)*P(bio)}{P(W|bio)*P(bio)+P(W|\neg bio)*P(\neg bio)} \\ &\approx \frac{0.113*0.1}{(0.113*0.1)+(0.044*0.9)} \approx 0.22 \end{aligned} $$
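As a quick numerical check of this single-update calculation, here is a short Python sketch that uses the unrounded likelihoods. The variable names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
from math import prod

# Per-word likelihoods from the question.
p_bio = [0.6, 0.6, 0.7, 0.7, 0.8, 0.8]      # P(word_i | bio)
p_not_bio = [0.5, 0.5, 0.6, 0.6, 0.7, 0.7]  # P(word_i | not bio)
prior = 0.1                                  # P(bio)

# Joint likelihood of hearing all six words, assuming the words are
# independent given the class.
like_bio = prod(p_bio)          # P(W | bio), about 0.113
like_not_bio = prod(p_not_bio)  # P(W | not bio), about 0.044

# One application of Bayes' rule to the combined event W.
posterior = like_bio * prior / (like_bio * prior + like_not_bio * (1 - prior))

print(round(posterior, 4))  # 0.2215
```

Keeping the likelihoods unrounded gives about 0.2215, consistent with the 0.22 obtained from the rounded values.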

However, is this the same as your step-by-step process? I don't know the answer to that offhand. Perhaps someone more familiar with Bayesian models can answer that? (And if the two answers are not equivalent, then which one is "correct"?)