Conditional probability based on criteria

Let's say that we have a text, for example:

"Check out this deal, everything 20% OFF"

I want to calculate the probability of it being a deal or an offer.

So I have several criteria. For example, I know that:

  • If the text contains "% OFF", it has a 60% chance of being a deal (criterion X)
  • If the text contains "deal", it has a 25% chance of being a deal (criterion Y)
  • If the text contains "$", it has a 10% chance of being a deal (criterion Z)

We can consider X, Y and Z as independent.

So, given a text that meets criteria X and Y, I want to know the probability of it being a deal (and likewise for Y and Z, and so on).

I know that $P(A \mid B) = P(A \cap B) / P(B)$.

A would be "being a deal". B is "matches criteria X and Y".

So, my question is:

Can anyone help me?

There is 1 best solution below

You're missing some pieces of information, and I think it's because of the way you're attacking the problem.

You say, "I know that if the text contains "% OFF" it has a 60% chance of being a deal." What makes you say that? How do you know?

To figure this out, I'd take (say) 1,000 lines of text, analyze them, and then classify them as deal or no deal.

For each of those 1,000 lines of text, I can say whether it satisfies $X$, $Y$, or $Z$. It could satisfy one, two, all three, or none.
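As a rough illustration of that tagging step, here is a minimal Python sketch. The function name `matched_criteria` is hypothetical, and matching case-insensitively is my own assumption; the question doesn't say whether "% off" should count the same as "% OFF".

```python
def matched_criteria(text):
    """Return the subset of {"X", "Y", "Z"} that the text satisfies."""
    lowered = text.lower()  # assumption: matching is case-insensitive
    matched = set()
    if "% off" in lowered:  # criterion X: contains "% OFF"
        matched.add("X")
    if "deal" in lowered:   # criterion Y: contains "deal"
        matched.add("Y")
    if "$" in text:         # criterion Z: contains a dollar sign
        matched.add("Z")
    return frozenset(matched)


print(sorted(matched_criteria("Check out this deal, everything 20% OFF")))
# prints ['X', 'Y']
```

The example text from the question satisfies $X$ and $Y$ but not $Z$, since it has no dollar sign.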

Then, once I've done that, I could say something like this for criterion $X$:

"Of 1,000 lines of text I looked at, 550 had the phrase "% OFF" and 330 of those were actually deals. So, if the phrase "% OFF" was in there, there was an observed 60% chance (330/550) of it being a deal."

I could just as easily do it for the compound criterion $Y \cap Z$:

"Of 1,000 lines of text I looked at, 80 had both the word "deal" and a dollar sign. Sixty of those were actually deals. So, if both the word "deal" and a dollar sign were in there, there was an observed 75% chance (60/80) of it being a deal."

That's the kind of information you need to express compound probabilities. Otherwise, you're just guessing, really.
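To make the counting concrete, here is a small Python sketch that estimates the observed chance of a deal given any combination of criteria. Everything in it is illustrative: `conditional_deal_rate` is a hypothetical helper, and the toy sample is made up so that the counts match the figures quoted above (550 lines with "% OFF", 80 lines with both "deal" and a dollar sign). Real data would have overlapping criteria, which this toy sample leaves out.

```python
def conditional_deal_rate(samples, required):
    """Estimate P(deal | every criterion in `required` holds).

    samples:  iterable of (criteria, is_deal) pairs, where `criteria` is the
              set of criterion names a line satisfies and `is_deal` is a bool.
    required: set of criterion names that must all be present.
    Returns None if no sample satisfies the required criteria.
    """
    required = set(required)
    outcomes = [is_deal for criteria, is_deal in samples
                if required <= set(criteria)]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Made-up sample of 1,000 hand-labelled lines (assumption, for illustration only).
samples = (
    [({"X"}, True)] * 330         # "% OFF" present, actually a deal
    + [({"X"}, False)] * 220      # "% OFF" present, not a deal
    + [({"Y", "Z"}, True)] * 60   # "deal" and "$" present, actually a deal
    + [({"Y", "Z"}, False)] * 20  # "deal" and "$" present, not a deal
    + [(set(), False)] * 370      # no criterion matched
)

print(conditional_deal_rate(samples, {"X"}))       # prints 0.6
print(conditional_deal_rate(samples, {"Y", "Z"}))  # prints 0.75
```

The same function answers the question's case of $X \cap Y$, provided the labelled sample actually contains lines that satisfy both; with no such lines it returns `None` rather than a guess.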