Bayesian learning for input "If A, then B."

Can anyone point me to literature on Bayesian learning when the new information has the form “If A, then B”? I’m familiar with the rule that after one learns X, posterior probability P(Y) equals prior conditional probability P(Y|X). But what about cases where X is itself a conditional statement?

For context:

  1. I know about updating on less-than-certain information (Jeffrey conditioning). That doesn't seem like the key here, however.
  2. I realize one can treat learning “If A then B” as learning “Either B or else not-A.” But is that the only option? Has anyone treated learning "If A, then B" as a direct prompt to revise one’s conditional probability P(B|A)?
  3. The textbook answer may be: "That can't be done, because there's no formalism in which a conditional probability P(B|A) is itself conditioned on a third proposition C (where C could be 'If A, then B')." If that's the answer, fine. But I'd also like to know whether the range of options includes, "Oh, yes, that double-conditional thing sounds like a problem, but really it's not, (etc.)."
  4. Please, if possible, give one or more specific references (texts/journal articles), or in lieu of that, a technical phrase I can use in a Google search.

There are 2 best solutions below

I can give you the answer, which I think is quite simple once you know it, but I'm afraid I don't really have any references for it. I learnt it from E.T. Jaynes' "Probability Theory: The Logic of Science", but that spends a lot of its time on pro-Bayesian propaganda, and so might not be the best textbook.

Anyway, the answer is sort of "just do it", but I'll explain the calculation, since juggling expressions like "$Y\wedge(A\Rightarrow B)$" is very confusing until you get used to it.

(Notation: I'll use "$¬$" for "not", "$\wedge$" for "and", and "$\vee$" for "or".)

The definition of conditional probability is $$P(Y|X)=\frac{P(Y\wedge X)}{P(X)}$$ So when we want to calculate $P(Y|(A\Rightarrow B))$ we just use the above formula with $(A\Rightarrow B)$ replacing $X$ to get $$P(Y|(A\Rightarrow B))=\frac{P(Y\wedge (A\Rightarrow B))}{P(A\Rightarrow B)}$$ The thing on the left hand side is what you want. The probability expressions on the right hand side are just unconditional probabilities, so you should be able to calculate them straight away (e.g. by the usual method of seeing what proportion of possible outcomes satisfy the statement).
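For concreteness, here's a small sketch of that calculation by brute-force enumeration. It assumes a hypothetical uniform prior over the eight truth assignments to $(A, B, Y)$ and reads $A\Rightarrow B$ as the material conditional $¬A\vee B$ (as discussed further below):

```python
from itertools import product

# Enumerate the eight truth assignments to (A, B, Y), with an assumed
# uniform prior weight on each world (purely for illustration).
worlds = list(product([False, True], repeat=3))  # tuples (A, B, Y)
prior = {w: 1 / len(worlds) for w in worlds}

def prob(event):
    """Unconditional probability of an event (a predicate on worlds)."""
    return sum(p for w, p in prior.items() if event(w))

def cond(y, x):
    """P(Y | X) = P(Y and X) / P(X)."""
    return prob(lambda w: y(w) and x(w)) / prob(x)

# "A => B" read as the material conditional: not-A or B.
implies = lambda w: (not w[0]) or w[1]
Y = lambda w: w[2]

print(cond(Y, implies))  # P(Y | A => B) -> 0.5 under this prior
```

Both probabilities on the right-hand side of the formula are computed here exactly as the answer suggests: by seeing what proportion of possible outcomes satisfy the statement.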

(The above should answer your question, but since you mentioned "double conditioning", I thought I'd also explain how that works:

If we've already conditioned on $W$ and we want to further condition on $X$ then we just repeat the definition but with "$|W$" to the right of everything: $$P(Y|X,W)=\frac{P(Y\wedge X|W)}{P(X|W)}$$ Note that we write $P(Y|X,W)$ for the probability of $Y$ conditioned on $W$ and then $X$, rather than $P(Y|X|W)$ like you might expect. You'll see why this is in a second.

We think of conditioning on $W$ as "restricting to the set of possibilities where $W$ is true". So conditioning on $W$ and then $X$ is restricting to the set of possibilities where $W$ is true and furthermore $X$ is true, i.e. to the set of possibilities where $X$ and $W$ are both true. So in fact the three following expressions are all equal: $$P(Y|X,W)=P(Y|W,X)=P(Y|X\wedge W)$$ You can check this by expanding them out using the definition of conditional probability above; they all give $$\frac{P(Y\wedge X\wedge W)}{P(X\wedge W)}.$$

So in fact people don't ever do "double conditioning"; they just treat "conditioning on $X$ and on $W$" as a synonym for "conditioning on $X\wedge W$". This explains the notation $P(Y|X,W)$: it's just the thing we want the probability of, followed by the "$|$" symbol, followed by the list of things we are conditioning on.)
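That equality is easy to check numerically as well. A quick sketch, using an arbitrary (randomly weighted, hypothetical) prior over the eight $(X, W, Y)$ worlds:

```python
import random

random.seed(0)
# Assign arbitrary positive weights to the eight (X, W, Y) worlds and normalise.
worlds = [(x, w, y) for x in (0, 1) for w in (0, 1) for y in (0, 1)]
weights = {wld: random.random() for wld in worlds}
total = sum(weights.values())
prior = {wld: wt / total for wld, wt in weights.items()}

def prob(event):
    return sum(p for wld, p in prior.items() if event(wld))

def cond(y, x):
    return prob(lambda wld: y(wld) and x(wld)) / prob(x)

X = lambda wld: wld[0] == 1
W = lambda wld: wld[1] == 1
Y = lambda wld: wld[2] == 1

# Conditioning on W first, then on X ...
step1 = cond(lambda wld: Y(wld) and X(wld), W) / cond(X, W)
# ... agrees with conditioning on X-and-W in one go.
joint = cond(Y, lambda wld: X(wld) and W(wld))
print(abs(step1 - joint) < 1e-12)  # True
```

Both routes reduce algebraically to $P(Y\wedge X\wedge W)/P(X\wedge W)$, so they agree for any prior.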


@Oscar: I hope I'm doing this right. My reply is apparently too long for a comment.

Thank you for that detailed answer. I've seen Jaynes’s name a lot lately, and I appreciate your tip that here, too, he might have something interesting to say.

I also appreciate the notes on double conditioning, especially the part about the comma notation. However, I think I caused some confusion. What I ended up saying in my earlier context note 3 is that maybe there’s no formalism for $P((B|A)|C)$. If I understand you right, you’re pointing out that this can be treated as $P(B|(A\wedge C))$, which I get.

However, what I meant to say was that maybe there’s no formalism for evaluating $P(C|(B|A))$. Which brings us to your main reply. When you say that $P(Y\wedge (A\Rightarrow B))$ and $P(A\Rightarrow B)$ are just unconditional probabilities, I think you must have in mind that $\Rightarrow $ is the material conditional; that is, $A\Rightarrow B$ means the same thing as $¬A\vee B$. If you don’t mean that, then I’m not understanding you. But if you do mean that, then this is what I was trying to anticipate in my context note 2. I know one can construe conditioning on "If $A$, then $B$" as conditioning on the material conditional, but I was wondering whether it was the only way to go. Your answer/Jaynes's answer would seem to be that yes, this is the only way to go!