Can anyone point me to literature on Bayesian learning when the new information has the form “If A, then B”? I’m familiar with the rule that after one learns X, posterior probability P(Y) equals prior conditional probability P(Y|X). But what about cases where X is itself a conditional statement?
For context:
- I know about updating on less-than-certain information (Jeffrey conditioning). That doesn't seem like the key here, however.
- I realize one can treat learning “If A then B” as learning “Either B or else not-A.” But is that the only option? Has anyone treated learning "If A, then B" as a direct prompt to revise one’s conditional probability P(B|A)?
- The textbook answer may be “That can’t be done, because there’s no formalism for a conditional probability P(B|A) itself being conditioned on a third proposition C (where this could be “If A then B”)." If that’s the answer, fine. But I’d also like to know if the range of options includes, “Oh, yes, that double-conditional thing sounds like a problem, but really it’s not, (etc.).”
- Please, if possible, give one or more specific references (texts/journal articles), or in lieu of that, a technical phrase I can use in a Google search.
I can give you the answer, which I think is quite simple once you know it, but I'm afraid I don't really have any references for it. I learnt it from E.T. Jaynes' "Probability Theory: The Logic of Science", but that book spends a lot of its time on pro-Bayesian propaganda, and so might not be the best textbook.
Anyway, the answer is essentially "just do it", but I'll walk through the calculation, since juggling expressions like "$X\wedge(A\Rightarrow B)$" is very confusing until you get used to it.
(Notation: I'll use "$¬$" for "not", "$\wedge$" for "and", and "$\vee$" for "or".)
The definition of conditional probability is $$P(Y|X)=\frac{P(Y\wedge X)}{P(X)}$$ So when we want to calculate $P(Y|(A\Rightarrow B))$ we just use the above formula with $(A\Rightarrow B)$ replacing $X$ to get $$P(Y|(A\Rightarrow B))=\frac{P(Y\wedge (A\Rightarrow B))}{P(A\Rightarrow B)}$$ The thing on the left hand side is what you want. The probability expressions on the right hand side are just unconditional probabilities, so you should be able to calculate them straight away (e.g. by the usual method of seeing what proportion of possible outcomes satisfy the statement).
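To make the "count the proportion of possible outcomes" method concrete, here is a small sketch with a toy outcome space (one roll of a fair die; the events A, B, Y are my own illustrative choices, not anything from the question). It treats $A\Rightarrow B$ as the material conditional $\neg A\vee B$ and applies the definition above directly:

```python
from fractions import Fraction

# Toy outcome space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))

A = {2, 4, 6}   # "the roll is even"
B = {4, 5, 6}   # "the roll is at least 4"
Y = {2, 3, 5}   # "the roll is prime"

def prob(event):
    """P(event) = proportion of the equally likely outcomes it contains."""
    return Fraction(len(event & outcomes), len(outcomes))

# The material conditional A => B is the event (not A) or B.
A_implies_B = (outcomes - A) | B

# P(Y | A => B) = P(Y and (A => B)) / P(A => B), straight from the definition.
posterior = prob(Y & A_implies_B) / prob(A_implies_B)
print(posterior)  # 2/5, versus the prior P(Y) = 1/2
```

So here, learning "if the roll is even, it's at least 4" lowers the probability that the roll is prime from 1/2 to 2/5, because the update rules out the outcome 2.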
(The above should answer your question, but since you mentioned "double conditioning", I thought I'd also explain how that works:
If we've already conditioned on $W$ and we want to further condition on $X$ then we just repeat the definition but with "$|W$" to the right of everything: $$P(Y|X,W)=\frac{P(Y\wedge X|W)}{P(X|W)}$$ Note that we write $P(Y|X,W)$ for the probability of $Y$ conditioned on $W$ and then $X$, rather than $P(Y|X|W)$ like you might expect. You'll see why this is in a second.
We think of conditioning on $W$ as "restricting to the set of possibilities where $W$ is true". So conditioning on $W$ and then $X$ is restricting to the set of possibilities where $W$ is true and furthermore $X$ is true. I.e. we are restricting to the set of possibilities where $X$ and $W$ are both true. So in fact the following three expressions are all equal: $$P(Y|X,W)=P(Y|W,X)=P(Y|X\wedge W)$$ You can check this by expanding them out using the definitions of conditional probability above. They all give $$\frac{P(Y\wedge X\wedge W)}{P(X\wedge W)}.$$ So in fact people don't ever do "double conditioning"; they just treat "conditioning on $X$ and on $W$" as a synonym for "conditioning on $X\wedge W$". This explains why we use the notation $P(Y|X,W)$: it's just the thing we want the probability of, followed by the "$|$" symbol, followed by the list of things we are conditioning on.)
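The equality of the iterated and one-step forms can be checked numerically. A minimal sketch, again on a fair die with events of my own choosing:

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die

def prob(event):
    """P(event) = proportion of the equally likely outcomes it contains."""
    return Fraction(len(event & outcomes), len(outcomes))

def cond(y, x):
    """P(y | x) by the definition of conditional probability."""
    return prob(y & x) / prob(x)

W = {2, 4, 6}     # "the roll is even"
X = {3, 4, 5, 6}  # "the roll is at least 3"
Y = {4}           # "the roll is exactly 4"

# "Double conditioning": condition on W first, then on X, by repeating
# the definition with "| W" carried along on the right of everything.
iterated = cond(Y & X, W) / cond(X, W)

# Conditioning on the conjunction X-and-W in a single step.
direct = cond(Y, X & W)

print(iterated, direct)  # both 1/2
```

Both routes give the same answer, as the expansion into $P(Y\wedge X\wedge W)/P(X\wedge W)$ predicts.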