I would like to compute the point-wise mutual information between two words that occur in the context of a certain phrase. For example, if the words are 'good' and 'great', and the phrase is 'but not', then the whole phrase would be
good but not great
So we have two random variables $X$ and $Y$ whose domains range over the entire vocabulary, and another random variable $R$ whose domain ranges over all phrases $n$ words long, for some appropriate $n$. Now I want to compute the PMI of $X$ and $Y$ conditioned on the fact that there is a phrase $R$ (such as 'but not') occurring between them. That is, I want:
$PMI(X,Y;R) = P(X,Y|R)\ln\frac{P(X,Y|R)}{P(X|R)P(Y|R)}$
However, I do not know how to express $P(X,Y|R)$. Currently I am trying to express it in terms of a first-order Markov model, but that would require me to pick a value for $X$ first, then $R$ conditioned on $X$, then $Y$ conditioned on $R$. However, I am already conditioning on the fact that $R$ has been picked, so I am lost.
Any idea on how to express $P(X,Y|R)$?
If we model the process as first-order Markov, then the point-wise mutual information conditioned on $R$ will be 0, since
$$P(X,Y|R)=P(X|R)P(Y|X,R)=P(X|R)P(Y|R),$$
using the Markov property in the last equality.
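Since the Markov factorization trivializes the quantity, you would instead estimate the joint $P(X,Y|R)$ directly from co-occurrence counts. A minimal sketch of that (the triple extraction and all names here are illustrative assumptions, not a standard API):

```python
from collections import Counter
from math import log

def conditional_pmi(windows, x, y, r):
    """Weighted conditional PMI as in the question:
    P(X,Y|R) * ln( P(X,Y|R) / (P(X|R) P(Y|R)) ).

    `windows` is a list of (x, phrase, y) triples extracted from a
    corpus, e.g. ("good", "but not", "great"); this representation
    is an assumption for the sketch.
    """
    # Condition on R = r by restricting to windows with that phrase.
    cond = [(w1, w2) for (w1, phrase, w2) in windows if phrase == r]
    n = len(cond)
    joint = Counter(cond)                    # counts of (X, Y) pairs given R
    left = Counter(w1 for (w1, _) in cond)   # marginal counts of X given R
    right = Counter(w2 for (_, w2) in cond)  # marginal counts of Y given R
    p_xy = joint[(x, y)] / n
    p_x = left[x] / n
    p_y = right[y] / n
    return p_xy * log(p_xy / (p_x * p_y))

# Toy corpus of extracted windows.
windows = [
    ("good", "but not", "great"),
    ("good", "but not", "great"),
    ("cheap", "but not", "nasty"),
    ("good", "if not", "great"),
]
print(conditional_pmi(windows, "good", "great", "but not"))
```

Note that the joint counts here are taken directly from observed $(X, Y)$ pairs, not reconstructed from a chain through $R$, which is exactly what the Markov factorization would forbid.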