Conditional probability with a temporal feature

My probability skills are quite rusty, and I'm stuck on a text-mining problem that requires the conditional probability of a word's co-occurrence.

To be more specific: say I have two speakers having a back-and-forth conversation. I want to determine the conditional probability of Alice using a word w given that Bob used it in the immediately preceding utterance, not just in any earlier utterance. Consider the exchange below, with the word 'one':

Bob: One fish, Two fish, Red fish, Blue fish,

Alice: Black fish, Blue fish, Old fish, New fish.

Bob: This one has a little car.

Alice: This one has a little star.

Using a binary variable (1 = the word is present in the utterance, 0 = it is absent), I have gotten as far as deducing that the probability of Alice using the word 'one' is 1/2 (she has two utterances, one of which contains the word) and the probability of Bob using it is 2/2 (he has two utterances, both of which contain the word). Crucially, though, only one of Bob's two uses is followed by Alice using the word in her immediate reply.
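To make the counting concrete, here is a rough Python sketch of the binary coding and the two per-speaker proportions. The names `conversation`, `contains_word` and `usage_rate` are made up for illustration, and the tokenisation (lower-casing, whole-word matching) is an assumption on my part rather than part of the problem statement.

```python
import re

# The four utterances from the example, in speaking order.
conversation = [
    ("Bob", "One fish, Two fish, Red fish, Blue fish,"),
    ("Alice", "Black fish, Blue fish, Old fish, New fish."),
    ("Bob", "This one has a little car."),
    ("Alice", "This one has a little star."),
]

def contains_word(text, word):
    """Return 1 if `word` occurs as a whole token in `text`, else 0.

    Assumption: matching is case-insensitive, so 'One' counts as 'one'.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return int(word.lower() in tokens)

def usage_rate(conversation, speaker, word):
    """Fraction of `speaker`'s utterances that contain `word`."""
    flags = [contains_word(text, word)
             for who, text in conversation if who == speaker]
    return sum(flags) / len(flags)

print(usage_rate(conversation, "Alice", "one"))  # 0.5
print(usage_rate(conversation, "Bob", "one"))    # 1.0
```

These are only the marginal proportions; they ignore the order of the utterances entirely.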

Intuitively, I assume the way to resolve this is to take the number of times Bob says the word as the denominator and the number of successes (Alice immediately repeating the word) as the numerator, which gives 1/2 here. This makes sense and, I think, corresponds to the definition of a conditional probability (as I am narrowing the sample space), but as I arrived at it without any theory behind it, I am a little unsure whether it is correct.
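For reference, the counting rule described above could be sketched in Python roughly as follows. It is just the relative-frequency estimate of P(Alice uses w in her reply | Bob used w in the utterance right before it). The function name `immediate_echo_rate` and the case-insensitive whole-word matching are assumptions made for this sketch.

```python
import re

# The four utterances from the example, in speaking order.
conversation = [
    ("Bob", "One fish, Two fish, Red fish, Blue fish,"),
    ("Alice", "Black fish, Blue fish, Old fish, New fish."),
    ("Bob", "This one has a little car."),
    ("Alice", "This one has a little star."),
]

def contains_word(text, word):
    """Return 1 if `word` occurs as a whole token in `text` (case-insensitive), else 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return int(word.lower() in tokens)

def immediate_echo_rate(conversation, first, second, word):
    """Estimate P(`second` uses `word` | `first` used it in the utterance just before).

    Denominator: utterances by `first` that contain `word` and are
    directly followed by an utterance from `second`.
    Numerator: how many of those replies also contain `word`.
    Returns None when `first` never used the word before a reply.
    """
    opportunities = 0
    echoes = 0
    # Walk over adjacent (utterance, reply) pairs only.
    for (who_a, text_a), (who_b, text_b) in zip(conversation, conversation[1:]):
        if who_a == first and who_b == second and contains_word(text_a, word):
            opportunities += 1
            echoes += contains_word(text_b, word)
    if opportunities == 0:
        return None
    return echoes / opportunities

print(immediate_echo_rate(conversation, "Bob", "Alice", "one"))  # 0.5
```

Because only adjacent pairs are counted, a final utterance from Bob with no reply from Alice would not enter the denominator, which seems to match the "immediate reply" condition.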