Probability that a certain sequence is belonging to a certain transition matrix

69 Views Asked by At

I have the following transition matrices, one for Maria and one for Anna:

import pandas as pd
anna_dt = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
                              'tomato': [0.1, 0.4, 0.1],
                              'carrots': [0.5,0.5,0.1],
                               'cheese': [0.4,0.1,0.8]})
maria_dt = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
                              'tomato': [0.7, 0.4, 0.8],
                              'carrots': [0.15,0.3,0.1],
                               'cheese': [0.15,0.3,0.1]})

anna_dt
source  tomato  carrots cheese
0   tomato  0.1 0.5 0.4
1   carrots 0.4 0.5 0.1
2   cheese  0.1 0.1 0.8

maria_dt
source  tomato  carrots cheese
0   tomato  0.7 0.15    0.15
1   carrots 0.4 0.30    0.30
2   cheese  0.8 0.10    0.10

What these matrices tell me, is that e.g., there is an 80% chance that Anna is eating cheese after eating cheese and 40% that Anna is eating tomato after eating carrots

Similarly for Maria we have that she has 70% chance that she is tomato after eating tomato and 80% chance that she is tomato after eating cheese.

My question is:

  • Given a sequence of foods: carrots -> carrots -> tomato -> carrots -> tomato -> cheese -> cheese -> cheese, how can I calculate the probability of this sequence belonging to Anna and the probability of this sequence belonging to Maria ? (this sequence is most like belonging to Maria since we can see a lot of carrots followed by carrots and a lot of cheese followed by cheese)

EDIT The probability the first food is tomato, carrots, cheese is [1/3, 1/3, 1/3]

1

There are 1 best solutions below

0
On

You first need to find sequence probability given the person. Then, through Bayes Theorem, you can find person probability given the sequence.

Since you are talking about a Markov chain, it has to be memoryless. Banking on this, let's say that you are looking for $S=ijkjk$ sequence:

$$ P(i-j-k-j-k)=P(i)*P(j|i)*P(k|j)*P(j|k)*P(k|j). $$

Let's call $P(S)=P(j|i)*P(k|j)*P(j|k)*P(k|j)$.

You need to find $P(i)=\pi(i)$ through stationary probabilities, i.e.,

$$ \pi=M\pi. $$

Here might help you with the stationary probabilities.

Note that I answered for a long period. If your period is short, the stationary probabilities are not yet obtained, which complicates things. Then, you need to find probability distribution at the $q^{th}$ step through $M^q$ and the best way is eigendecomposition. Let's say $M$ is your transition matrix. It can be expressed as $QDQ^{-1}$, where $D$ is the diagonal eigenvalue matrix. Then $M^q$ becomes:

$$ M^q=QD^qQ^{-1} $$

Now, the probability that you get this sequence for the first time starting at $n^{th}$ step becomes

$$ P^n=\left(\prod_{q=1}^{n-1}\left(1-\left[M^{q-1} P_0\right]_i *P(S) \right)\right)*[M^{n}P_0]_i*P(S) $$

The product in the parentheses is to make sure that the sequence you are looking for never happens before $n$, and $\left[M^{q-1} P_0\right]_i$ is the $i^{th}$ row of $M^{q-1} P_0$ column matrix, i.e., the probability of $i$ at the $(q-1)^{th}$ step.

If you are looking for the probability that it will happen within the first $m$ steps, just sum $P^n$.

So far, I calculated $P(S|M)$. This $M$ might belong to Anna or Maria. We just need to use the Bayes Theorem for the rest.

$$P(M_A|S)=\frac{P(S|M_A)P(A)}{P(S)},$$

and $P(S)=P(S|M_A)P(A)+P(S|M_M)P(M)$, where $P(A)$ and $P(M)$ are the probability of Anna or Maria eating something.