I have the following transition matrices, one for Maria and one for Anna:
import pandas as pd
anna_dt = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
'tomato': [0.1, 0.4, 0.1],
'carrots': [0.5,0.5,0.1],
'cheese': [0.4,0.1,0.8]})
maria_dt = pd.DataFrame({'source': ['tomato', 'carrots', 'cheese'],
'tomato': [0.7, 0.4, 0.8],
'carrots': [0.15,0.3,0.1],
'cheese': [0.15,0.3,0.1]})
anna_dt
source tomato carrots cheese
0 tomato 0.1 0.5 0.4
1 carrots 0.4 0.5 0.1
2 cheese 0.1 0.1 0.8
maria_dt
source tomato carrots cheese
0 tomato 0.7 0.15 0.15
1 carrots 0.4 0.30 0.30
2 cheese 0.8 0.10 0.10
What these matrices tell me, is that e.g., there is an 80% chance that Anna is eating cheese after eating cheese and 40% that Anna is eating tomato after eating carrots
Similarly for Maria we have that she has 70% chance that she is tomato after eating tomato and 80% chance that she is tomato after eating cheese.
My question is:
- Given a sequence of foods:
carrots -> carrots -> tomato -> carrots -> tomato -> cheese -> cheese -> cheese, how can I calculate the probability of this sequence belonging toAnnaand the probability of this sequence belonging toMaria? (this sequence is most like belonging toMariasince we can see a lot ofcarrotsfollowed bycarrotsand a lot ofcheesefollowed bycheese)
EDIT
The probability the first food is tomato, carrots, cheese is [1/3, 1/3, 1/3]
You first need to find sequence probability given the person. Then, through Bayes Theorem, you can find person probability given the sequence.
Since you are talking about a Markov chain, it has to be memoryless. Banking on this, let's say that you are looking for $S=ijkjk$ sequence:
$$ P(i-j-k-j-k)=P(i)*P(j|i)*P(k|j)*P(j|k)*P(k|j). $$
Let's call $P(S)=P(j|i)*P(k|j)*P(j|k)*P(k|j)$.
You need to find $P(i)=\pi(i)$ through stationary probabilities, i.e.,
$$ \pi=M\pi. $$
Here might help you with the stationary probabilities.
Note that I answered for a long period. If your period is short, the stationary probabilities are not yet obtained, which complicates things. Then, you need to find probability distribution at the $q^{th}$ step through $M^q$ and the best way is eigendecomposition. Let's say $M$ is your transition matrix. It can be expressed as $QDQ^{-1}$, where $D$ is the diagonal eigenvalue matrix. Then $M^q$ becomes:
$$ M^q=QD^qQ^{-1} $$
Now, the probability that you get this sequence for the first time starting at $n^{th}$ step becomes
$$ P^n=\left(\prod_{q=1}^{n-1}\left(1-\left[M^{q-1} P_0\right]_i *P(S) \right)\right)*[M^{n}P_0]_i*P(S) $$
The product in the parentheses is to make sure that the sequence you are looking for never happens before $n$, and $\left[M^{q-1} P_0\right]_i$ is the $i^{th}$ row of $M^{q-1} P_0$ column matrix, i.e., the probability of $i$ at the $(q-1)^{th}$ step.
If you are looking for the probability that it will happen within the first $m$ steps, just sum $P^n$.
So far, I calculated $P(S|M)$. This $M$ might belong to Anna or Maria. We just need to use the Bayes Theorem for the rest.
$$P(M_A|S)=\frac{P(S|M_A)P(A)}{P(S)},$$
and $P(S)=P(S|M_A)P(A)+P(S|M_M)P(M)$, where $P(A)$ and $P(M)$ are the probability of Anna or Maria eating something.