I am deeply struggling with understanding how to apply the Viterbi algorithm. From my course notes, I have the following simple(I'm told) example:
If the sequence
HH
was observed, what is the most likely sequence in which Fair and Biased coins were used ?
Following table was generated as part of solution:
and the answer was given as O-Biased-Fair with probability 0.2025.
Can someone explain this answer in much detail and simplest way possible ? Thanks.
The table has values $$ \begin{array}{lc|lcrr} & & & \text{H} & \text{H} & \qquad\text{($X_i$ values)} \\ \hline & 0 & \nu_1(0) & \nu_2(0) & \nu_3(0) & \\ \text{($\pi_i$ values)} & F & \nu_1(F) & \nu_2(F) & \nu_3(F) & \\ & B & \nu_1(B) & \nu_2(B) & \nu_3(B) & \end{array} $$
where $\nu_i(j),\; i\in\{1,2,3\},\; j\in\{0,F,B\}$ are defined in your notes.
We are given:
\begin{align} P(\pi_1=0) &= 1 \\ P(\pi_2=F\mid \pi_1=0) &= 0.5 \\ P(\pi_2=B\mid \pi_1=0) &= 0.5 \\ \text{for $n\geq 3$}\qquad P(\pi_n=F\mid \pi_{n-1}=F) &= 0.9 \\ P(\pi_n=B\mid \pi_{n-1}=F) &= 0.1 \\ P(\pi_n=F\mid \pi_{n-1}=B) &= 0.9 \\ P(\pi_n=B\mid \pi_{n-1}=B) &= 0.1 \\ & \\ P(X_1=\text{null}\mid \pi_1=0) &= 1 \\ \text{for $n\geq 2$}\qquad P(X_n=H\mid \pi_n=F) &= 0.5 \\ P(X_n=T\mid \pi_n=F) &= 0.5 \\ P(X_n=H\mid \pi_n=B) &= 0.9 \\ P(X_n=T\mid \pi_n=B) &= 0.1. \end{align}
We now calculate the $\nu_i(j)$ values recursively.
\begin{align} \nu_1(0) &= P(X_1=\text{null}\mid \pi_1=0)P(\pi_1=0) \\ &= 1\times 1 = 1 \\ \nu_1(F) &= P(X_1=\text{null}\mid \pi_1=F)P(\pi_1=F) = 0 \qquad\text{since $P(\pi_1=F) = 0$} \\ \nu_1(B) &= P(X_1=\text{null}\mid \pi_1=B)P(\pi_1=B) = 0 \qquad\text{since $P(\pi_1=B) = 0$}. \end{align}
These three values form column $1$ of the result table.
\begin{align} & \\ \nu_2(0) &= P(X_2=H\mid \pi_2=0)\max_{l\in\{0,F,B\}} \{\nu_1(l)P(\pi_2=0\mid\pi_1=l)\} \\ &= 0 \\ & \\ \nu_2(F) &= P(X_2=H\mid \pi_2=F)\max_{l\in\{0,F,B\}} \{\nu_1(l)P(\pi_2=F\mid\pi_1=l)\} \\ &= 0.5\times (1\times 0.5) = 0.25 \qquad\text{max occurs when $l=0$, hence arrow from $0.25$ to $1$}\\ & \\ \nu_2(B) &= P(X_2=H\mid \pi_2=B)\max_{l\in\{0,F,B\}} \{\nu_1(l)P(\pi_2=B\mid\pi_1=l)\} \\ &= 0.9\times (1\times 0.5) = 0.45 \qquad\text{max occurs when $l=0$, hence arrow from $0.45$ to $1$}\\ \end{align}
These three values form column $2$ of the result table.
\begin{align} & \\ \nu_3(0) &= P(X_3=H\mid \pi_3=0)\max_{l\in\{0,F,B\}} \{\nu_2(l)P(\pi_3=0\mid\pi_2=l)\} \\ &= 0 \\ & \\ \nu_3(F) &= P(X_3=H\mid \pi_3=F)\max_{l\in\{0,F,B\}} \{\nu_2(l)P(\pi_3=F\mid\pi_2=l)\} \\ &= 0.5\times (0.45\times 0.9) = 0.2025 \qquad\text{max occurs when $l=B$, hence arrow from $0.2025$ to $0.45$}\\ & \\ \nu_3(B) &= P(X_3=H\mid \pi_3=B)\max_{l\in\{0,F,B\}} \{\nu_2(l)P(\pi_3=B\mid\pi_2=l)\} \\ &= 0.9\times (0.45\times 0.1) = 0.0405 \qquad\text{max occurs when $l=B$, hence arrow from $0.0405$ to $0.45$}\\ \end{align}
These three values form column $3$ of the result table.
The maximum of the column $3$ values is $\nu_3(F)=0.2025$. Following the arrows for that value we get the most probable sequence, namely $0-B-F$.