Best time to make a guess in an HMM with discounting?

43 Views Asked by At

Consider a hidden Markov model (HMM) with $2$ states and $2$ outputs. The transition probabilities are $p_{ij}$, $i,j=1,2$ and output (emission) probabilities are $q_{ik}$, $i=1,2$, $k=1,2$. Assume an initial (prior) belief on the underlying state of $p$ (that is, $p = P(x_0 = 1)$).

Now, let's run the underlying Markov chain to generate a sequence of state realizations $x_0,x_1,\ldots$ (where $x_0$ is chosen from $\{1,2\}$ according to $p$). We assume that:

  1. We don't see the true states, instead we see the output sequence $y_0,y_1,\ldots$ generated according to the probabilities $q_{ik}$.

  2. We see a signal anytime the underlying chain changes states. That is, if $x_{t-1}\neq x_{t}$ then we see a signal $w_t=1$, otherwise $w_t=0$.

Imagine we get a discounted reward if we guess the true state (and zero if we make an incorrect guess). That is, if we make a correct guess at time $t$, then we get $\delta^t$ where $\delta\in(0,1)$ is given. Thus, our objective is to specify both a time $t$ and a guess $\hat x\in\{1,2\}$ such that $\delta^tP(x_t=\hat x)$ is maximized.

Question: What is the optimal rule determining this choice (as a function of $p$, $\delta$, $p_{ij}$'s, and $q_{ik}$'s)? That is, what's the mapping from the history of outputs ($y_0,y_1,\ldots$) and signals ($w_0,w_1,\ldots$) to a time-guess pair $(t,\hat x)$ that maximizes the discounted probability of being correct? Has anyone seen a similar problem in the literature?