HMM as special case of MRF


I have learned that any Hidden Markov Model (HMM) can be described as a special case of a Markov Random Field (MRF) model.

However, AFAIK, the dependencies in an HMM are directed, while the dependencies in an MRF are undirected.

So, how can I represent an HMM as an MRF? In particular, how can I express the potential functions of the MRF in terms of the transition and emission probabilities of the HMM?

Suppose we have an HMM, where $x_t$ are the observations and $z_t$ are the latent states:

$p(x_1,\ldots,x_T,z_1,\ldots,z_T) = p(z_1)\prod_{t=1}^{T-1}p(z_{t+1}\mid z_t)\prod_{t=1}^{T}p(x_t\mid z_t)$

My conjecture was that the equivalent MRF is a linear graph, with variables $z_1 \ldots z_T$, and the following potential functions:

$\psi_{\{z_t\}} = $

  • $p(z_1)\,p(x_1 \mid z_1)$ for $t = 1$
  • $p(x_t \mid z_t)$ for $t > 1$

And:

$\psi_{\{z_t,z_{t+1}\}} = p(z_{t+1} | z_t) $

Is this correct?
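As a sanity check on the conjecture, the two factorizations can be compared numerically on a toy model. The sketch below assumes a 2-state HMM with binary emissions and a fixed observation sequence; the parameter names `pi`, `A`, and `B` are my own illustrative choices, not from the question.

```python
import itertools
import numpy as np

T, K = 4, 2                      # sequence length, number of latent states

# Toy HMM parameters (each row sums to 1)
pi = np.array([0.6, 0.4])        # p(z_1)
A  = np.array([[0.7, 0.3],       # A[i, j] = p(z_{t+1}=j | z_t=i)
               [0.2, 0.8]])
B  = np.array([[0.9, 0.1],       # B[i, x] = p(x_t=x | z_t=i), binary emissions
               [0.3, 0.7]])

x = np.array([0, 1, 1, 0])       # a fixed observation sequence

def hmm_joint(z):
    """p(x, z) from the directed HMM factorization."""
    p = pi[z[0]] * B[z[0], x[0]]
    for t in range(1, T):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]
    return p

def mrf_product(z):
    """Product of the conjectured potentials over the chain z_1..z_T."""
    p = pi[z[0]] * B[z[0], x[0]]          # psi_{z_1}
    for t in range(1, T):
        p *= B[z[t], x[t]]                # psi_{z_t}, t > 1
    for t in range(T - 1):
        p *= A[z[t], z[t + 1]]            # psi_{z_t, z_{t+1}}
    return p

# Compare the two over every latent configuration
max_diff = max(abs(hmm_joint(z) - mrf_product(z))
               for z in itertools.product(range(K), repeat=T))
print(max_diff)
```

If `max_diff` is (numerically) zero, the conjectured potentials do reproduce the HMM joint for this fixed $x$.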

Best answer:

In general, any directed graphical model can be converted into an undirected graphical model (though the converse is not true). They're just two different ways to express a joint probability distribution. An excellent review of graphical models is given here, and some discussion of converting directed to undirected models is given in section 2.5. The critical step involves connecting the parents of every child node together, which is cheekily referred to as "moralization". However, in an HMM, each node in the directed model has only one parent, so no moralization is necessary (in other words, statisticians consider single parents perfectly moral).

The actual probabilities in an HMM are given by the following, where $x_t$ are the observations and $z_t$ are the latent states:

$p(x_1,\ldots,x_T,z_1,\ldots,z_T) = p(z_1)\prod_{t=1}^{T-1}p(z_{t+1}\mid z_t)\prod_{t=1}^{T}p(x_t\mid z_t)$

So if we write down an MRF with a single-node clique for $z_1$ with $\psi_{\{z_1\}} = p(z_1)$, pairwise cliques $\{z_{t},z_{t+1}\}$ with $\psi_{\{z_{t},z_{t+1}\}} = p(z_{t+1}\mid z_t)$, and pairwise cliques $\{z_{t},x_{t}\}$ with $\psi_{\{z_{t},x_{t}\}} = p(x_{t}\mid z_t)$, then the product of all the potentials is exactly the HMM joint above. Since that product already sums to one over all configurations, the partition function is $Z = 1$ and no extra normalization is needed.
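The claim that $Z = 1$ for this construction can be checked by brute force on a small model: summing the unnormalized product of potentials over every joint configuration of $(z, x)$ should give one. The toy sizes and parameter names below are illustrative assumptions, not part of the answer.

```python
import itertools
import numpy as np

T, K, M = 3, 2, 2   # chain length, latent states, emission symbols (toy sizes)

pi = np.array([0.5, 0.5])                       # psi_{z_1} = p(z_1)
A  = np.array([[0.9, 0.1], [0.4, 0.6]])         # psi_{z_t, z_{t+1}} = p(z_{t+1} | z_t)
B  = np.array([[0.8, 0.2], [0.25, 0.75]])       # psi_{z_t, x_t} = p(x_t | z_t)

def potential_product(z, x):
    """Unnormalized MRF score: product of all clique potentials."""
    p = pi[z[0]]
    for t in range(T - 1):
        p *= A[z[t], z[t + 1]]
    for t in range(T):
        p *= B[z[t], x[t]]
    return p

# Partition function: sum the unnormalized product over all configurations
Z = sum(potential_product(z, x)
        for z in itertools.product(range(K), repeat=T)
        for x in itertools.product(range(M), repeat=T))
print(Z)
```

Because every potential is itself a (conditional) probability table, the sum telescopes to one, confirming the potentials are already normalized.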

One advantage of directed graphical models over their undirected cousins is that they explicitly model how different variables change when manipulated. If we fix the value of one node in our directed graphical model, the graph structure tells us which downstream variables will be affected. This information is thrown out when converting to an undirected model. If, however, we are only interested in passive observation rather than manipulation, the two describe exactly the same distribution over data.

Another advantage is that directed graphical models are generally much easier to sample from, because there is a clear order to how data are generated. In undirected graphical models, one usually has to run a Markov chain to convergence, for instance by Gibbs sampling, which can be very slow.