Belief Propagation Algorithms for Graphical Models with Cycles?

570 Views Asked by At

Belief propagation algorithms cannot solve for the probabilities of a cyclic graphical model; they only work for acyclic graphical models.

For undirected graphical models (for example Markov random fields and conditional random fields in the area of computer vision), when are the graphical models acyclic?

As far as I know, in computer vision, it is common to build grid based graphical models.

In this kind of model, there are too many cycles for belief propagation to be applicable.

So for applications to computer vision, is it necessary to do research to find new belief propagation inference algorithms?

1

There are 1 best solutions below

0
On

The goal of Belief Propagation (BP) algorithm is to compute exact or approximate marginals of some distribution associated to a graphical model. To find more general approaches, I will reformulate it as it is usually done in Physics.

Boltzmann distribution and the Helmholtz free-energy

In physics, the distribution of a graphical model is usually known as Boltzmann distribution. Usually, if you are able to "compute" the Helmholtz free-energy $$ F = -\frac{1}{\beta} \ln Z $$ of a Boltzmann distribution $$ p(x) = \frac{e^{-\beta H(x)}}{Z} $$ where then you are able to compute the marginals. Hence, physicists seek to approximate the Helmholtz free-energy of the problem.

Here, $$ Z = \sum_x e^{-\beta H(x)} $$ is the partition function, $H(x)$ is the Hamiltonian---usually related in the context of computer science to the so called potentials but in physics is related to the energy of a system---and $x=(x_1,...,x_N)$ is a vector of variables $x_i$ which can be associated to the variable nodes $i$ of a corresponding graphical model. The structure of graphical model is determined by the Hamiltonian. If the Hamiltonain can be written as $$ H(x) = \sum_a H_a(x_a) $$ then the $a$ represent indexes of the factor nodes of the graphical model. Here, $x_a = \{x_i:i \in \partial a\}$ is the set of variables corresponding to the set $\partial a$ of variable nodes linked to the factor node $a$.

How to approximate the Helmholtz free-energy? The variational approach

It easy to show that the so called Gibbs free-energy
$$ G[b] = U[b] - T S[b] = \sum_x b(x) E(x) + \frac{1}{\beta} \sum_x b(x) \ln b(x) $$ satisfy $$G[b]=F+D_{KL}(b||p)$$ where $$D_{KL}(b||p)$$ is the Kullback-Leibler divergence; a non-negative quantity that is zero if and only if $b=p$. Here, $b$ represents any trial or belief distribution that you may like to use as an approximation of the exact one, i.e. the Boltzmann distribution $p$.

The method is called the variational approach because the idea is to vary the belief distribution, seeking for one that is easy to compute and, at the same time, reduces Gibbs free-energy as much as possible. In this way, you may be able to find a good approximation for the Helmholtz free-energy.

Mean-field and Bethe's free-energies

The most simple approximation for the Helmholtz free-energy is obtained by using the mean-filed belief distribution $$ b(x) = \prod_i b_i(x) $$ i.e. by assuming that the variables in the graphical model are statistically independent. The mean-field approximation is a first order approximation in a series expansion in terms of cumulants of the so called cluster variation method (see e.g. "Methods in Statistical Phyisics" by Tomoyasu Tanaka).

The standard belief propagation algorithm can be derived from Bethe's approximation to the free-energy (as shown by Yedidia), which is exact in graphical models without loops. Bethe's approximation is the second order approximation in the series expansion of the cluster variation method. It corresponds to a belief of the form $$ b(x) = \prod_a b_a(x_a) \prod_i b_i(x_i)^{1-|\partial i|} $$ where $b_a(x_a)=\sum_{\{x_i:i\notin \partial a\}} b(x)$ and $b_i(x_i)=\sum_{\{x_j:j\neq i\}} b(x)$ are marginals of the belief distribution $b(x)$. The indexes $i$ run over variable nodes, $\partial i$ is the set of factor nodes linked to variable node $i$ and $|\partial i|$ is its cardinality.

Bethe's approximation works well in pairwise graphical models, i.e. in graphical models where $|\partial i|\leq 2$. In graphical models with factor-nodes of degree larger than 2, strange things may happen; e.g. a negative entropy. This is because Bethe's approximation corresponds to the second-order approximation of the cumulant and correlations between 3 or more variables cannot be appropriately represented.

You can continue further and compute the third order, fourth order, and so on, of the cumulant expansion. The more cumulants you compute, the closer the Gibbs free-energy gets to the Helmholtz free-energy. In particular, if you have $N$ variables, the series expansion becomes exact after you compute up to the $N$-th order cumulant.

Usually, these cumulants can be computed by using iterative methods such as BP, or the Generalized BP (GBP) if you are beyond Bethe's approximation. The problem is that, the number of equations you need to iterate grows exponentially with the number of cumulants you want to compute. Hence, for $N$ large, the problem becomes computationally intractable.

Luckily, in many cases correlations between variables are weak and, therefore, Bethe's approximation---or, equivalently, the second order approximation and the BP algorithm---works well practice even if the graphical model has loops.