Does the standard sum-product message passing algorithm save any computation over just doing this?


In MacKay's "Information theory, inference, and learning algorithms" book, in chapter 26 he covers the sum-product algorithm for calculating marginal probabilities, partition functions, etc. The same info can be found in this wikipedia article.

Assume we have an unnormalized probability distribution $P^\ast(x)$ which can be expressed as a product of factors over different subsets of the components of $x$. For example:

[Figure: factor graph with variable nodes $x_1, x_2, x_3$ and factor nodes $f_1, \dots, f_5$]

is the graph for:

$P^\ast(x) = f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_1, x_2) f_5(x_2, x_3)$

and for a general factored $P^\ast(x)$:

$Z = \sum\limits_{\pmb x} f_1(x_1) f_2(x_2) \dots f_M(x_M)$

where $\pmb x$ is the full random vector of all components, and each $x_i$ is the subset of components that factor $f_i$ depends on. In general, summing over all combinations of all components of $\pmb x$ would be intractably huge, but the sum-product algorithm avoids much of that computation by exploiting the fact that $P^\ast$ factors like this.

It seems like, because of the factored nature of $P^\ast$, in a tree-like factor graph we'll always be able to rewrite $Z$ so that each sum runs over as few components of $\pmb x$ at a time as possible. For example, in the concrete example above,

$Z = \sum\limits_{\pmb x} P^\ast(\pmb x) = \sum\limits_{x_1, x_2, x_3} f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_1, x_2) f_5(x_2, x_3) = \sum\limits_{x_2} f_2(x_2) \bigg[ \sum\limits_{x_1} f_1(x_1) f_4(x_1, x_2) \bigg] \bigg[ \sum\limits_{x_3} f_3(x_3) f_5(x_2, x_3) \bigg]$

This is a simple example, but if each component of $\pmb x$ took $N$ discrete values, this reordering would only need $O(N^2)$ operations compared to the naive $O(N^3)$.
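To make this concrete, here's a small numerical check of the reordering (a sketch assuming NumPy arrays as factor tables and a small domain size; all variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4  # number of discrete values per component

# Nonnegative factor tables for the example P*(x) above.
f1, f2, f3 = rng.random(N), rng.random(N), rng.random(N)
f4 = rng.random((N, N))  # f4(x1, x2)
f5 = rng.random((N, N))  # f5(x2, x3)

# Naive: sum over all N^3 joint configurations, O(N^3).
Z_naive = sum(
    f1[a] * f2[b] * f3[c] * f4[a, b] * f5[b, c]
    for a in range(N) for b in range(N) for c in range(N)
)

# Reordered: push each sum inside the product, O(N^2).
inner1 = f1 @ f4           # inner1[b] = sum_a f1(a) f4(a, b)
inner3 = f5 @ f3           # inner3[b] = sum_c f3(c) f5(b, c)
Z_fast = float(np.sum(f2 * inner1 * inner3))

assert np.isclose(Z_naive, Z_fast)  # the two orderings agree
```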

My question is the following: is this actually all the sum-product algorithm is doing, in the tree-like graph case? Is it just a systematic way of performing this reordering of the sum, so we don't have to do it by hand as above, or does it provide something else?
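For reference, in the tree case the messages of the sum-product algorithm are exactly the bracketed inner sums from the reordered expression above. A minimal sketch of the message schedule for this example (assuming NumPy arrays as factor tables; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
f1, f2, f3 = rng.random(N), rng.random(N), rng.random(N)
f4 = rng.random((N, N))  # f4(x1, x2)
f5 = rng.random((N, N))  # f5(x2, x3)

# Leaf factors send their tables to their variables;
# x1 and x3 each forward the single incoming message unchanged.
m_x1_to_f4 = f1
m_x3_to_f5 = f3

# Factor-to-variable: multiply incoming messages, sum out own variables.
m_f4_to_x2 = f4.T @ m_x1_to_f4   # [b] = sum_a f1(a) f4(a, b)
m_f5_to_x2 = f5 @ m_x3_to_f5     # [b] = sum_c f3(c) f5(b, c)

# Z is the sum over x2 of the product of all incoming messages --
# term for term, the reordered sum written above.
Z = float(np.sum(f2 * m_f4_to_x2 * m_f5_to_x2))

# A marginal then comes for free from the same messages.
p_x2 = f2 * m_f4_to_x2 * m_f5_to_x2 / Z
```

One thing the message-passing view adds beyond the single reordered sum: the same messages, computed once, give the marginals of every variable, not just $Z$.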