In chapter 26 of MacKay's book *Information Theory, Inference, and Learning Algorithms*, he covers the sum-product algorithm for computing marginal probabilities, partition functions, etc. The same material can be found in this Wikipedia article.
Assume we have an unnormalized probability distribution $P^\ast(\pmb x)$ which can be expressed as a product of factors over subsets of the components of $\pmb x$. For example, the factor graph with variable nodes $x_1, x_2, x_3$ and factor nodes $f_1, \dots, f_5$ is the graph for:

$P^\ast(\pmb x) = f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_1, x_2) f_5(x_2, x_3)$
and for a general factored $P^\ast(\pmb x)$:

$Z = \sum\limits_{\pmb x} f_1(\pmb x_1) f_2(\pmb x_2) \dots f_M(\pmb x_M)$

where $\pmb x$ is the full random vector of all components, and each $\pmb x_m$ is the subset of components that factor $f_m$ depends on. In general, summing over all combinations of all components of $\pmb x$ would be intractably expensive, but the sum-product algorithm avoids a lot of unnecessary computation because $P^\ast$ factors like this.
It seems that, because of the factored structure of $P^\ast$, in a tree-like factor graph we can always reorder the summation for $Z$ so that we only ever sum over a few components of $\pmb x$ at a time. For example, in the concrete example above,
$Z = \sum\limits_{\pmb x} P^\ast(\pmb x) = \sum\limits_{x_1, x_2, x_3} f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_1, x_2) f_5(x_2, x_3) = \sum\limits_{x_2} f_2(x_2) \bigg[ \sum\limits_{x_1} f_1(x_1) f_4(x_1, x_2) \bigg] \bigg[ \sum\limits_{x_3} f_3(x_3) f_5(x_2, x_3) \bigg]$
This is a simple example, but if each component $x_i$ took $N$ discrete values, the reordered sum would need only $O(N^2)$ operations compared to the naive $O(N^3)$.
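To make the reordering concrete, here is a quick numerical sanity check (my own sketch, not from the book): random factor tables for the three-variable example above, with $Z$ computed both by the naive triple sum and by the reordered sums. The factor tables are hypothetical random arrays just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4  # each variable takes N discrete values

# Hypothetical random factor tables for
# P*(x) = f1(x1) f2(x2) f3(x3) f4(x1,x2) f5(x2,x3)
f1, f2, f3 = rng.random(N), rng.random(N), rng.random(N)
f4, f5 = rng.random((N, N)), rng.random((N, N))

# Naive Z: sum over all N^3 joint configurations
Z_naive = sum(
    f1[a] * f2[b] * f3[c] * f4[a, b] * f5[b, c]
    for a in range(N) for b in range(N) for c in range(N)
)

# Reordered Z: push each sum inward, O(N^2) work total
inner1 = f4.T @ f1  # inner1[b] = sum_a f1[a] f4[a,b]
inner3 = f5 @ f3    # inner3[b] = sum_c f3[c] f5[b,c]
Z_fast = np.sum(f2 * inner1 * inner3)

print(np.isclose(Z_naive, Z_fast))  # True
```

The two bracketed sums in the equation above become the vectors `inner1` and `inner3`, and the outer sum over $x_2$ is the final weighted sum.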
My question is the following: is this all the sum-product algorithm is really doing in the tree-like graph case? Is it just a systematic way of reordering the sum as above, or does it provide something more?
