Let $P(x_1,\dots,x_N)$ be a probability distribution that factorises over a chain as follows: $$P(\mathbf{x}) = \frac{1}{Z}\, \psi_{1,2}(x_1,x_2)\cdot \psi_{2,3}(x_2,x_3) \cdots \psi_{N-2,N-1}(x_{N-2},x_{N-1})\cdot \psi_{N-1,N}(x_{N-1},x_{N}).$$ Now, we perform marginal inference for $x_n$ as follows: $$P(x_n) = \sum_{x_1}\cdots \sum_{x_{n-1}}\sum_{x_{n+1}} \cdots\sum_{x_N}P(\mathbf{x}).$$ If each $x_i$ can take $K$ values, then evaluating this directly requires $O(K^N)$ steps, but using the distributivity of multiplication over addition together with the given factorisation of $P(\mathbf{x})$, we can instead compute $$P(x_n) = \frac{1}{Z}\left[\sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1},x_n) \cdots \left[\sum_{x_2} \psi_{2,3}(x_2,x_3)\left[\sum_{x_1} \psi_{1,2}(x_1,x_2)\right]\right]\cdots\right]\times\left[\sum_{x_{n+1}} \psi_{n,n+1}(x_{n},x_{n+1}) \cdots \left[\sum_{x_{N-1}} \psi_{N-2,N-1}(x_{N-2},x_{N-1})\left[\sum_{x_N} \psi_{N-1,N}(x_{N-1},x_{N})\right]\right]\cdots\right].$$ Now, I am not very clear how this last reduction is achieved. Is there an easy way to see it?
Original source: Pattern Recognition and Machine Learning, Christopher Bishop, Chapter 8, Section 8.4.1. According to the author, this is a simple use of the distributivity of multiplication over addition, but it is not so evident to me.
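For concreteness, here is a small numerical sketch of the equivalence I am asking about (the chain size, the random potentials, and the names `mu_alpha`/`mu_beta` are my own, loosely following Bishop's notation). It compares the brute-force $O(K^N)$ sum against the nested-sum computation, where each nested sum is a $K \times K$ matrix-vector product:

```python
import numpy as np

# Hypothetical small chain: N = 5 variables, each taking K = 3 values,
# with random positive pairwise potentials psi[i](x_i, x_{i+1}).
rng = np.random.default_rng(0)
N, K = 5, 3
psi = [rng.random((K, K)) + 0.1 for _ in range(N - 1)]

def brute_force_marginal(n):
    """Marginal P(x_n) by summing the joint over all K**N configurations: O(K**N)."""
    p = np.zeros(K)
    for config in np.ndindex(*([K] * N)):
        w = 1.0
        for i in range(N - 1):
            w *= psi[i][config[i], config[i + 1]]
        p[config[n]] += w
    return p / p.sum()  # dividing by the total sum normalises (plays the role of 1/Z)

def chain_marginal(n):
    """Same marginal via the distributive law (nested sums): O(N * K**2)."""
    # Left factor: innermost sum over x_1 first, working outwards to x_{n-1}.
    mu_alpha = np.ones(K)
    for i in range(n):
        # mu_alpha(x_{i+1}) = sum_{x_i} psi_{i,i+1}(x_i, x_{i+1}) * mu_alpha(x_i)
        mu_alpha = psi[i].T @ mu_alpha
    # Right factor: innermost sum over x_N first, working outwards to x_{n+1}.
    mu_beta = np.ones(K)
    for i in range(N - 2, n - 1, -1):
        # mu_beta(x_i) = sum_{x_{i+1}} psi_{i,i+1}(x_i, x_{i+1}) * mu_beta(x_{i+1})
        mu_beta = psi[i] @ mu_beta
    p = mu_alpha * mu_beta
    return p / p.sum()

# The two computations agree for every choice of n.
assert all(np.allclose(brute_force_marginal(n), chain_marginal(n)) for n in range(N))
```

So the claim checks out numerically; what I am looking for is an easy way to *see* why pulling each factor outside the sums it does not depend on yields exactly the nested expression above.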