I've been reading Michael I. Jordan's introduction to graphical models (https://www.cs.cmu.edu/~aarti/Class/10701/readings/graphical_model_Jordan.pdf, pages 8-9), where there is a derivation I don't fully follow. The screenshot is below.
I understand the transition to the second line, assuming $m_6(x_2, x_5)$ is just $\psi(x_2, x_5, x_6)$ summed over $x_6$, but the transition from line 2 to line 3 confuses me. I had assumed each $m_i$ is the marginal sum over $x_i$, but he seems to define $m_5$ while completely dropping the previous $m_6$ term. What I got for line 3 instead was: $p(x_1) = \sum_{x_2} p(x_1, x_2) m_5(m_6(x_2)) \sum_{x_3} p(x_1, x_3) m_5(x_3) \sum_{x_4} p(x_2, x_4)$
The rest of the derivation seems to do the same thing, dropping the preceding $m_i$ term at each step, which I don't understand. Could someone explain this to me? Thanks!
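For concreteness, here is the small numerical setup I've been checking each line against (binary variables, random conditional tables; the factor names and shapes are my own reading of the example graph, not Jordan's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def cpt(*shape):
    """Random conditional table, normalized over the last axis."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

# The six factors from the example (axis order in brackets):
p1 = cpt(2)        # p(x1)            [x1]
p2 = cpt(2, 2)     # p(x2 | x1)       [x1, x2]
p3 = cpt(2, 2)     # p(x3 | x1)       [x1, x3]
p4 = cpt(2, 2)     # p(x4 | x2)       [x2, x4]
p5 = cpt(2, 2)     # p(x5 | x3)       [x3, x5]
p6 = cpt(2, 2, 2)  # p(x6 | x2, x5)   [x2, x5, x6]

# Brute force: build the full joint, then sum out x2..x6.
joint = np.einsum('a,ab,ac,bd,ce,bef->abcdef', p1, p2, p3, p4, p5, p6)
brute = joint.sum(axis=(1, 2, 3, 4, 5))  # equals p(x1)

# The first message, matching line 2 of the derivation:
m6 = p6.sum(axis=-1)  # m6(x2, x5) = sum over x6 of p(x6 | x2, x5)
```

This lets me evaluate each line of the derivation as an array expression and compare it to `brute`, so I can at least see numerically which reading of the $m_i$ terms is right.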
