Moment projection is defined as $$\text{arg min}_{q\in Q} D(p||q)$$ while information projection is defined as $$\text{arg min}_{q\in Q} D(q||p)$$. Aside from the difference in the formula, how should one interpret the difference in the two measure intuitively? And when should one use moment projection over information projection, and vice versa?
What is the difference between moment projection and information projection?
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
Both the M-projection and the I-projection are projections of a probability distribution $p$ onto a set of distributions $Q$. Each can be defined as the distribution $q$, chosen among all members of $Q$, that is "closest" to $p$. Here "closest" means minimizing the relative entropy from $p$ to $q$ - also called the Kullback–Leibler divergence and commonly denoted $D(p||q)$ - a well-known (though asymmetric) measure of discrepancy between distributions. In particular, since the relative entropy expresses the information gained when shifting from $q$ to $p$, the M-projection and the I-projection can be interpreted as the distributions that minimize the amount of information lost when $q$ is used as a surrogate for $p$.
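For concreteness, the relative entropy can be computed directly for discrete distributions. A minimal NumPy sketch (the example vectors are arbitrary choices of mine, not from the question):

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p||q) for discrete distributions (natural log)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p=0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.4, 0.1]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # differs from kl_divergence(q, p): D is asymmetric
```

Running both directions on the same pair of distributions makes the asymmetry mentioned below concrete: $D(p||q)$ and $D(q||p)$ generally give different numbers.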
Since relative entropy is not symmetric, the M-projection and the I-projection are generally different. The main differences between them can be understood by looking at what each minimizes in terms of entropy and cross-entropy. The M-projection is the distribution $q$ that minimizes
$$D(p||q) = -H_p + E_p(-\log q)$$
where $H_p$ is the entropy of $p$ and $E_p(-\log q)$ is the cross-entropy between $p$ and $q$. The minimizing $q$ usually tends to place high density on all regions that are probable under $p$ (a small $-\log q$ in those regions keeps the second term small). It also tends to extend over regions of intermediate probability under $p$ (i.e., it is not concentrated only at the peaks of $p$), because the penalty for assigning low density there is considerable. The net result is that the M-projection commonly has a relatively large variance.
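This "covering" behaviour can be checked numerically. In the sketch below (my own illustrative setup, not from the original post), the target $p$ is a bimodal Gaussian mixture and $Q$ is the Gaussian family; for that family the M-projection is known to match the mean and variance of $p$, and a brute-force search over the scale parameter confirms it:

```python
import numpy as np

# Target p: a bimodal Gaussian mixture, discretized on a grid.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p = 0.5 * gauss(x, -3.0, 1.0) + 0.5 * gauss(x, 3.0, 1.0)

# For the Gaussian family, argmin_q D(p||q) matches the moments of p.
mu_p = np.sum(x * p) * dx                   # mean of p (here 0)
var_p = np.sum((x - mu_p) ** 2 * p) * dx    # variance of p (here 10)

# Sanity check by grid search over the Gaussian scale parameter:
def kl_pq(sigma):
    q = gauss(x, mu_p, sigma)
    return np.sum(p * np.log(p / q)) * dx

sigmas = np.linspace(0.5, 6.0, 200)
sigma_best = sigmas[np.argmin([kl_pq(s) for s in sigmas])]
# sigma_best is close to sqrt(var_p) ~ 3.16: the M-projection is wide
# enough to cover both modes rather than concentrating on either peak.
```

The best Gaussian here has standard deviation near $\sqrt{10} \approx 3.16$, far wider than either mode's width of $1$ - exactly the large-variance behaviour described above.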
On the other hand, the I-projection is the distribution $q$ that minimizes
$$D(q||p) = -H_q + E_q(-\log p)$$
where $H_q$ is the entropy of $q$ and $E_q(-\log p)$ is the cross-entropy between $q$ and $p$. Although the first term penalizes low entropy of $q$, the second term often dominates, so the minimizing $q$ tends to place very high density wherever $p$ is large and very low density wherever $p$ is small. In other words, the mass of $q$ tends to concentrate on a peak region of $p$. The net result is that the I-projection commonly has a relatively small variance.
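The mode-seeking behaviour can be demonstrated on the same bimodal example (again my own illustrative setup): searching the Gaussian family for the minimizer of the reverse divergence $D(q||p)$ yields a narrow Gaussian sitting on one of the two modes, not a wide one covering both.

```python
import numpy as np

# Same bimodal target p as before, discretized on a grid.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p = 0.5 * gauss(x, -3.0, 1.0) + 0.5 * gauss(x, 3.0, 1.0)

def kl_qp(mu, sigma):
    """Reverse divergence D(q||p) for a Gaussian q, via Riemann sum."""
    q = gauss(x, mu, sigma)
    mask = q > 0  # skip points where q underflows; they contribute 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx

# Brute-force search for the I-projection within the Gaussian family.
candidates = [(kl_qp(m, s), m, s)
              for m in np.linspace(-5.0, 5.0, 101)
              for s in np.linspace(0.5, 5.0, 100)]
_, mu_best, sigma_best = min(candidates)
# mu_best lands on one mode (~ +3 or -3) with sigma_best ~ 1:
# the I-projection is mode-seeking, with small variance.
```

Comparing with the previous sketch: the M-projection spreads out (standard deviation about $3.16$) while the I-projection locks onto a single mode (standard deviation about $1$), which is the contrast the prose above describes.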
As for the main applications, both the M-projection and the I-projection play important roles in graphical models. The M-projection is fundamental for learning problems, where we have to find the distribution in a model family that is closest to the empirical distribution of the data set we want to learn from. In contrast, the I-projection - usually easier from a computational point of view - has important applications in information geometry (e.g., through the information-geometric analogue of the Pythagorean theorem, in which relative entropy plays the role of squared Euclidean distance) and in the analysis of error exponents in information-theoretic problems such as hypothesis testing, source coding, and channel coding. It can also be used for answering probability queries, particularly when a distribution $p$ is too complex to allow efficient query answering. In that case, using an I-projection as an approximation of $p$ can be a good way to answer queries more efficiently.