Moment projection is defined as $$\text{arg min}_{q\in Q} D(p||q)$$ while information projection is defined as $$\text{arg min}_{q\in Q} D(q||p)$$. Aside from the difference in the formula, how should one interpret the difference in the two measure intuitively? And when should one use moment projection over information projection, and vice versa?
What is the difference between moment projection and information projection?
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
Both the M-projection and the I-projection are projections of a probability distribution $p$ onto a set of distributions $Q$. Each can be defined as the distribution $q$, chosen among all members of $Q$, that is "closest" to $p$. Here "closest" means minimizing the relative entropy from $p$ to $q$ - also called the Kullback–Leibler divergence and commonly denoted $D(p||q)$ - a well-known (though asymmetric) measure of discrepancy between distributions. In particular, since the relative entropy expresses the information gained when shifting from $q$ to $p$, the M-projection and the I-projection can be interpreted as the distributions that minimize the amount of information lost when $q$ is used as a surrogate for $p$.
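For concreteness, the relative entropy can be computed directly for discrete distributions. A minimal NumPy sketch (the example vectors are arbitrary choices of mine, not from the question):

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p||q) for discrete distributions (natural log)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p=0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.4, 0.1]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))  # differs from kl_divergence(q, p): D is asymmetric
```

Running both directions on the same pair of distributions makes the asymmetry mentioned below concrete: $D(p||q)$ and $D(q||p)$ generally give different numbers.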
Since relative entropy is not symmetric, the M-projection and the I-projection are generally different. The main differences between them can be understood by looking at what each minimizes in terms of entropy and cross-entropy. The M-projection is the distribution $q$ that minimizes
$$D(p||q) = -H_p + E_p(-\log q)$$
where $H_p$ is the entropy of $p$ and $E_p(-\log q)$ is the cross-entropy between $p$ and $q$. The minimizing $q$ usually tends to place high density on all regions that are probable under $p$ (a small $-\log q$ in those regions keeps the second term small). It also tends to extend over regions of intermediate probability under $p$ (i.e., it is not concentrated only at the peaks of $p$), because the penalty for assigning low density there is considerable. The net result is that the M-projection commonly has a relatively large variance.
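This "covering" behaviour can be checked numerically. In the sketch below (my own illustrative setup, not from the original post), the target $p$ is a bimodal Gaussian mixture and $Q$ is the Gaussian family; for that family the M-projection is known to match the mean and variance of $p$, and a brute-force search over the scale parameter confirms it:

```python
import numpy as np

# Target p: a bimodal Gaussian mixture, discretized on a grid.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p = 0.5 * gauss(x, -3.0, 1.0) + 0.5 * gauss(x, 3.0, 1.0)

# For the Gaussian family, argmin_q D(p||q) matches the moments of p.
mu_p = np.sum(x * p) * dx                   # mean of p (here 0)
var_p = np.sum((x - mu_p) ** 2 * p) * dx    # variance of p (here 10)

# Sanity check by grid search over the Gaussian scale parameter:
def kl_pq(sigma):
    q = gauss(x, mu_p, sigma)
    return np.sum(p * np.log(p / q)) * dx

sigmas = np.linspace(0.5, 6.0, 200)
sigma_best = sigmas[np.argmin([kl_pq(s) for s in sigmas])]
# sigma_best is close to sqrt(var_p) ~ 3.16: the M-projection is wide
# enough to cover both modes rather than concentrating on either peak.
```

The best Gaussian here has standard deviation near $\sqrt{10} \approx 3.16$, far wider than either mode's width of $1$ - exactly the large-variance behaviour described above.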
On the other hand, the I-projection is the distribution $q$ that minimizes
$$D(q||p) = -H_q + E_q(-\log p)$$
where $H_q$ is the entropy of $q$ and $E_q(-\log p)$ is the cross-entropy between $q$ and $p$. Although the first term penalizes low entropy of $q$, the second term often dominates, so the minimizing $q$ tends to place very high density wherever $p$ is large and very low density wherever $p$ is small. In other words, the mass of $q$ tends to concentrate on a peak region of $p$. The net result is that the I-projection commonly has a relatively small variance.
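The mode-seeking behaviour can be demonstrated on the same bimodal example (again my own illustrative setup): searching the Gaussian family for the minimizer of the reverse divergence $D(q||p)$ yields a narrow Gaussian sitting on one of the two modes, not a wide one covering both.

```python
import numpy as np

# Same bimodal target p as before, discretized on a grid.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p = 0.5 * gauss(x, -3.0, 1.0) + 0.5 * gauss(x, 3.0, 1.0)

def kl_qp(mu, sigma):
    """Reverse divergence D(q||p) for a Gaussian q, via Riemann sum."""
    q = gauss(x, mu, sigma)
    mask = q > 0  # skip points where q underflows; they contribute 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx

# Brute-force search for the I-projection within the Gaussian family.
candidates = [(kl_qp(m, s), m, s)
              for m in np.linspace(-5.0, 5.0, 101)
              for s in np.linspace(0.5, 5.0, 100)]
_, mu_best, sigma_best = min(candidates)
# mu_best lands on one mode (~ +3 or -3) with sigma_best ~ 1:
# the I-projection is mode-seeking, with small variance.
```

Comparing with the previous sketch: the M-projection spreads out (standard deviation about $3.16$) while the I-projection locks onto a single mode (standard deviation about $1$), which is the contrast the prose above describes.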
As for the main applications, both the M-projection and the I-projection play important roles in graphical models. The M-projection is fundamental for learning problems, where we have to find the distribution in a model family that is closest to the empirical distribution of the data set we want to learn from. In contrast, the I-projection - usually easier from a computational point of view - has important applications in information geometry (e.g., through the information-geometric analogue of the Pythagorean theorem, in which relative entropy plays the role of squared Euclidean distance) and in the analysis of error exponents in information-theoretic problems such as hypothesis testing, source coding, and channel coding. It can also be used for answering probability queries, particularly when a distribution $p$ is too complex to allow efficient query answering. In that case, using an I-projection as an approximation of $p$ can be a good way to answer queries more efficiently.