Significance of Convex Sets for I-Projection

208 Views Asked by At

I have been reviewing the literature on information theoretic methods in statistics, and in particular, the method of I-projections. Given a discrete, finite alphabet $\mathcal{X}$, let $\prod$ denote a set of probability mass functions (pmfs) on $\mathcal{X}$. Suppose $Q$ is a pmf on $\mathcal{X}$ which does not belong to $\prod$.

The I-projection of $Q$ on $\prod$ is defined as the element $P^{*}\in \prod$ such that

\begin{eqnarray}P^{*}=\arg \min \limits_{P\in \prod} D(P||Q),\end{eqnarray} where $D(P||Q)$ denotes the Kullback-Leibler divergence (KL-divergence) between pmfs $P$ and $Q$, defined as \begin{eqnarray}D(P||Q)=\sum\limits_{x\in \mathcal{X}} P(x)\log\left(\frac{P(x)}{Q(x)}\right).\end{eqnarray}

A sufficient condition for the existence of a $P^{*}$ as above requires $\prod$ to be closed. In addition, in the literature, $\prod$ is considered to be a convex set. For example, the family

\begin{eqnarray} \mathcal{P}=\lbrace P: \sum\limits_{x\in \mathcal{X}} P(x)f_{i}(x)=\alpha_{i}, ~1\leq i\leq k\rbrace, \end{eqnarray} where $f_{i}:\mathcal{X}\rightarrow \mathbb{R}$ are measurable functions and $\alpha_{i}\in \mathbb{R}$, called a linear family of pmfs, is typically considered in place of $\prod$ in the literature.

Question: Can someone please explain the significance of using convex sets? Do things fail if we do not consider convex sets?

1

There are 1 best solutions below

1
On BEST ANSWER

The significance of having convex sets comes from statistical mechanics. This is mentioned at least in two places in Cover & Thomas's book (2nd edition); the first paragraph in Ch. 12 and Example 12.2.5.

However, convexity of $\mathcal{P}$ does not play any role for the existence of I-projection. Since $D(\cdot\|\cdot)$ is lower semi-continuous, the existence of I-projection is always guaranteed so long as $\mathcal{P}$ is compact.