Maximum entropy, minimum KL divergence, probability functions

Consider $W = O^n$, the set of all sequences of length $n$ with entries from some finite set $O.$ Let $P$ be a probability function over $W.$ For each $w \in W,$ let $f_w: O \rightarrow \mathbb{R}$ be the relative frequency function of the elements of $O$ in $w$, so that $f_w(o)$ is the proportion of entries of $w$ equal to $o.$ Let $F$ be the set of $f_w$ such that $P(w)>0.$

Let $H, L,$ and $M$ be probability functions such that:

  1. $H:O \rightarrow \mathbb{R}$ is such that $H(o) = \sum_{w \in W} P(w)f_w(o).$

  2. $L: O \rightarrow \mathbb{R}$ minimizes $\sum_{w \in W}P(w)D_{KL}(L||f_w).$

  3. $M$ is the function in $F$ with the greatest entropy.
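To make the three definitions concrete, here is a minimal Python sketch that computes $H$, $L$, and $M$ for a small $P$. The function names (`rel_freq`, `entropy`, `compute_H_L_M`) are my own and purely illustrative. For $L$ the sketch assumes the standard closed form for this objective, namely that the minimizer is proportional to the $P$-weighted geometric mean $\exp\left(\sum_w P(w)\log f_w(o)\right)$, with $L(o)=0$ whenever some $f_w(o)=0$ with $P(w)>0$; if that forces every $L(o)$ to zero, the objective is infinite for every $L$ and the sketch returns `None`. Ties for $M$ are broken by taking the first maximizer found.

```python
import math
from collections import Counter


def rel_freq(w, O):
    """Relative frequency f_w of each symbol of O in the sequence w."""
    counts = Counter(w)
    return tuple(counts[o] / len(w) for o in O)


def entropy(f):
    """Shannon entropy (natural log), with 0 * log 0 taken as 0."""
    return -sum(p * math.log(p) for p in f if p > 0)


def compute_H_L_M(P, O):
    """P maps sequences (tuples over O) to probabilities; returns (H, L, M)."""
    support = {w: p for w, p in P.items() if p > 0}
    freqs = {w: rel_freq(w, O) for w in support}

    # H: the P-weighted arithmetic mean of the f_w.
    H = tuple(
        sum(p * freqs[w][i] for w, p in support.items())
        for i in range(len(O))
    )

    # L: normalized P-weighted geometric mean of the f_w (assumed closed form).
    weights = []
    for i in range(len(O)):
        if any(freqs[w][i] == 0 for w in support):
            weights.append(0.0)  # L(o) must vanish to keep the KL terms finite
        else:
            weights.append(
                math.exp(sum(p * math.log(freqs[w][i]) for w, p in support.items()))
            )
    total = sum(weights)
    L = tuple(x / total for x in weights) if total > 0 else None

    # M: a maximum-entropy element of F (first one found if there is a tie).
    M = max(freqs.values(), key=entropy)
    return H, L, M


if __name__ == "__main__":
    O = ("a", "b")
    P = {("a", "a", "b"): 0.5, ("a", "b", "b"): 0.5}
    H, L, M = compute_H_L_M(P, O)
    print("H =", H)
    print("L =", L)
    print("M =", M)
```

In this example, with $n = 3$ and $P$ split evenly between $aab$ and $abb$, both $H$ and $L$ come out as the uniform function on $O$, while $M$ has to be one of the two frequency functions $(2/3, 1/3)$ or $(1/3, 2/3)$, so $H = L \neq M$. When $P$ is concentrated on a single sequence, all three coincide with that sequence's frequency function.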

For which probability functions $P$ does it follow that any two of $H$, $L$, and $M$ are the same?