Consider a random variable $X$ taking values in $\{0,1,\ldots,n\}$ and a random variable $Y$ taking values in $\{0,1\}$.
Let $a_{i}=P\left(X={i}\right), b_{j}=P(Y=j)$.
Also, let $p_{i}=P\left(Y=0 \mid X=i\right)$ and $q_{i}=1-p_{i}$. Then $b_{0}=\mathbf{a} \cdot \mathbf{p}$ and $b_{1}=\mathbf{a} \cdot \mathbf{q}$.
Assume that $p_i < p_{i+1}$ for all $i$.
Consider maximizing the mutual information $I(X ; Y)$ between $X$ and $Y$ given below: $$ \begin{aligned} I(X ; Y) &=\sum_{x, y} P_{x} P_{y \mid x} \log \frac{P_{y \mid x}}{P_{y}} \\ &=\sum_{i} a_{i}\left(p_{i} \log \frac{p_{i}}{b_{0}}+q_{i} \log \frac{q_{i}}{b_{1}}\right) \ \end{aligned} $$
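For concreteness, the sum above can be evaluated numerically. Below is a minimal sketch; the channel probabilities $p_i$ are hypothetical values I chose to be strictly increasing, as assumed above:

```python
import numpy as np

def mutual_information(a, p):
    """I(X;Y) in nats for input distribution a and binary channel p,
    where p[i] = P(Y=0 | X=i); assumes 0 < p[i] < 1 for all i."""
    a, p = np.asarray(a, float), np.asarray(p, float)
    q = 1.0 - p
    b0, b1 = a @ p, a @ q  # output distribution: P(Y=0), P(Y=1)
    # zero-mass input points contribute 0 to the sum (a[i] = 0)
    return np.sum(a * (p * np.log(p / b0) + q * np.log(q / b1)))

# hypothetical increasing channel on n + 1 = 5 points
p = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
a_uniform = np.full(5, 0.2)
a_endpoints = np.array([0.5, 0, 0, 0, 0.5])
print(mutual_information(a_uniform, p))
print(mutual_information(a_endpoints, p))
```

For this channel the endpoint-only distribution already gives a larger value than the uniform one, which is the behaviour discussed next.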
If there are no constraints on $\mathbf{a}$, the mutual information $I(X ; Y)$ is maximized when $\mathbf{a}=\left(a_{0}, 0,0, \cdots, 0, a_{n}\right)$ with $a_0 + a_n = 1$, as shown here. Thus, in the optimal case, all mass points are zero except at the two end points.
I am interested in how the mutual information $I(X ; Y)$ changes when the distribution $\mathbf{a}$ has more non-zero mass points. Consider the following sequence of input distributions:
- $\mathbf{a}_1 = n \times \mathrm{Bern}(r_1)$.
- $\mathbf{a}_2 = \frac{n}{2} \times \mathrm{Binomial}(2,r_2)$.
- $\mathbf{a}_3 = \frac{n}{3} \times \mathrm{Binomial}(3,r_3)$.
$\vdots$
So in general, assuming $i$ divides $n$, $\mathbf{a}_i = \frac{n}{i} \times \mathrm{Binomial}(i,r_i)$, i.e. $X$ is distributed as $\frac{n}{i}$ times a $\mathrm{Binomial}(i,r_i)$ random variable. In each case, we can optimize $r_i$ to maximize the mutual information. Note that across all cases, $\mathbf{a}_1$ with $r_1$ optimally chosen attains the maximum mutual information, since it sweeps exactly the two-point endpoint distributions that are optimal in the unconstrained problem above.
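The comparison can be probed numerically. The sketch below grid-searches $r_i$ for each divisor $i$ of a hypothetical $n = 12$, with a hypothetical linearly increasing channel $\mathbf{p}$; it is an illustration for one channel, not a proof for all:

```python
import numpy as np
from math import comb

def mutual_information(a, p):
    """I(X;Y) in nats; p[i] = P(Y=0 | X=i), assumes 0 < p[i] < 1."""
    a, p = np.asarray(a, float), np.asarray(p, float)
    q = 1.0 - p
    b0, b1 = a @ p, a @ q
    return np.sum(a * (p * np.log(p / b0) + q * np.log(q / b1)))

def a_dist(n, i, r):
    """Distribution a_i of X = (n/i) * Binomial(i, r): mass
    C(i,k) r^k (1-r)^(i-k) at the point (n/i)*k, zero elsewhere."""
    a = np.zeros(n + 1)
    for k in range(i + 1):
        a[(n // i) * k] = comb(i, k) * r**k * (1 - r) ** (i - k)
    return a

n = 12
p = np.linspace(0.05, 0.95, n + 1)   # hypothetical increasing channel
rs = np.linspace(0.01, 0.99, 199)    # grid search over r_i
best = {i: max(mutual_information(a_dist(n, i, r), p) for r in rs)
        for i in (1, 2, 3, 4, 6)}    # divisors of n
print(best)
```

For this symmetric channel the grid search puts `best[1]` on top, consistent with the endpoint result; whether `best[i] > best[i+1]` holds along the whole sequence is exactly the question below.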
Is it true that the maximum of the mutual information $I(X ; Y)$ under input distribution $\mathbf{a}_i$ is greater than the maximum of $I(X ; Y)$ under $\mathbf{a}_{i+1}$?