There is one step in this proof that I don't understand.
$$-\sum\limits_{i\in I} q_i+\sum\limits_{i\in I} p_i \geq 0$$
I get that $\sum\limits_{i\in I} p_i=1$, but what about the another term? Why is $\sum\limits_{i\in I} q_i \leq 1$?
There is one step in this proof that I don't understand.
$$-\sum\limits_{i\in I} q_i+\sum\limits_{i\in I} p_i \geq 0$$
I get that $\sum\limits_{i\in I} p_i=1$, but what about the another term? Why is $\sum\limits_{i\in I} q_i \leq 1$?
Copyright © 2021 JogjaFile Inc.
$p,q$ are both probability distributions over a set of $n$ elements $[n]=\{1,\dots,n\}$, so $$ \sum_{i=1}^n q_i = 1, \qquad 0\leq q_i \leq 1\ \forall i\in [n] $$ and, for all $S\subseteq [n]$, this implies $$ 0 \leq \sum_{i\in S} q_i \leq \sum_{i=1}^n q_i = 1. \tag{1} $$ (Same thing for $p$). The inequality can be strict if you choose $S$ such that $[n]\setminus S$ contains elements for which $q_i > 0$ (so that the sum $\sum_{i\in S} q_i$ is "missing" non-zero elements: indeed, for any $S\subseteq [n]$, $$ 1= \sum_{i=1}^n q_i = \underbrace{\sum_{i\in S} q_i}_{\geq 0}+\underbrace{\sum_{i\in [n]\setminus S} q_i}_{\geq 0} \tag{2} $$ so if $\sum_{i\in [n]\setminus S} q_i > 0$ then $\sum_{i\in S} q_i < 1$. (Same thing for $p$, again.)
In the proof, the "set $S$" is chosen to be the support of $p$, namely the set $$ I\stackrel{\rm def}{=} \{ i \in [n] : p_i > 0 \}. \tag{3} $$ This immediately guarantees that $\sum_{i\in I} p_i = 1$ by the above decomposition (2), since by definition of $I$ we have $\sum_{i\in [n]\setminus I} p_i = 0$. However, we are not guaranteed that $\sum_{i\in [n]\setminus I} q_i = 0$ (there is no a priori relation between $I$ and $q$), so we can only apply (1) and say that $0 \leq \sum_{i\in I} q_i \leq 1$.
An example, for $n=3$ (one could easily get one for $n=2$, but it is less 'striking' in my opinion): take $$p = (1/2,1/2,0), \qquad q=(1/2,0,1/2).$$ Then $I=\{1,2\}$, since $p_1,p_2 > 0$ but $p_3 = 0$, and we have $\sum_{i\in I} p_i = 1$. However, $$\sum_{i\in I} q_i = q_1+q_2 = 1/2+0 = 1/2 < 1$$ as we are missing $q_3$ in the sum.