I'm reading the alternative proof of Gibbs' inequality on Wikipedia, which states:
Let $P=\{p_1,\ldots,p_n\}$ be a probability distribution.
Then for any other probability distribution $Q=\{q_1,\ldots,q_n\}$, we have $-\sum_i p_i\log{p_i} \leq -\sum_i p_i\log{q_i}$.
I have two questions.
First, the proof uses Jensen's inequality to claim that $\sum p_i \log \frac{q_i}{p_i} \leq \log \sum p_i \frac{q_i}{p_i}$.
But why does it hold? I think Jensen's inequality only says that $\sum \log{(p_i \frac{q_i}{p_i})} \leq \log \sum p_i \frac{q_i}{p_i}$.
Second, can we apply Gibbs' inequality in the continuous case? That is, does $-\int f(x)\log{f(x)}\, dx \leq -\int f(x)\log{g(x)}\, dx$ still hold for probability density functions $f$ and $g$?
Since Jensen's inequality also applies in the continuous case, I think the same argument should go through. Nevertheless, I cannot find any mention of Gibbs' inequality in the continuous case; every source I've found deals only with the discrete case. Is there some problem with the continuous case, or can I use Gibbs' inequality there as I wrote it above?
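For what it's worth, here is a quick numerical check I wrote (my own sketch; the two Gaussian densities are an arbitrary example, and the crude midpoint Riemann sum is only for illustration). It computes $-\int f \log f$ and $-\int f \log g$ for $f = N(0,1)$ and $g = N(1,2)$ and confirms the inequality for this pair:

```python
import math

# Sanity check of the continuous analogue for one pair of densities:
#   -int f(x) log f(x) dx  <=  -int f(x) log g(x) dx

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def riemann(h, lo=-20.0, hi=20.0, n=200_000):
    """Crude midpoint Riemann sum of h over [lo, hi] (tails beyond are negligible here)."""
    dx = (hi - lo) / n
    return sum(h(lo + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda x: normal_pdf(x, 0.0, 1.0)  # f = N(0, 1)
g = lambda x: normal_pdf(x, 1.0, 2.0)  # g = N(1, 2)

h_f = riemann(lambda x: -f(x) * math.log(f(x)))   # differential entropy of f
h_fg = riemann(lambda x: -f(x) * math.log(g(x)))  # cross-entropy of f relative to g

assert h_f <= h_fg
```

Of course, one numerical example for one pair of densities proves nothing; it just suggests the continuous statement is plausible.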
Thanks.
A partial answer for the first part, using the "elementary stuff" that the OP asked for in the comments:
The quoted inequality $$\sum p_i \log \frac{q_i}{p_i} \leq \log \sum p_i \frac{q_i}{p_i}$$
is exactly Jensen's inequality applied to the concave function $\log$. For a concave function $\varphi$, nonnegative weights $p_i$ summing to $1$, and points $x_i$, Jensen's inequality says $\sum_i p_i\,\varphi(x_i) \leq \varphi\!\left(\sum_i p_i x_i\right)$. Taking $\varphi = \log$ and $x_i = q_i/p_i$ gives the desired result. (The version you wrote puts the $p_i$ inside the $\log$; in the weighted form of Jensen's inequality, the $p_i$ are the weights outside the function.)
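For completeness, the full chain (restricting the sums to indices with $p_i > 0$) is
$$\sum_i p_i \log \frac{q_i}{p_i} \;\leq\; \log \sum_i p_i\,\frac{q_i}{p_i} \;=\; \log \sum_i q_i \;\leq\; \log 1 \;=\; 0,$$
and expanding the left-hand side as $\sum_i p_i \log q_i - \sum_i p_i \log p_i \leq 0$ and rearranging gives $-\sum_i p_i \log p_i \leq -\sum_i p_i \log q_i$, which is Gibbs' inequality.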
The second part is addressed in the linked blog article, and I'm not going to retype its arguments here.
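As a quick numerical sanity check of the discrete inequality (a sketch of my own; the helper names are mine, not from the post), the following tries random distributions and verifies $-\sum_i p_i \log p_i \leq -\sum_i p_i \log q_i$ in every case, with equality when $q = p$:

```python
import math
import random

def entropy(p):
    """Shannon entropy -sum p_i log p_i in nats (terms with p_i = 0 contribute 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy -sum p_i log q_i in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def random_dist(n, rng):
    """A random probability distribution on n outcomes."""
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
for _ in range(1000):
    p = random_dist(5, rng)
    q = random_dist(5, rng)
    # Gibbs' inequality, up to floating-point tolerance:
    assert entropy(p) <= cross_entropy(p, q) + 1e-12

# Equality holds when q = p:
p = random_dist(5, rng)
assert abs(entropy(p) - cross_entropy(p, p)) < 1e-12
```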