I've recently begun to study the basics of information theory using this PDF: https://web.stanford.edu/~montanar/RESEARCH/BOOK/partA.pdf, and have just read about Kullback-Leibler divergence. There, Gibbs' inequality is proven (although it isn't called by that name) using the convexity of the function $-\log_2 x$ and Jensen's inequality; some other proofs use the same idea with other convex functions, like $\ x\log_2 x$.
All is well so far except for one thing: the functions used in the proofs are not $-\log_2 x$ or $\ x\log_2 x$ themselves, but their compositions with other functions, such as $-\log_2 \left(\frac{q(x)}{p(x)}\right)$ or $\ \left(\frac{q(x)}{p(x)}\right)\log_2 \left(\frac{q(x)}{p(x)}\right)$, where $q(x)$ and $p(x)$ are probability distributions with $x \in \mathcal I$ such that $\ \forall x \in \mathcal I, \ q(x),p(x) \neq 0$, and $\sum \limits_{x \in \mathcal I} q(x) = 1$ and $\sum \limits_{x \in \mathcal I} p(x) = 1$.
The thing is: how do I know that these new composite functions are convex, so that Jensen's inequality applies? Depending on $q(x)$ and $p(x)$, $f(x)=q(x)/p(x)$ may or may not be convex, and may or may not be non-decreasing; and I know examples of compositions of convex functions that are not convex.
I understood another proof that didn't use Jensen's inequality at all; it's this kind of proof that I have trouble with.
Any help would be appreciated, regarding both the question and the style (I'm new here, so...).
Thank you all.
The resolution is that Jensen's inequality is applied to the convex function $-\log$ alone, evaluated at the random variable $Y = p(X)/q(X)$ with $X \sim q$; whether the map $x \mapsto q(x)/p(x)$ is convex or monotone is irrelevant. Concretely,
$$D_{KL}(q||p)=\mathbb{E}_q\left[\log \frac{q(X)}{p(X)}\right]=\mathbb{E}_q\left[-\log \frac{p(X)}{q(X)}\right] \ge -\log\mathbb{E}_q\left[\frac{p(X)}{q(X)}\right]=-\log\sum_{x \in \mathcal I} p(x)=-\log 1=0,$$
where the inequality is Jensen's for $-\log$, and thus Gibbs' inequality is proven.
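As a quick numerical sanity check, here is a short Python sketch (the alphabet size, number of trials, and random seed are arbitrary choices of mine, not from the text above) that samples strictly positive distributions $q$ and $p$ and verifies $D_{KL}(q||p) \ge 0$, with equality when $q = p$:

```python
import math
import random

def random_dist(n, rng):
    """Return a strictly positive probability vector of length n."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def kl_divergence(q, p):
    """D_KL(q || p) = sum_x q(x) * log2(q(x) / p(x)), in bits."""
    return sum(qx * math.log2(qx / px) for qx, px in zip(q, p))

rng = random.Random(0)
for _ in range(1000):
    q = random_dist(5, rng)
    p = random_dist(5, rng)
    # Gibbs' inequality: the divergence is never negative.
    assert kl_divergence(q, p) >= 0

# Equality holds when the two distributions coincide.
assert kl_divergence(q, q) == 0.0
```

Of course this checks only finitely many random cases; the proof above is what establishes the inequality in general.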