Entropy is defined as $E[-\log P(X)]$. We know it is bounded by $\log r$, where $r$ is the size of the alphabet.
Defining the second moment as $E[\log^2 P(X)]$, how can one show that it is bounded?
Because you asked for a solution using Jensen's inequality (is this homework?), here is my attempt:
$$E(\log^2p(X))=\sum_{i=1}^rp_i\log^2\frac{1}{p_i}$$
Let's use natural logarithms. Now, $\log^2(y)$ is concave in the range $y \ge e$. Let's divide the full support into two sets of "low and high probabilities", $R_L$ and $R_H$, such that $i\in R_L$ iff $p_i^{-1} \ge e$, i.e. $p_i \le e^{-1} \approx 0.368$. Let $|R_L|=r_L$, $|R_H|=r_H$, $r_H+r_L=r$, $P_L = \sum_{i\in R_L} p_i$, and similarly $P_H$.
Then we can apply Jensen's inequality to $R_L$:
$$ \sum_{i \in R_L} p_i \log^2\frac{1}{p_i} = P_L \sum_{i \in R_L} \frac{p_i}{P_L} \log^2\frac{1}{p_i} \le P_L \log^2 \left(\sum_{i \in R_L} \frac{p_i}{P_L}\frac{1}{p_i} \right) = P_L \log^2 \frac{r_L}{P_L}$$
To bound the other term, we note that for $e^{-1} \le x\le 1$: $$x \log^2 x \le \frac{1-x}{e-1},$$ with equality at both endpoints.
Then, since every $p_i$ with $i \in R_H$ exceeds $e^{-1}$, so that $P_H/r_H \ge e^{-1}$,
$$\sum_{i \in R_H} p_i \log^2 p_i\le \frac{r_H}{e-1}\left(1- \frac{P_H}{r_H}\right)\le \frac{r_H}{e-1} (1- e^{-1})=\frac{r_H}{e}$$
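(Not part of the argument, just a numerical sanity check: the elementary inequality above can be verified on a grid in Python.)

```python
import math

# Grid check (a sketch, not a proof) of the elementary inequality
# x * log(x)**2 <= (1 - x) / (e - 1) on [1/e, 1]; equality holds at x = 1/e.
e = math.e
for k in range(10**5 + 1):
    x = 1 / e + (1 - 1 / e) * k / 10**5
    assert x * math.log(x) ** 2 <= (1 - x) / (e - 1) + 1e-12
```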
Hence $$E(\log^2p(X)) \le \frac{r_H}{e} + P_L\log^2 \frac{r- r_H}{P_L} $$
This could be refined further, using that $r_H \in \{0,1,2\}$ (at most two of the $p_i$ can exceed $e^{-1}$, so $r_H/e < 1$) and that $P_L \le 1- r_H e^{-1}$, but it gets rather messy.
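The final bound can also be sanity-checked numerically on random distributions; here is a short Python sketch (the function names `second_moment` and `jensen_bound` are mine, not from the argument above):

```python
import math
import random

e = math.e

def second_moment(p):
    # E[log^2 p(X)] for a finite distribution p
    return sum(pi * math.log(pi) ** 2 for pi in p)

def jensen_bound(p):
    # r_H/e + P_L * log^2(r_L / P_L), the bound derived above
    RL = [pi for pi in p if pi <= 1 / e]  # "low probability" masses
    rH = len(p) - len(RL)                 # size of the "high probability" set
    bound = rH / e
    if RL:
        PL = sum(RL)
        bound += PL * math.log(len(RL) / PL) ** 2
    return bound

random.seed(0)
for _ in range(500):
    r = random.randint(2, 12)
    w = [1 - random.random() for _ in range(r)]  # weights in (0, 1]
    s = sum(w)
    p = [x / s for x in w]
    assert second_moment(p) <= jensen_bound(p) + 1e-9
```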
The function $u:p\mapsto p\log^2p$ is nonnegative and bounded by $u^*=4/\mathrm e^2$ on $[0,1]$, hence a first easy upper bound is that, for every random variable $X$ taking at most $r$ values, $$ E(\log^2p(X))=\sum_{i=1}^rp_i\log^2p_i\leqslant ru^*=4r/\mathrm e^2. $$

To refine this, one can follow the proof of the classical entropy bound, since one tries to maximize $$ \sum_{i=1}^ru(p_i)\ \text{under the constraints}\ p_i\geqslant0,\ \sum_{i=1}^rp_i=1. $$ By the method of Lagrange multipliers, one sees that the optimal $(p_i)$ are such that $u'(p_i)$ does not depend on $i$. Since $u':p\mapsto\log^2p+2\log p$ decreases from $u'(0^+)=+\infty$ through $u'(1/\mathrm e^2)=0$ to its minimum $u'(1/\mathrm e)=-1$, then increases to $u'(1)=0$, each value of $u'$ is attained at most twice, hence the $p_i$ can take at most two values.
The one-value case is clear since then $p_i=1/r$ for every $i$. In the two-value case, both values are in the interval where $u'\leqslant0$, that is, in $[1/\mathrm e^2,1]$, and at least one $p_i$ satisfies $p_i\geqslant1/\mathrm e$, hence the sum of the $p_i$ is at least $(r-1)/\mathrm e^2+1/\mathrm e$, which is $\gt1$ for every $r\geqslant6$, a contradiction.
All this shows that, for $r\geqslant6$, the optimal distribution is such that $p_i$ does not depend on $i$, that is, $p_i=1/r$ for every $i$. Thus, for $r\geqslant6$, the optimal uniform upper bound (reached by the uniform distribution) is $$ E(\log^2p(X))\leqslant\log^2r. $$

The cases $2\leqslant r\leqslant5$ can be solved directly. For example, when $r=2$, the maximal value is reached when $p_1$ and $p_2$ are $\frac12(1\pm\sqrt{1-4/\mathrm e^2})$, hence, numerically, when $r=2$, $$E(\log^2p(X))\leqslant0.56288.$$ Note that this best uniform upper bound is $\gt\log^22$.
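The $r=2$ value can be checked by a quick grid search in Python (a sanity check, not a proof):

```python
import math

# Grid search for the r = 2 maximum of
# E(log^2 p(X)) = p*log^2(p) + (1-p)*log^2(1-p) over p in (0, 1).
def f(p):
    return p * math.log(p) ** 2 + (1 - p) * math.log(1 - p) ** 2

best = max(f(k / 10**5) for k in range(1, 10**5))
p_star = 0.5 * (1 + math.sqrt(1 - 4 / math.e**2))  # stationary point from above
assert abs(best - f(p_star)) < 1e-6
assert abs(best - 0.56288) < 1e-4
assert best > math.log(2) ** 2
```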
When $3\leqslant r\leqslant5$, the maximal value is reached when $p_i=1/r$ for every $i$, hence the upper bound $\log^2r$ holds. Finally, the only exceptional case is $r=2$.
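Similarly, the claim for small alphabets can be probed by random search: no sampled distribution should beat the uniform one (a heuristic check in Python, not a proof):

```python
import math
import random

def second_moment(p):
    # E[log^2 p(X)] for a finite distribution p
    return sum(pi * math.log(pi) ** 2 for pi in p)

# For 3 <= r <= 6, randomly sampled distributions should stay below
# log^2(r), the value attained by the uniform distribution.
random.seed(1)
for r in range(3, 7):
    target = math.log(r) ** 2
    for _ in range(20000):
        w = [1 - random.random() for _ in range(r)]  # weights in (0, 1]
        s = sum(w)
        assert second_moment([x / s for x in w]) <= target + 1e-9
```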