Proving Jensen's inequality for the general case starting from the finite case


For a finite convex combination, Jensen’s inequality is given by: $$f\left(\sum_i^n a_i x_i\right) \leq \sum_i^n a_i f(x_i)$$ for a convex $f$. Proving this is not so bad starting from the definition of a convex function (iteratively break the convex combination into a convex combination between just two points and apply the definition of a convex function).
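As a quick numeric sanity check (not part of the proof), the finite inequality can be tested on random convex combinations; this sketch uses $f = \exp$, which is convex on all of $\mathbb R$:

```python
import math
import random

# Sanity check of the finite Jensen inequality
# f(sum_i a_i x_i) <= sum_i a_i f(x_i) for a convex f (here f = exp).
f = math.exp

random.seed(0)
for _ in range(1000):
    n = random.randint(2, 6)
    w = [random.random() for _ in range(n)]
    s = sum(w)
    a = [wi / s for wi in w]          # convex weights: a_i >= 0, sum_i a_i = 1
    x = [random.uniform(-3, 3) for _ in range(n)]
    lhs = f(sum(ai * xi for ai, xi in zip(a, x)))
    rhs = sum(ai * f(xi) for ai, xi in zip(a, x))
    assert lhs <= rhs + 1e-12, (lhs, rhs)
```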

For the case of general probability measures: $$f(\mathbb E X) \leq \mathbb E f(X)$$ where $X$ is a real valued random variable, the approach I know is pretty different (show that a tight affine lower bound of $f$ exists then apply linearity of expectation) but I feel that the method for the finite case could be made to work here.

Here is my attempt, drawing inspiration from the construction of the Lebesgue integral:

Given a real valued random variable $X$, let $X^+,X^-$ be such that $X = X^+ - X^-$ so that it suffices to show the claim for non-negative random variables (apply the definition of convexity to the convex combination $\mathbb P [X>0] X^+ + \left(1-\mathbb P[X>0]\right) X^-$). Let $X_j$ be a sequence of simple random variables (defined as a finite sum of indicator functions on the sample space) dominated by $X$ such that $$\mathbb E X_j \rightarrow \sup \left\{\mathbb E Y: Y \leq X,\, Y\text{ is simple}\right\}$$

Now we have: $$f\left(\mathbb E X\right) = f\left(\mathbb E X_j + o(1)\right) = f\left(\mathbb E X_j\right) + o(1) \leq \mathbb E f\left( X_j\right) + o(1)$$

where the first equality is by the definition of the sequence $X_j$, the second holds because $f$, being convex, is continuous on the interior of its domain, and the inequality is the finite case of Jensen's inequality.

Next, I would like to claim that $X_j \stackrel{p}{\rightarrow} X$ and apply the continuous mapping theorem to complete the argument, but I'm not sure how to justify that claim.

I could use feedback on my reasoning as well as some help justifying the last claim.

EDIT: there is a simpler argument. Instead of defining $X_j$ as above, I could define it as a sequence of simple random variables increasing to $X$ pointwise (at every point of the sample space). I had not realized that such a sequence always exists, and I am not sure how to show it. In that case we have:

$$\mathbb E f\left( X_j\right) = \mathbb E f\left( X\right) + o(1)$$

by the continuous mapping theorem.

Any feedback on this followup approach would be very appreciated.


BEST ANSWER

I think your argument goes through fine if you choose the $X_j$ in a nice way. Don't just take any $X_j$: take $X_j$ nonnegative, simple, and increasing to $X$ pointwise, so that $X_j \to X$ a.s. (and hence in probability as well, but the need for the continuous mapping theorem goes away, since monotone convergence applies). The existence of such an approximating sequence is usually part of the process of defining the Lebesgue integral. Partition $[0, k]$ into multiples of $2^{-k}$ and take $X_k$ to be the largest multiple of $2^{-k}$ that is at most $\min(X, k)$. Then $0 \leq X_k \leq \min(X, k)$ and $X_k$ increases to $X$ pointwise.
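The dyadic construction just described can be sketched numerically; at a fixed sample point $\omega$ with $X(\omega) = \pi$, the values $X_k(\omega) = \min\left(\lfloor 2^k X(\omega) \rfloor / 2^k,\ k\right)$ are simple-function values that increase to $X(\omega)$:

```python
import math

# The standard dyadic approximation: for X >= 0, define
# X_k = min(floor(2^k * X) / 2^k, k).  Each X_k takes finitely many
# values, satisfies 0 <= X_k <= min(X, k), and increases to X pointwise.
def approx(x, k):
    return min(math.floor(x * 2**k) / 2**k, k)

x = math.pi  # one sample point omega with X(omega) = pi
vals = [approx(x, k) for k in range(1, 12)]
assert all(v1 <= v2 for v1, v2 in zip(vals, vals[1:]))  # monotone in k
assert all(v <= x for v in vals)                        # dominated by X
assert abs(vals[-1] - x) < 2**-11                       # approaches X
```

Monotonicity in $k$ follows from $\lfloor 2y \rfloor \geq 2\lfloor y \rfloor$, and once $k > X(\omega)$ the approximation error is below $2^{-k}$.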

I would like to note, however, that proving the general case of Jensen directly is somewhat simpler than your extension of the finite case. For any line $L \leq f$, one has $L(E[X]) = E[L(X)] \leq E[f(X)]$; now take the sup over such lines (say, with rational slope and intercept, so the sup is over a countable family) to get $f(E[X]) \leq E[f(X)]$.
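For a concrete illustration of this supporting-line argument (a numeric sketch, not a proof), take the tangent line $L$ to $f = \exp$ at $m = E[X]$: then $L \leq f$ everywhere, $E[L(X)] = L(m) = f(m)$ by linearity, and so $f(E[X]) \leq E[f(X)]$ on a simulated sample:

```python
import math
import random

# Supporting-line argument for f = exp: the tangent line L at m = E[X]
# satisfies L <= f, and E[L(X)] = L(m) = f(m), giving f(E X) <= E f(X).
random.seed(1)
sample = [random.gauss(0, 1) for _ in range(100000)]
m = sum(sample) / len(sample)                  # empirical E[X]
slope = math.exp(m)                            # f'(m); supporting slope

def L(x):
    return math.exp(m) + slope * (x - m)       # tangent line at m

E_LX = sum(L(x) for x in sample) / len(sample)
E_fX = sum(math.exp(x) for x in sample) / len(sample)
assert abs(E_LX - math.exp(m)) < 1e-6          # E[L(X)] = L(E X) = f(E X)
assert math.exp(m) <= E_fX                     # Jensen's inequality
```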

ANSWER

In the usual approximation of $X\ge 0$ by simple $X_j\ge 0$, obtained by successively finer partitions of the range of $X$ into dyadic intervals, one has $X_j$ increasing pointwise to $X$. You then have $\lim_j \Bbb EX_j = \Bbb E X$, and $\lim_j f(X_j)=f(X)$ by the continuity of $f$. But to claim that $\lim_j\Bbb Ef(X_j) =\Bbb Ef(X)$ would seem to require something like dominated convergence (in the absence of any monotonicity assumption about $f$).

The convexity of $f$ does imply that if the right-hand derivative $f'_+(0)$ is larger than $-\infty$ then there are constants $a,b$ such that $g(x):=f(x)+a+bx\ge 0$ for all $x$, and such that $g$ is increasing. You can now use the monotone convergence theorem (applied to $g$) to conclude that $\Bbb Ef(X)=\lim_j\Bbb Ef(X_j)$.

But what about, for example, $f(x) = -\sqrt{x}$, $x\ge 0$?