On page 30 of Steven Shreve's book Stochastic Calculus for Finance, Shreve proves Jensen's inequality. However, I don't fully understand all of the steps in the proof.
(P.S. Alongside this book I am reading Feller; this is my first serious exposure to probability.)
If $\phi(x)$ is a convex function of the dummy variable $x$ and $X$ is a random variable with $\mathbb{E}|X|<\infty$, then
$$\mathbb{E}(\phi(X))\ge \phi(\mathbb{E}(X))$$
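As a sanity check of the statement itself (not part of Shreve's proof), here is a small Python sketch that computes both sides for a discrete random variable. The distribution, the two convex functions and the helper names `expectation` and `jensen_gap` are hypothetical choices made for illustration; exact `Fraction` arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction

def expectation(values, probs):
    """E[X] for a discrete random variable taking values[i] with probability probs[i]."""
    return sum(v * p for v, p in zip(values, probs))

def jensen_gap(phi, values, probs):
    """E[phi(X)] - phi(E[X]); Jensen's inequality says this is >= 0 when phi is convex."""
    lhs = expectation([phi(v) for v in values], probs)
    rhs = phi(expectation(values, probs))
    return lhs - rhs

# Hypothetical distribution: X = -1, 0, 3 with probabilities 1/4, 1/4, 1/2.
values = [Fraction(-1), Fraction(0), Fraction(3)]
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]

print(jensen_gap(lambda x: x * x, values, probs))  # 51/16
print(jensen_gap(abs, values, probs))              # 1/2
```

For $\phi(x)=x^2$ the gap is $\mathbb{E}(X^2)-(\mathbb{E}X)^2=\operatorname{Var}(X)$, which gives one direct way to see that it cannot be negative in that special case.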
Proof.
We first argue that a convex function is the maximum of all linear functions that lie below it. That is, for all $x \in \mathbb{R}$,
$$\phi(x) =\max\{l(x)| l \text{ is linear and }l(y)\le \phi(y),\forall y\in \mathbb{R}\} \tag{1}$$
(We first prove that $\phi(x)$ is an upper bound for $l(x)$ for every linear function $l$ lying below $\phi$. That part is clear to me.)
Since we are considering only linear functions $l$ that lie below $\phi$, it is clear that
$$\phi(x) \ge \max\{l(x)| l \text{ is linear and }l(y)\le \phi(y),\forall y\in \mathbb{R}\}$$
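Written out, this direction uses only the defining property of the set: every admissible $l$ satisfies $l(y)\le\phi(y)$ for all $y$, in particular at $y=x$, so $\phi(x)$ bounds every element of the set from above and therefore bounds the maximum:

$$l \text{ linear and } l(y)\le\phi(y)\ \forall y\in\mathbb{R} \;\Longrightarrow\; l(x)\le\phi(x) \;\Longrightarrow\; \max\{l(x)\mid l \text{ is linear and } l(y)\le\phi(y),\forall y\in\mathbb{R}\}\le\phi(x).$$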
On the other hand, let $x$ be an arbitrary point in $\mathbb{R}$. Because $\phi$ is convex, there is always a linear function $l$ that lies below $\phi$ and satisfies $l(x)=\phi(x)$ at this particular $x$. This is called the support line of $\phi$ at $x$.
Therefore,
$$\phi(x) \le \max\{l(x)| l \text{ is linear and }l(y)\le \phi(y),\forall y\in \mathbb{R}\}$$
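Written out with $l_x$ denoting the support line of $\phi$ at $x$, the chain behind this inequality would be the following, where the inequality holds because $l_x$ is itself one of the functions over which the maximum is taken:

$$\phi(x) = l_x(x) \le \max\{l(x)\mid l \text{ is linear and } l(y)\le\phi(y),\forall y\in\mathbb{R}\}.$$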
This establishes equality (1).
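To see the representation numerically, here is a Python sketch under stated assumptions: it takes the hypothetical convex example $\phi(x)=x^2$, whose support line at a point $a$ is the tangent $l_a(y)=2ay-a^2$ (here "linear" means of the form $ay+b$), and takes the maximum over a finite, hypothetical family of such lines. The helper names `support_line` and `max_of_lines` are illustrative only.

```python
from fractions import Fraction

def phi(x):
    """Convex example: phi(x) = x^2."""
    return x * x

def support_line(a):
    """Tangent line of phi(x) = x^2 at the point a: l_a(y) = 2*a*y - a^2.
    phi(y) - l_a(y) = (y - a)^2 >= 0, so l_a lies below phi and touches it at y = a."""
    return lambda y: 2 * a * y - a * a

# Hypothetical finite family: support lines touching at a = -3, -2.5, ..., 2.5, 3.
touch_points = [Fraction(k, 2) for k in range(-6, 7)]
lines = [support_line(a) for a in touch_points]

def max_of_lines(x):
    """Pointwise maximum of the finite family of support lines at x."""
    return max(l(x) for l in lines)

# Every line in the family lies below phi on a finer test grid.
test_grid = [Fraction(k, 8) for k in range(-40, 41)]
assert all(l(y) <= phi(y) for l in lines for y in test_grid)

for x in [Fraction(-3), Fraction(1, 2), Fraction(1, 4)]:
    print(x, phi(x), max_of_lines(x))
```

The maximum equals $\phi(x)$ at the touching points ($x=-3$ and $x=1/2$ print equal values) but falls strictly below it at $x=1/4$, where the printed maximum is $0$ while $\phi(1/4)=1/16$. Equality at every $x$ therefore needs the support line at $x$ itself to belong to the family, which the set in (1) guarantees by including all linear functions below $\phi$.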
How does the $\le$ sign come about? Why would $\phi(x)$ be less than or equal to the maximum over all linear functions $l$ satisfying $l(y)\le\phi(y)$ for all $y \in \mathbb{R}$?