I am struggling to understand some parts of the proof of the upper bound in Cramér's theorem under the assumption (2.14).

The proof starts like this:
Let our random variables $\{X_k\}$ be defined on a probability space $(\Omega, \mathscr F, P)$. On any open set where $M(\theta)$ is finite it is differentiable, with $\nabla M(\theta) = E[Xe^{\theta\cdot X}]$; this follows from dominated convergence. Thus $\theta \cdot x - \log M(\theta)$ is a concave differentiable function of $\theta$ that, by $(2.14)$, achieves its maximum $I(x)$ at some $\theta_x$. Then $\nabla M(\theta_x) = xM(\theta_x)$.
Define the probability measure $\nu_x$ on $\mathbb R^d$ by
$$\nu_x(B) = \frac{1}{M(\theta_x)} E\!\left[e^{\theta_x\cdot X}\mathbf 1_{\{X \in B\}}\right], \quad B \in \mathscr B_{\mathbb R^d}.$$
The mean of $\nu_x$ is
$$\int_{\mathbb R^d} y\, \nu_x(dy) = \frac{E[Xe^{\theta_x\cdot X}]}{M(\theta_x)} = \frac{\nabla M(\theta_x)}{M(\theta_x)} = x. \tag{*}$$
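As a sanity check on the tilting identity above (not part of the quoted proof), here is a minimal numerical sketch for a standard Gaussian $X \sim N(0,1)$, where everything is explicit: $M(\theta) = e^{\theta^2/2}$ and $\nabla M(\theta) = \theta M(\theta)$, so $\theta_x = x$ and $\nu_x = N(x,1)$ has mean $x$. The choice of $x = 0.7$ and the sample size are arbitrary.

```python
import numpy as np

# Sanity check of the tilted measure for X ~ N(0, 1):
# M(theta) = exp(theta^2 / 2), grad M(theta) = theta * M(theta),
# so theta_x = x and nu_x = N(x, 1), whose mean is x.
rng = np.random.default_rng(0)
x = 0.7            # target mean (arbitrary choice)
theta_x = x        # solves grad M(theta) = x * M(theta) in the Gaussian case
X = rng.standard_normal(1_000_000)

M = np.exp(theta_x**2 / 2)            # closed-form M(theta_x)
M_mc = np.mean(np.exp(theta_x * X))   # Monte Carlo estimate of M(theta_x)
# Mean of nu_x, i.e. the right-hand side of (*): E[X e^{theta_x X}] / M(theta_x)
tilted_mean = np.mean(X * np.exp(theta_x * X)) / M

print(M, M_mc, tilted_mean)  # tilted_mean should be close to x
```

The same check works for any distribution with a finite moment generating function near $\theta_x$, provided one first solves $\nabla M(\theta_x) = x M(\theta_x)$ numerically.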
(...)
My questions are:
1) How does it follow from (2.14) that the maximum $I(x)$ is achieved?
2) What do they mean by the "mean" of $\nu_x$? The mean of a random variable I understand, but what on earth is the mean of a measure?
3) How is (*) computed?
