Strange version of Cramér's Theorem: Why is it necessary that the supremum is obtained at some interior point of the neighborhood?


Recently, I have been working with Cramér's theorem; see here, Theorem 1 on page 1.

In this version of the theorem, it is needed that

  • (i) the moment-generating function $M$ is finite on a neighborhood $B_0$ of $0$, and additionally that
  • (ii) the supremum in the definition of the rate function $I(x)$ is attained at some interior point of the neighborhood $B_0$.

In books and other sources on the internet, the theorem is usually formulated with (i) only, i.e. it is only required that there is a neighborhood $B_0$ on which the moment-generating function is finite; see for example here, Theorem 4.1.5 and (4.2) on page 40.

So I am really (!) wondering why (ii) is assumed in the first linked version, since it makes things more difficult:

If I have found a neighborhood that satisfies (i), then according to this version I additionally have to show that the supremum in the definition of the rate function $I$ is attained at an interior point of that neighborhood, whereas according to the second linked version I can apply the theorem without checking this.

Can you see whether (ii) is in fact needed, or whether it perhaps follows automatically once a neighborhood fulfilling (i) has been found?

On BEST ANSWER

Most of this answer is in the comments.

The most general LDP for the iid variable case says this:

$$-\inf_{x \in Int(A)} I(x) \leq \liminf \frac{1}{n} \log P(S_n \in A) \leq \limsup \frac{1}{n} \log P(S_n \in A) \leq -\inf_{x \in Cl(A)} I(x)$$

whenever $A$ is Borel. Here $Int$ denotes interior, $Cl$ denotes closure, and $I$ is the rate function. When applied to tails of such averages, you consider $A=[y,\infty)$.

Now assume the $X_i$ have mean $\mu$ and $y \geq \mu$. Then $I$ is monotone increasing on $[y,\infty)$. Therefore the infimum over $[y,\infty)$ is $I(y)$ and the infimum over $(y,\infty)$ is $\lim_{x \to y^+} I(x)$. In the case where $I$ is right continuous at $y$, these are the same. In this case, you get the nice version of the LDP*:

$$\lim \frac{1}{n} \log P(S_n \geq y) = -I(y).$$
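This limit can be checked numerically at a point where $I$ is continuous. As a quick sketch (my own example, not taken from the linked notes): for fair Bernoulli variables on $\{0,1\}$ the Legendre transform evaluates in closed form to $I(x) = x\log x + (1-x)\log(1-x) + \log 2$ on $(0,1)$, and the tail probability of the sample mean $S_n$ is an exact binomial sum:

```python
import math

# Closed-form rate function of a fair coin on (0, 1):
# relative entropy with respect to the Bernoulli(1/2) distribution.
def rate(x):
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

n, y = 2000, 0.75
# Exact tail probability P(S_n >= y) = P(Bin(n, 1/2) >= n*y), via big integers.
tail = sum(math.comb(n, k) for k in range(math.ceil(n * y), n + 1))
empirical = (math.log(tail) - n * math.log(2)) / n  # (1/n) log P(S_n >= y)

print(empirical, -rate(y))  # already agree to about 1e-2 at n = 2000
```

The gap between the two numbers shrinks like $\frac{\log n}{n}$, consistent with the subexponential prefactor that the LDP ignores.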

But when $I$ is not right-continuous at $y$, we are forced to fall back on the form of the LDP that I stated initially.

One way that this can occur is if the $X_i$ are unbiased Bernoulli variables, in which case $I(x)=\sup_{\theta \in \mathbb{R}} \theta x - \log(1+e^\theta) + \log(2)$. The expression inside the supremum has a critical point where $x-\frac{e^\theta}{1+e^\theta}=0$. But this ratio of exponentials can only lie in $(0,1)$. Consequently, if $x \geq 1$, then the supremum is not attained at any finite value. For $x>1$ the supremum is infinite, essentially because $\theta x - \log(1+e^\theta)$ grows linearly at infinity (since $\frac{e^\theta}{1+e^\theta} \to 1$). But exactly at $x=1$, the supremum is actually finite and equals $\log(2)$. Thus $I$ is not right-continuous at $1$.
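This jump can be made visible numerically by brute-forcing the supremum over a grid of $\theta$ values (a rough sketch; the cutoff $|\theta| \le 50$ stands in for the supremum over all of $\mathbb{R}$):

```python
import math

# Approximate I(x) = sup_theta [ theta*x - log(1 + e^theta) + log(2) ]
# for fair Bernoulli variables, by maximizing over a finite theta grid.
def rate(x, cutoff=50.0, step=0.1):
    ks = range(-int(cutoff / step), int(cutoff / step) + 1)
    return max(k * step * x - math.log1p(math.exp(k * step)) + math.log(2)
               for k in ks)

print(rate(0.5))  # ~0: deviations to the mean cost nothing
print(rate(1.0))  # ~log(2) = 0.693...: finite, approached only as theta -> infinity
print(rate(1.1))  # large, and growing with the cutoff: I(x) = infinity for x > 1
```

Raising the cutoff makes `rate(1.1)` blow up while `rate(1.0)` stays pinned at $\log 2$, which is exactly the failure of right-continuity at $1$.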

In this example, since $I(1)$ is finite, the upper bound is still useful; we get

$$\limsup \frac{1}{n} \log P(S_n \geq 1) \leq -\log(2)$$

which is actually exact, as we know, but the LDP cannot detect this. The lower bound becomes trivial:

$$\liminf \frac{1}{n} \log P(S_n > 1) \geq -\infty$$

which is completely useless (but is, in fact, exact again, as we know). The LDP suffers from these types of "boundary effects" in many situations. For example, this is one of the subtle difficulties in properly describing how the solution to an SDE with small noise stays "close" to the trajectory whose Freidlin-Wentzell action is minimal, even though the action of the trajectories which are actually chosen is infinite with probability 1.

The hypotheses in your link provide a way to avoid these technical issues that arise in the general case, at the cost of some generality. You can prove that it avoids these issues by using the fact that $I$ is always continuous on $Int(D)$ where $D=\{ x : I(x) < \infty \}$. (Cf. den Hollander p. 8)

* This might not work when $P(S_n=y)$ remains bounded away from zero, such as in the boring case where $X_i=y$ a.s. But apart from such degenerate cases, the nice version is the generic situation.