Cramer's theorem states:
Let $(Y_i)_{i\geq 1}$ be a sequence of i.i.d. random variables, let ${S_n=\frac{1}{n}\sum_{i=1}^n Y_i}$ be their sample mean, and let $M_{Y_1}(u):=\mathrm{E}[e^{uY_1}]<\infty$ be the moment generating function of $Y_1$. Then, for all $t>\mathrm{E}[Y_1]$,
\begin{equation} \lim_{n\rightarrow \infty}\frac{1}{n}\ln P(S_n\geq t)=-I(t) \end{equation}
where the rate function $I$ is defined by
\begin{equation*} I(t):=\sup_{u}\left(tu-\ln M_{Y_1}(u)\right) \end{equation*}
I am not quite sure what this means. I would guess it means that $P(S_n\geq t)$ converges at rate $I(t)$ as $n\rightarrow\infty$? Am I on the right lines?
It means that for large sample sizes, $P(S_n \geq t) \approx e^{-n I(t)}$, where the $\approx$ ignores sub-exponential factors in $n$ (e.g. the prefactor may be on the order of a polynomial in $n$).
That is, the probability of the sample mean exceeding threshold $t$ decays essentially exponentially at a rate $I(t)$.
If you want to have some numerical feel for this, you can do something like the following experiment:
Assume you have $n$ samples from a $N(0,1)$ distribution. We want to see what $P(S_n \geq t)$ looks like.
The exact probability of the sample mean exceeding $t$ is $P(S_n \geq t) = Q(t \sqrt{n})$, where $Q(a) = \int_a^\infty \frac{1}{\sqrt{2 \pi}} e^{-x^2/2}\,\mathrm{d}x$.
Let $Q_{LB}(a) =\frac{a}{1+a^2} \left( \frac{1}{\sqrt{2 \pi}} e^{-a^2/2} \right)$ and $Q_{UB}(a)= \frac{1}{a \sqrt{2 \pi}} e^{-a^2/2}$. For $a>0$, $Q(a)$ satisfies the bounds $ Q_{LB}(a) \leq Q(a) \leq Q _{UB}(a)$. By the form of $Q_{UB},Q_{LB}$, you see $Q(a) = (1+o(1)) Q_{LB}(a)$ and $Q(a) = (1+o(1)) Q_{UB}(a)$ as $a \to \infty$. So, these bounds are pretty tight.
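As a quick numerical sanity check of these bounds, you can evaluate $Q$ exactly via the complementary error function, $Q(a) = \tfrac{1}{2}\operatorname{erfc}(a/\sqrt{2})$ (a Python sketch using only the standard library; the function names are mine):

```python
import math

def Q(a):
    # exact Gaussian tail probability via the complementary error function
    return 0.5 * math.erfc(a / math.sqrt(2))

def Q_lb(a):
    # lower bound: a/(1+a^2) * phi(a)
    return a / (1 + a * a) * math.exp(-a * a / 2) / math.sqrt(2 * math.pi)

def Q_ub(a):
    # upper bound: phi(a)/a
    return math.exp(-a * a / 2) / (a * math.sqrt(2 * math.pi))

for a in [1.0, 2.0, 4.0, 8.0]:
    assert Q_lb(a) <= Q(a) <= Q_ub(a)
    print(a, Q_lb(a), Q(a), Q_ub(a))
```

The ratio $Q_{LB}(a)/Q_{UB}(a) = a^2/(1+a^2)$ tends to $1$, so you can watch the bounds pinch together as $a$ grows.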
You can calculate $I(t) = \frac{t^2}{2}$ from the definition.
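To spell that out: for $Y_1 \sim N(0,1)$ the moment generating function is $M_{Y_1}(u) = e^{u^2/2}$, so
\begin{equation*} I(t) = \sup_{u}\left(tu - \frac{u^2}{2}\right) = \frac{t^2}{2}, \end{equation*}
with the supremum attained at $u = t$.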
Now, make a plot of $n$ versus $-n I(t)$, $\log Q_{LB}(t \sqrt{n})$, and $\log Q_{UB}(t \sqrt{n})$ [for the last two, take the logarithms in the definitions out by hand before evaluating; if you compute $Q_{LB}$ or $Q_{UB}$ first and then take the log, you'll run out of numerical precision]. These are all estimates of a plot of $n$ versus $\log P(S_n \geq t)$ (unfortunately, you'll probably run out of precision if you try to calculate $Q(t \sqrt{n})$ exactly, so we use the upper and lower bounds instead).
You'll see that $-n I(t)$ (the estimate given by Cramer's theorem) will be quite close to the upper and lower bounds in the sense of having the same slope. There might be a slight offset between the bounds and what Cramer's theorem predicts, but that's just due to the sub-exponential factor ignored by Cramer's theorem.
Here's some code as an example:
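A Python sketch of the experiment (standard library only; I fix $t = 1$, and the variable and function names are my own):

```python
import math

t = 1.0  # threshold; any t > E[Y_1] = 0 works

def log_Q_lb(a):
    # log Q_LB(a), with the logarithm expanded by hand to avoid underflow
    return math.log(a / (1 + a * a)) - a * a / 2 - 0.5 * math.log(2 * math.pi)

def log_Q_ub(a):
    # log Q_UB(a), likewise expanded by hand
    return -math.log(a) - a * a / 2 - 0.5 * math.log(2 * math.pi)

for n in range(1, 101):  # switch 1, 101 to get different ranges of n
    a = t * math.sqrt(n)
    cramer = -n * t * t / 2  # -n I(t), the estimate from Cramer's theorem
    if n % 10 == 0:
        print(n, cramer, log_Q_lb(a), log_Q_ub(a))
```

Plot the three columns against $n$ with any plotting tool: the slopes agree, with a roughly constant offset coming from the sub-exponential factor.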
You can switch the 1,100 in the plot to get different ranges of $n$.
You see that the log error probability is extremely close to what Cramer's theorem estimates (for sufficiently large $n$).
So, you see that the exact log-error probabilities have the same slope in $n$ as the rate function. There's a bit of offset, but that's due to sub-exponential factors:
Cramer's theorem predicts $\log P(S_n \geq t) \approx -n I(t)$.
The exact expression is $\log P(S_n \geq t) = -n I(t) - \frac{1}{2} \log (n)\,(1+o(1)) - \log t$; for $n$ sufficiently large, the first term is all that matters for how $\log P(S_n \geq t)$ scales with $n$. The terms other than $-n I(t)$ produce the offset between the plot of $-n I(t)$ versus $n$ and the exact log probability versus $n$.
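You can check this expansion directly for moderate $n$, where double precision still represents $Q(t\sqrt{n})$ (a sketch using `math.erfc`; the extra constant $-\tfrac{1}{2}\log 2\pi$ below is one of the terms absorbed in the $(1+o(1))$ above):

```python
import math

t = 1.0
for n in [25, 100, 400]:
    a = t * math.sqrt(n)
    exact = math.log(0.5 * math.erfc(a / math.sqrt(2)))  # log Q(t sqrt(n))
    # -n I(t) - (1/2) log n - log t, plus the constant from the phi(a)/a asymptotic
    approx = -n * t * t / 2 - 0.5 * math.log(n) - math.log(t) \
             - 0.5 * math.log(2 * math.pi)
    print(n, exact, approx, exact - approx)
```

The discrepancy shrinks like $1/(t^2 n)$, consistent with the $(1+o(1))$ bookkeeping.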
Except in the Gaussian and a few other cases (e.g. $\alpha$-stable distributions), you won't have a nice closed-form expression for the probability of the sample mean exceeding a threshold, so the estimate from Cramer's theorem gives you the right idea of how it behaves (as $e^{-n I(t)}$). I chose a Gaussian in the example above since it's easy to evaluate; if you try to do it via simulations, you will need importance sampling plus a lot of simulations.
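For completeness, a sketch of the importance-sampling idea in the Gaussian case (exponential tilting: sample from $N(t,1)$ instead of $N(0,1)$ and reweight by the likelihood ratio; the sample sizes and names here are my choices):

```python
import math
import random

random.seed(0)
n, t, trials = 50, 1.0, 20000

est = 0.0
for _ in range(trials):
    ys = [random.gauss(t, 1.0) for _ in range(n)]  # tilted proposal N(t, 1)
    s = sum(ys)
    if s / n >= t:
        # likelihood ratio dN(0,1)^n / dN(t,1)^n evaluated at the sample
        est += math.exp(-t * s + n * t * t / 2)
est /= trials

exact = 0.5 * math.erfc(t * math.sqrt(n / 2.0))  # Q(t sqrt(n)) for comparison
print(est, exact)
```

Naive Monte Carlo would need on the order of $e^{n I(t)}$ samples to see even one exceedance here, while the tilted estimator gets a usable answer from a few thousand.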