Cramer's theorem states:
Let $(Y_i)_{i\geq 1}$ be a sequence of i.i.d. random variables, let ${S_n=\frac{1}{n}\sum_{i=1}^n Y_i}$ be their sample mean, and let $M_{Y_1}(u):=\mathrm{E}[e^{uY_1}]<\infty$ be the moment generating function of $Y_1$. Then, for all $t>\mathrm{E}[Y_1]$,
\begin{equation} \lim_{n\rightarrow \infty}\frac{1}{n}\ln P(S_n\geq t)=-I(t) \end{equation}
where the rate function $I$ is defined by
\begin{equation*} I(t):=\sup_{u}\left(tu-\ln M_{Y_1}(u)\right) \end{equation*}
I am not quite sure what this means. I would guess it means that $P(S_n\geq t)$ converges at rate $I(t)$ as $n\rightarrow\infty$? Am I on the right lines?
It means that for large sample sizes, $P(S_n \geq t) \approx e^{-n I(t)}$, where the $\approx$ ignores sub-exponential factors in $n$ (e.g. the prefactor may be on the order of a polynomial in $n$).
That is, the probability of the sample mean exceeding threshold $t$ decays essentially exponentially at a rate $I(t)$.
If you want to have some numerical feel for this, you can do something like the following experiment:
Assume you have $n$ samples from a $N(0,1)$ distribution. We want to see what $P(S_n \geq t)$ looks like.
The exact probability of the sample mean exceeding $t$ is $P(S_n \geq t) = Q(t \sqrt{n})$, where $Q(a) = \int_a^\infty \frac{1}{\sqrt{2 \pi}} e^{-x^2/2}\,\mathrm{d}x$.
Let $Q_{LB}(a) =\frac{a}{1+a^2} \left( \frac{1}{\sqrt{2 \pi}} e^{-a^2/2} \right)$ and $Q_{UB}(a)= \frac{1}{a \sqrt{2 \pi}} e^{-a^2/2}$. For $a>0$, $Q(a)$ satisfies the bounds $ Q_{LB}(a) \leq Q(a) \leq Q _{UB}(a)$. By the form of $Q_{UB},Q_{LB}$, you see $Q(a) = (1+o(1)) Q_{LB}(a)$ and $Q(a) = (1+o(1)) Q_{UB}(a)$ as $a \to \infty$. So, these bounds are pretty tight.
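As a quick numerical sanity check of these bounds, you can evaluate $Q$ exactly via the complementary error function, $Q(a) = \tfrac{1}{2}\operatorname{erfc}(a/\sqrt{2})$ (a Python sketch using only the standard library; the function names are mine):

```python
import math

def Q(a):
    # exact Gaussian tail probability via the complementary error function
    return 0.5 * math.erfc(a / math.sqrt(2))

def Q_lb(a):
    # lower bound: a/(1+a^2) * phi(a)
    return a / (1 + a * a) * math.exp(-a * a / 2) / math.sqrt(2 * math.pi)

def Q_ub(a):
    # upper bound: phi(a)/a
    return math.exp(-a * a / 2) / (a * math.sqrt(2 * math.pi))

for a in [1.0, 2.0, 4.0, 8.0]:
    assert Q_lb(a) <= Q(a) <= Q_ub(a)
    print(a, Q_lb(a), Q(a), Q_ub(a))
```

The ratio $Q_{LB}(a)/Q_{UB}(a) = a^2/(1+a^2)$ tends to $1$, so you can watch the bounds pinch together as $a$ grows.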
You can calculate $I(t) = \frac{t^2}{2}$ from the definition.
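To spell that out: for $Y_1 \sim N(0,1)$ the moment generating function is $M_{Y_1}(u) = e^{u^2/2}$, so
\begin{equation*} I(t) = \sup_{u}\left(tu - \frac{u^2}{2}\right) = \frac{t^2}{2}, \end{equation*}
with the supremum attained at $u = t$.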
Now, make a plot of $n$ versus $-n I(t)$, $\log Q_{LB}(t \sqrt{n})$, and $\log Q_{UB}(t \sqrt{n})$ [for the last two, take the logarithms in the definitions out by hand before evaluating; if you compute $Q_{LB}$ or $Q_{UB}$ first and then take the log, you'll run out of numerical precision]. These are all estimates of a plot of $n$ versus $\log P(S_n \geq t)$ (unfortunately, you'll probably run out of precision if you try to calculate $Q(t \sqrt{n})$ exactly, so we use the upper and lower bounds instead).
You'll see that $-n I(t)$ (the estimate given by Cramer's theorem) will be quite close to the upper and lower bounds in the sense of having the same slope. There might be a slight offset between the bounds and what Cramer's theorem predicts, but that's just due to the sub-exponential factor ignored by Cramer's theorem.
Here's some code as an example:
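A Python sketch of the experiment (standard library only; I fix $t = 1$, and the variable and function names are my own):

```python
import math

t = 1.0  # threshold; any t > E[Y_1] = 0 works

def log_Q_lb(a):
    # log Q_LB(a), with the logarithm expanded by hand to avoid underflow
    return math.log(a / (1 + a * a)) - a * a / 2 - 0.5 * math.log(2 * math.pi)

def log_Q_ub(a):
    # log Q_UB(a), likewise expanded by hand
    return -math.log(a) - a * a / 2 - 0.5 * math.log(2 * math.pi)

for n in range(1, 101):  # switch 1, 101 to get different ranges of n
    a = t * math.sqrt(n)
    cramer = -n * t * t / 2  # -n I(t), the estimate from Cramer's theorem
    if n % 10 == 0:
        print(n, cramer, log_Q_lb(a), log_Q_ub(a))
```

Plot the three columns against $n$ with any plotting tool: the slopes agree, with a roughly constant offset coming from the sub-exponential factor.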
You can switch the 1,100 in the plot to get different ranges of $n$.
You see that the log error probability is extremely close to what Cramer's theorem estimates (for sufficiently large $n$).
So, you see that the exact log-error probabilities have the same slope in $n$ as the rate function. There's a bit of offset, but that's due to sub-exponential factors:
Cramer's theorem predicts $\log P(S_n \geq t) \approx -n I(t)$.
The exact expression is $\log P(S_n \geq t) = -n I(t) - \frac{1}{2} \log (n)\,(1+o(1)) - \log t$; for $n$ sufficiently large, the first term is all that matters for how $\log P(S_n \geq t)$ scales with $n$. The terms other than $-n I(t)$ produce the offset between the plot of $-n I(t)$ versus $n$ and the exact log probability versus $n$.
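You can check this expansion directly for moderate $n$, where double precision still represents $Q(t\sqrt{n})$ (a sketch using `math.erfc`; the extra constant $-\tfrac{1}{2}\log 2\pi$ below is one of the terms absorbed in the $(1+o(1))$ above):

```python
import math

t = 1.0
for n in [25, 100, 400]:
    a = t * math.sqrt(n)
    exact = math.log(0.5 * math.erfc(a / math.sqrt(2)))  # log Q(t sqrt(n))
    # -n I(t) - (1/2) log n - log t, plus the constant from the phi(a)/a asymptotic
    approx = -n * t * t / 2 - 0.5 * math.log(n) - math.log(t) \
             - 0.5 * math.log(2 * math.pi)
    print(n, exact, approx, exact - approx)
```

The discrepancy shrinks like $1/(t^2 n)$, consistent with the $(1+o(1))$ bookkeeping.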
Except in the Gaussian and a few other cases (e.g. $\alpha$-stable distributions), you won't have a nice closed-form expression for the probability of the sample mean exceeding a threshold, so the estimate from Cramer's theorem gives you the right idea of how it behaves (as $e^{-n I(t)}$). I chose a Gaussian in the example above since it's easy to evaluate; if you try to do it via simulations, you will need importance sampling plus a lot of simulations.
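For completeness, a sketch of the importance-sampling idea in the Gaussian case (exponential tilting: sample from $N(t,1)$ instead of $N(0,1)$ and reweight by the likelihood ratio; the sample sizes and names here are my choices):

```python
import math
import random

random.seed(0)
n, t, trials = 50, 1.0, 20000

est = 0.0
for _ in range(trials):
    ys = [random.gauss(t, 1.0) for _ in range(n)]  # tilted proposal N(t, 1)
    s = sum(ys)
    if s / n >= t:
        # likelihood ratio dN(0,1)^n / dN(t,1)^n evaluated at the sample
        est += math.exp(-t * s + n * t * t / 2)
est /= trials

exact = 0.5 * math.erfc(t * math.sqrt(n / 2.0))  # Q(t sqrt(n)) for comparison
print(est, exact)
```

Naive Monte Carlo would need on the order of $e^{n I(t)}$ samples to see even one exceedance here, while the tilted estimator gets a usable answer from a few thousand.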