I am trying to understand why we need large deviation theory (LDT) and the large deviation principle (LDP).
Here is what I understand so far based on Wikipedia. Let $S_n$ be a random variable which depends on $n$. We are interested in how the probability $P(S_n > x)$ changes as $n \to \infty$.
Very often, LDT is concerned with the case where $P(S_n > x)$ converges to $0$. Then the answer to the above question is already known: it is $0$, and one is instead interested in how fast it converges. So I understand the notion of the rate function $I(x)$.
Here is what I don't understand:
- In order to know $P(S_n > x)$, it is a matter of finding the cumulative distribution function of $S_n$, since $$ P(S_n > x) = 1 - P(S_n \le x) = 1 - F_{S_n}(x). $$ Once we know the pdf of $S_n$, say $f_{S_n}$, we can easily obtain $$ P(S_n > x) = \int_x^{\infty} f_{S_n}(t)\,dt. $$ If this is the case, I am not sure why we need the LDP.
- Okay, it is not always the case that we know the pdf of $S_n$. If that is the case, I think the problem is then to estimate $P(S_n > x)$. However, it seems that many LDP approaches require computing certain generating functions. For example, $$ \lambda(k) = \lim_{n\to \infty} \frac{1}{n} \ln E[e^{nkS_n}], \qquad I(s) = \sup_{k \in \mathbb{R}} \{ks - \lambda(k)\}. $$ Calculating the above seems definitely a lot harder than a direct calculation of $P(S_n > x)$, so I don't understand why people introduce quantities that are much harder to obtain.
- "Why the rate?" Okay, regardless of #1 and #2, let's say we have $I(x)$. What now? Given $I(x)$, can we estimate $P(S_n > x)$, or can we do something else useful? Or is the LDP devoted solely to deriving the rate function $I(x)$?
Any comments or answers will be greatly appreciated.
Large deviation theory deals with the decay of probabilities of rare events on an exponential scale. If $(S_n)_{n \in \mathbb{N}}$ is a random walk, then "rare event" means that $\lim_{n \to \infty} \mathbb{P}(S_n \in A) =0$. Large deviation theory aims to determine the asymptotics of
$$\mathbb{P}(S_n \in A) \quad \text{as $n \to \infty$}$$
How fast does $\mathbb{P}(S_n \in A)$ tend to zero? Roughly speaking, a large deviation estimate tells us that
$$\mathbb{P}(S_n \in A) \approx \exp \left( -n J(A) \right) \qquad \text{for large $n$}$$
for a certain rate $J(A) \geq 0$. This means that the probability $\mathbb{P}(S_n \in A)$ decays exponentially with rate $J(A)$ as $n \to \infty$. If, for instance, you want to find $n \in \mathbb{N}$ such that $\mathbb{P}(S_n \in A) \leq \epsilon$ for some given $\epsilon>0$, this is really useful because it tells you how large you have to choose $n$.
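To make the last point concrete: treating the estimate $\mathbb{P}(S_n \in A) \approx e^{-nJ(A)}$ as an equality, the smallest suitable $n$ is $\lceil \ln(1/\epsilon)/J(A) \rceil$. A minimal sketch (the rate value $0.02$ and the threshold $\epsilon = 10^{-6}$ are made-up illustration numbers):

```python
import math

def n_for_threshold(J, eps):
    """Smallest n with exp(-n*J) <= eps, i.e. n >= ln(1/eps) / J."""
    return math.ceil(math.log(1.0 / eps) / J)

# Illustration only: rate J(A) = 0.02, target probability eps = 1e-6.
n = n_for_threshold(0.02, 1e-6)
print(n, math.exp(-n * 0.02))  # exp(-n*0.02) is at most 1e-6
```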
You are right that it is, in general, not easy to determine the rate function. The case of independent identically distributed random variables illustrates quite nicely that large deviation theory is nevertheless justified:
Let $(X_j)_{j \in \mathbb{N}}$ be a sequence of independent and identically distributed random variables and $S_n := \sum_{j=1}^n X_j$ the associated random walk. If we want to determine the asymptotics of
$$\mathbb{P}(S_n > x)$$
using the distribution function, then this means that we have to calculate the distribution function of $S_n$ for each $n$, and this will certainly require a huge amount of computations. If we use large deviation theory instead, then we have to compute
$$I(x) = \sup_y \{y \cdot x - \lambda(y)\}$$
for the cumulant generating function $\lambda(y) = \ln \mathbb{E}\exp(y X_1)$, i.e. the logarithm of the moment generating function of $X_1$. Note that these quantities do not depend on $n$: we compute them once and then we are done. Moreover, the large deviation principle will also allow us to estimate probabilities of the form
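In simple cases the Legendre transform above can even be checked numerically. The following sketch (the grid and the Gaussian example are my own choices for illustration) takes $\lambda(y) = y^2/2$, the cumulant generating function of a standard Gaussian, and recovers the known closed form $I(x) = x^2/2$ by brute-force maximization over a grid of $y$ values:

```python
def legendre(lam, x, ys):
    """Numerically evaluate I(x) = sup_y { x*y - lam(y) } over a grid of y values."""
    return max(x * y - lam(y) for y in ys)

# Cumulant generating function of a standard Gaussian: lam(y) = y^2 / 2.
# Its Legendre transform is known in closed form: I(x) = x^2 / 2.
lam = lambda y: 0.5 * y * y
ys = [k / 1000.0 for k in range(-5000, 5001)]  # grid on [-5, 5]
for x in (0.5, 1.0, 2.0):
    print(x, legendre(lam, x, ys))  # should be close to x**2 / 2
```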
$$\mathbb{P}(S_n \in A) $$
using the rate function $I$; we are not restricted to events $A$ of the particular form $(x,\infty)$. Probabilities like that are very hard to compute using the density function of $S_n$ (which is itself, in general, very hard or impossible to compute).
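To illustrate with the simplest example (my own illustration, not part of the answer above): for fair coin flips $X_j \in \{0,1\}$, Cramér's rate function is $I(x) = x\ln(2x) + (1-x)\ln(2(1-x))$ for $x \in (1/2, 1)$, and one can check numerically that $-\frac{1}{n}\ln \mathbb{P}(S_n \geq nx)$ approaches $I(x)$ as $n$ grows:

```python
import math

def binom_tail(n, k):
    """Exact P(S_n >= k) for S_n ~ Binomial(n, 1/2), via binomial coefficients."""
    return sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n

def rate(x):
    """Cramer rate function for fair coin flips: relative entropy of x w.r.t. 1/2."""
    return x * math.log(2 * x) + (1 - x) * math.log(2 * (1 - x))

x = 0.6
for n in (50, 200, 800):
    # -(1/n) ln P(S_n >= nx) should approach I(0.6) ~ 0.0201 as n grows
    print(n, -math.log(binom_tail(n, math.ceil(n * x))) / n, rate(x))
```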