I am self-learning probability theory from William Feller's Introduction to probability theory and it's applications. I am having some difficulty in understanding an elementary proof regarding the the law of large numbers. I reproduce the proof in the book interspersed with questions. Any help or inputs would be extremely insightful.
On page 141, the author proves, that there is a nice upper bound on the probabilities of the right and left tails of a binomial distribution:
If $r \ge np$, the probability of atleast $r$ successes in $n$ Bernoulli trials has an upper bound given by the inequality,
$$P(S_n \ge r) \le b(r;n,p) \cdot \frac{1}{1-\frac{(n-r)p}{(r+1)q}} \tag{1}$$
If $s \le np$, the probability of atmost $s$ successes in $n$ Bernoulli trials has an upper bound given by the inequality,
$$P(S_n \le s) \le b(s;n,p) \cdot \frac{1}{1-\frac{sq}{(n+1-s)p}} \tag{2}$$
I am able to derive the above inequalities and the results are clear to me. I also know that $b(k;n,p)$ is monotonically increasing for $k < (n+1)p$, then monotonically decreases for $k > (n+1)p$.
The Law of Large Numbers.
Our intuitive notion of probability is based on the following assumption. If $S_n$ is the number of successes in $n$ Bernoulli trials, then $S_n/n$ is the average number of successes and should be near $p$. We can now prove this mathematically.
Consider, for example, the probability that $S_n/n$ exceeds $p + \epsilon$. This probability is the same as $P\{S_n > n(p + \epsilon)\}$ and equals the left side of $(1)$, when $r$ is the smallest integer exceeding $n(p+\epsilon)$. Then, inequality $(1)$ implies:
\begin{align*} P(S_n > n(p+\epsilon)) < b(r;n,p) \cdot \frac{n(p+\epsilon) + q}{n\epsilon + q} \tag{3} \end{align*}
How to derive the expression on the right?
For example, I tried using the fact that $r > n(p + \epsilon)$ as follows:
\begin{align*} \frac{(n-r)p}{(r+1)q} < \frac{(n - n(p + \epsilon))p}{(n(p+\epsilon)+1)q} \end{align*}
But, it seems to get me nowhere!
With increasing $n$, the fraction on the right remains bounded, whereas $b(r;n,p) \to 0$ since $b(r;n,p) < b(k;n,p)$ for each $k$ such that $(n+1)p \le k < r$, and there are about $n\epsilon$ such terms $b(k;n,p)$.
Okay, I lost him, when he mentions there are $n\epsilon$ such terms $b(k;n,p)$. Why should there be $n\epsilon$ such terms?
It follows that as $n$ increases, $P\{S_n > n(p + \epsilon)\} \to 0$. Using inequality $(2)$, we see in the same way that $P\{S_n < n(p-\epsilon)\} \to 0$ and we have thus
\begin{align*} P\left\{ \left\lvert\frac{S_n}{n} -p \right\rvert < \epsilon \right\} \to 1 \tag{4} \end{align*}
In words: As $n$ increases, the probability that the average number of successes deviates from $p$ by more than any pre-assigned $\epsilon$ tends to zero.