I am reading *A Course on Large Deviations with an Introduction to Gibbs Measures* by Firas Rassoul-Agha and Timo Seppäläinen. In Section 1.1 they apply the noiseless coding theorem to find a lower bound on the average length of a uniquely decipherable code. It's been a long time since I took probability, so I might be a bit rusty; I am having trouble with a couple of statements at the end. Below are the equations in context. My questions are:
> "With probability $s$ for symbol 1, the typical word of length $n$ has about $ns$ ones." $\quad(1)$
> Suppose we use code words of length $L$ to code these typical words. Then $$2^L \ge \binom{n}{\lfloor ns \rfloor}.\tag{2}$$

Where do (1) and (2) come from?



The formulation is rather sloppy. I guess that "typical" here is not related to "typical sets" in the information-theoretic sense. Informally:
Consider a random sequence of $n$ iid $\{0,1\}$ symbols, with $P(X=1)=s$. Then, with high probability, the number of ones is near $n s$.
This comes from the law of large numbers.
More precisely, letting $X$ be the number of ones, we have that for any fixed $\epsilon >0$, as $n \to \infty$
$$P( | X/n - s| > \epsilon) \to 0$$
A slightly stronger statement: $X$ has mean $n s$ and variance $n s (1-s)$. Hence, by Chebyshev's inequality, $$P( |X - ns| > \alpha \sqrt{n}) \le \frac{s(1-s)}{\alpha^2}$$
so "nearly always" the number of ones falls in the range $ n s \pm \alpha \sqrt{n}$ for some $\alpha$.
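A quick numerical sanity check of this concentration claim (my own sketch, not from the book; the values of `n`, `s`, `alpha`, and the trial count are arbitrary choices):

```python
import math
import random

# Draw iid Bernoulli(s) words of length n and record how often the number
# of ones falls within alpha*sqrt(n) of the mean n*s.
random.seed(0)
n, s, alpha = 2000, 0.3, 3.0
trials = 500

hits = 0
for _ in range(trials):
    ones = sum(random.random() < s for _ in range(n))
    if abs(ones - n * s) <= alpha * math.sqrt(n):
        hits += 1

empirical = hits / trials                    # observed hit rate
chebyshev_bound = s * (1 - s) / alpha ** 2   # bound on the *miss* probability
print(f"empirical P(|X - ns| <= alpha*sqrt(n)) = {empirical:.3f}")
print(f"Chebyshev guarantees at least {1 - chebyshev_bound:.3f}")
```

The empirical hit rate comfortably exceeds the Chebyshev guarantee, which is quite loose here.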
For the second one:
Assuming $k=ns$ is an integer, there are $\binom{n}{k}$ sequences with exactly $k$ ones. With $L$ bits you can form $2^L$ distinct labels, hence you can code at most $2^L$ values; to give each typical word its own code word you therefore need $2^L \ge \binom{n}{k}$.
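To make the counting bound concrete, here is a small check with illustrative values $n = 100$, $s = 0.3$ (my own choice of parameters): it computes $\binom{n}{\lfloor ns\rfloor}$ and the smallest $L$ with $2^L \ge \binom{n}{\lfloor ns\rfloor}$, and compares it with $nH(s)$ for the binary entropy $H$.

```python
import math

# Counting check for (2): there are C(n, floor(n*s)) words with exactly
# floor(n*s) ones, so a fixed code length L must satisfy 2**L >= that count.
n, s = 100, 0.3
k = math.floor(n * s)
count = math.comb(n, k)                # number of typical words
L_min = math.ceil(math.log2(count))    # smallest L with 2**L >= count
H = -(s * math.log2(s) + (1 - s) * math.log2(1 - s))  # binary entropy

print(f"C({n}, {k}) = {count}")
print(f"minimal L = {L_min} bits; n*H(s) = {n * H:.1f} bits")
```

The minimal $L$ comes out close to $n H(s)$, which (via Stirling's formula applied to the binomial coefficient) is exactly the entropy lower bound the book is driving at.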