I am working through the proof of Sanov's Theorem for the empirical measure (as found in den Hollander's Large Deviations text), and am hoping someone could provide a bit of clarity on some of the steps taken...
The theorem is as follows:
From what I gather, it applies a technique similar to the one den Hollander uses to derive the rate function in the Bernoulli case: we find a suitable bound on the probability of the large deviation, and then apply an asymptotic estimate (e.g. Stirling's Approximation) to obtain the result we need.
But I do have a couple of questions regarding the start of this proof (pictured below):

- I understand that we define this set of $r$-tuples whose entries sum to $n$. What exactly is meant by the line $\frac{1}{n}K_n \subset \mathscr{M}_1(\Gamma)$? I don't quite see how this set of $r$-tuples can be identified with probability measures on the finite set.
- How do we deduce that the empirical measure itself has a multinomial distribution, $L_n \sim \text{Multi}(n, r, \rho)$? It looks to me like the set $K_n$ is of the right form to be the support of a multinomial distribution, but I feel like I am missing something obvious needed to draw that conclusion.
Any clarity that can be provided would be greatly appreciated. I understand the steps taken to obtain the result after this point, but I don't want to move on to further results (e.g. pair empirical measures) without properly understanding how this proof works...

Regarding question 1, what the author wants to say is that when you divide each $r$-tuple in the set $K_n$ by $n$, the resulting vector can be identified with a probability measure on $\Gamma=\{1,\dots,r\}$ in a natural way. For example, take $n=5$, $r=3$, and $k=(2,1,2)\in K_n$. Then $\frac{1}{5} k = (2/5,1/5,2/5)$, so we may think of the first coordinate as the probability of $1$ occurring, the second coordinate as the probability of $2$ occurring, and the third coordinate as the probability of $3$ occurring. Thus $\frac{1}{n} k \in \mathscr{M}_1(\Gamma)$ for all $k\in K_n$, or in other words $\frac{1}{n}K_n\subset \mathscr{M}_1(\Gamma)$.
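As a quick sanity check of this identification, here is a small sketch (not from the text; $n=5$, $r=3$ are just the toy values from the example above) that enumerates $K_n$ and verifies that every normalised tuple $\frac{1}{n}k$ is indeed a probability vector on $\Gamma$:

```python
from itertools import product

# Enumerate K_n = { k in Z_{>=0}^r : k_1 + ... + k_r = n } for the toy
# values n = 5, r = 3 used in the example above.
n, r = 5, 3
K_n = [k for k in product(range(n + 1), repeat=r) if sum(k) == n]

# Each normalised tuple k/n should be a probability measure on Gamma:
# non-negative entries summing to 1.
for k in K_n:
    p = [k_s / n for k_s in k]
    assert all(p_s >= 0 for p_s in p)
    assert abs(sum(p) - 1.0) < 1e-12

print(len(K_n))  # |K_n| = C(n + r - 1, r - 1) = C(7, 2) = 21
```

(The count $\binom{n+r-1}{r-1}$ is just the stars-and-bars formula for the number of such tuples.)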
Regarding your second question, notice that $L_n$ always takes values in $\frac{1}{n}K_n$, since $L_n$ just records the frequencies of occurrences in the i.i.d. sequence $X_1,X_2,\dots, X_n$. Now for the random measure $L_n$ to be exactly equal to $\frac{1}{n}k$, for some fixed $k\in K_n$, we need $L_n(s)=\frac{1}{n}k_s$ for every $s\in\Gamma$. In other words: $$\frac{\sharp \{l\leq n:X_l=s\}}{n}=\frac{1}{n}k_s,$$ which is of course equivalent to $\sharp \{l\leq n:X_l=s\}=k_s$. So we need $1$ to occur $k_1$ times in $X_1,\dots,X_n$, $2$ to occur $k_2$ times, and so on. One realization of our random sequence which clearly satisfies this is the following: $$ \underbrace{1,\dots,1}_{k_1 \text{ times}}, \underbrace{2,\dots,2}_{k_2 \text{ times}},\dots, \underbrace{r,\dots,r}_{k_r \text{ times}} $$ Since the sequence is i.i.d., the probability of this particular realization is $$\prod_{s=1}^r \rho_s^{k_s}.$$
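The probability of that single ordered realization is easy to check numerically. This is a sketch with an assumed distribution $\rho=(0.5,0.3,0.2)$ and the tuple $k=(2,1,2)$ from the earlier example (neither value comes from the text):

```python
# Assumed toy values (not from the text): rho = (0.5, 0.3, 0.2) and the
# tuple k = (2, 1, 2), i.e. the ordered realization 1, 1, 2, 3, 3.
rho = (0.5, 0.3, 0.2)
k = (2, 1, 2)

# By independence, the probability of this one ordered sequence is the
# product over symbols s of rho_s raised to the k_s-th power.
prob_one_ordering = 1.0
for rho_s, k_s in zip(rho, k):
    prob_one_ordering *= rho_s ** k_s

print(prob_one_ordering)  # 0.5^2 * 0.3^1 * 0.2^2 ≈ 0.003
```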
But this is just one suitable realization. It is not hard to see that in fact every permutation of the sequence above yields a valid realization (and this gives us all the valid realizations), and all of them have the same probability, so the final answer is obtained by counting all such permutations, which yields: $${n\choose{k_1,k_2,\dots,k_r}} \prod_{s=1}^r \rho_s^{k_s}.$$ Rewriting the multinomial coefficient a bit gives you the desired form.
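To see the multinomial formula in action, the following sketch (with the same assumed toy values $n=5$, $r=3$, $\rho=(0.5,0.3,0.2)$, none of which come from the text) computes $\mathbb{P}(L_n=\frac{1}{n}k)$ for every $k\in K_n$ and checks that the probabilities sum to $1$, as they must if $L_n \sim \text{Multi}(n,r,\rho)$:

```python
from itertools import product
from math import factorial, prod

# Same assumed toy values as before (not from the text).
n, r, rho = 5, 3, (0.5, 0.3, 0.2)
K_n = [k for k in product(range(n + 1), repeat=r) if sum(k) == n]

def pmf(k):
    # P(L_n = k/n) = n! / (k_1! ... k_r!) * prod_s rho_s^{k_s}
    multinom = factorial(n) // prod(factorial(k_s) for k_s in k)
    return multinom * prod(rho_s ** k_s for rho_s, k_s in zip(rho, k))

# Summing over all of K_n must give total probability 1, which is
# exactly the multinomial theorem applied to (rho_1 + ... + rho_r)^n.
total = sum(pmf(k) for k in K_n)
print(round(total, 10))  # ≈ 1.0
```

Here the normalisation is nothing more than the multinomial theorem, which is also why $K_n$ is precisely the support of the distribution.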