I want to prove the following theorem:
$$\lim_{N\to\infty}\frac{\displaystyle\sum_{k=1}^N f(a_k)}{N}=\sum_{n=1}^\infty f(n)P(a_k=n)$$
where $a_k$ is a sequence of non-negative integers, $P(a_k=n)$ is the probability that $a_k=n$ over all $a_k$'s, and $f$ is an arbitrary function taking values in the non-negative real numbers.
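To see what the statement claims numerically, here is a small sketch that compares both sides for one concrete case. It assumes, purely for illustration, that the $a_k$ are i.i.d. geometric on $\{1,2,\dots\}$ with $P(a_k=n)=(1-p)^{n-1}p$ and that $f(n)=\sqrt{n}$. The names `p`, `geometric_pmf`, `sample_geometric`, `empirical_average` and `series_value` are hypothetical helpers, not part of the question.

```python
import math
import random

def empirical_average(f, samples):
    """Left-hand side for finite N: (1/N) * sum_{k<=N} f(a_k)."""
    return sum(f(a) for a in samples) / len(samples)

def series_value(f, pmf, n_max):
    """Right-hand side truncated at n_max: sum_{n=1}^{n_max} f(n) * P(a = n)."""
    return sum(f(n) * pmf(n) for n in range(1, n_max + 1))

# Assumption (illustration only): a_k i.i.d. geometric on {1, 2, ...}.
p = 0.3

def geometric_pmf(n):
    return (1 - p) ** (n - 1) * p

def sample_geometric(rng):
    # Count Bernoulli(p) trials up to and including the first success.
    n = 1
    while rng.random() >= p:
        n += 1
    return n

def f(n):
    return math.sqrt(n)  # a non-negative test function

rng = random.Random(0)
samples = [sample_geometric(rng) for _ in range(200_000)]

lhs = empirical_average(f, samples)
rhs = series_value(f, geometric_pmf, n_max=200)  # tail beyond 200 is negligible
print(f"LHS ~ {lhs:.4f}, RHS ~ {rhs:.4f}")
```

For this choice the two printed values should agree to a couple of decimal places; the question is which hypotheses make that agreement a theorem.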
I want to know what sufficient (maybe even necessary) conditions are needed for this theorem to hold. Here is what I have been able to prove so far:
$$\lim_{N\to\infty}\frac{\displaystyle\sum_{k=1}^N f(a_k)}{N}=\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^\infty f(n)||a_k=n||_{k \le N}$$ where $||a_k=n||_{k \le N}$ is the number of $a_k$'s with $k \le N$ for which $a_k=n$. This is true since all that was done here is to group together equal terms. Let's now assume the sum and the limit can be switched (this is the step I am missing a proof for); then we have $$\sum_{n=1}^\infty f(n)\lim_{N\to\infty}\frac{||a_k=n||_{k \le N}}{N}=\sum_{n=1}^\infty f(n)P(a_k=n)$$
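For each fixed $N$ the regrouping is exact, because it only reorders finitely many terms (all but finitely many of the counts $||a_k=n||_{k\le N}$ are zero). A minimal check on an arbitrary finite prefix, with hypothetical helper names `direct_sum` and `grouped_sum`:

```python
from collections import Counter

def direct_sum(f, a):
    """sum_{k=1}^N f(a_k), term by term."""
    return sum(f(x) for x in a)

def grouped_sum(f, a):
    """sum_n f(n) * ||a_k = n||_{k<=N}: equal terms grouped by value."""
    counts = Counter(a)  # counts[n] = number of k <= N with a_k = n
    return sum(f(n) * c for n, c in counts.items())

def f(n):
    return n * n

a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]  # an arbitrary prefix a_1, ..., a_N
print(direct_sum(f, a), grouped_sum(f, a))  # prints: 232 232
```

So the only delicate step is the limit $N\to\infty$, not the grouping.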
I tried proving the missing step using Tannery's theorem but couldn't do it. So my question basically becomes: what conditions do $a_k$ and $f(n)$ have to meet so that the limit and the sum can be switched? Any help is appreciated, and thank you in advance!
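Some condition really is needed. If $P(a_k=n)$ is read as the limiting frequency $\lim_N ||a_k=n||_{k\le N}/N$, take $a_k=k$: every value occurs exactly once, so every limiting frequency is $0$ and the right-hand side is $0$, while for $f\equiv 1$ the left-hand side is $1$ for every $N$. The mass "escapes to infinity". A short sketch of this (the helper name `frequency` is hypothetical):

```python
def frequency(a, n):
    """||a_k = n||_{k<=N} / N for the finite prefix a = (a_1, ..., a_N)."""
    return a.count(n) / len(a)

def f(n):
    return 1.0  # constant, non-negative test function

N = 10_000
a = list(range(1, N + 1))  # a_k = k: every value appears exactly once

lhs = sum(f(x) for x in a) / N  # equals 1.0 for every N
freq_5 = frequency(a, 5)        # equals 1/N, which tends to 0 as N grows
print(lhs, freq_5)              # prints: 1.0 0.0001
```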
This looks like an instance of the more general (Birkhoff) ergodic theorem, which says that if $(X,\mu)$ is a probability space, $T: X \to X$ is "ergodic", surjective, measurable and measure-preserving, and $f : X \to \mathbb{R}$ is any integrable function, then
$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^N f(T^n(x)) = \int_X f\,d\mu$$
for almost all $x \in X$.
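As a concrete sanity check of the theorem (not of the question's setting), one can take $X=[0,1)$ with Lebesgue measure and the irrational rotation $T(x)=x+\alpha \bmod 1$, which is ergodic and measure-preserving. The choices $\alpha=\sqrt2-1$, $f(x)=x^2$ (so $\int_X f\,d\mu = 1/3$) and the helper names below are assumptions made for illustration.

```python
import math

def birkhoff_average(f, t_power, x0, n_terms):
    """(1/N) * sum_{n=1}^N f(T^n(x0)), where t_power(x0, n) returns T^n(x0)."""
    return sum(f(t_power(x0, n)) for n in range(1, n_terms + 1)) / n_terms

ALPHA = math.sqrt(2) - 1  # irrational rotation angle (illustrative choice)

def rotation_power(x, n):
    # T^n(x) = x + n*alpha mod 1, computed in closed form to avoid
    # accumulating floating-point error from repeated application of T.
    return (x + n * ALPHA) % 1.0

def f(x):
    return x * x  # integral over [0, 1) is 1/3

avg = birkhoff_average(f, rotation_power, x0=0.1, n_terms=100_000)
print(avg, 1 / 3)
```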
In this specific case, let $P$ be a probability measure on $\mathbb{R}$; we may endow $\Omega = \mathbb{R}^\infty$ with a probability measure $\mathbb{P}$ (the product measure) by assuming that $\omega = (\omega_k)_{k=1}^\infty$ is a sequence of independent reals, each drawn randomly according to $P$ on $\mathbb{R}$. The "shift map" $\sigma : \Omega \to \Omega$ given by $\sigma(\omega)_k = \omega_{k+1}$ satisfies the hypotheses of the ergodic theorem.
Now suppose $f : \Omega \to \mathbb{R}$ is an integrable function for which $f(\omega)$ depends only on the first coordinate of $\omega$; i.e., we may write $f(\omega) = \hat{f}(\omega_1)$ for some measurable $\hat{f} : \mathbb{R} \to \mathbb{R}$. Since the first coordinate of $\sigma^{n-1}(\omega)$ is $\omega_n$, we have $f(\sigma^{n-1}(\omega)) = \hat{f}(\omega_n)$ for every $n \ge 1$. Shifting the index of an ergodic average by one does not change its limit, so
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N \hat{f}(\omega_n) = \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N {f}(\sigma^{n-1}(\omega)) = \int_\Omega f\,d\mathbb{P}$$
for almost all $\omega \in \Omega$. Finally, if $P$ is discrete in the sense that $P(\omega_1 \in \mathbb{N}) = 1$, then we have
$$\int_\Omega f\,d\mathbb{P} = \sum_{n=1}^\infty \hat{f}(n) P(\omega_1 = n).$$
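The indexing with the shift map is easy to get off by one: with $\sigma(\omega)_k=\omega_{k+1}$, the first coordinate of $\sigma^{n-1}(\omega)$ is $\omega_n$. The sketch below checks this on a finite prefix standing in for the infinite sequence (an assumption, since $\Omega$ consists of infinite sequences); `shift`, `f_on_omega` and `f_hat` are hypothetical names.

```python
import random

def shift(omega):
    """sigma(omega)_k = omega_{k+1}: drop the first coordinate of the prefix."""
    return omega[1:]

def f_on_omega(omega, f_hat):
    """A function on Omega that depends only on the first coordinate omega_1."""
    return f_hat(omega[0])

def f_hat(x):
    return x ** 2

rng = random.Random(1)
omega = tuple(rng.choice([1, 2, 3]) for _ in range(50))  # finite prefix of omega

# After n - 1 shifts, the first coordinate is omega_n (omega[n - 1] in 0-based indexing).
orbit_terms, w = [], omega
for n in range(1, 21):
    orbit_terms.append(f_on_omega(w, f_hat))  # f(sigma^{n-1}(omega))
    w = shift(w)
direct_terms = [f_hat(omega[n - 1]) for n in range(1, 21)]  # f_hat(omega_n)
print(orbit_terms == direct_terms)  # prints: True
```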
The conclusion is therefore that
$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^N \hat{f}(\omega_n) = \sum_{n=1}^\infty \hat{f}(n) P(\omega_1 = n)$$
for almost every $\omega = (\omega_n)_n \in \mathbb{R}^\infty$.
Edit: to say more about completing the argument you outlined, you would first need to know (as observed in the comments) that the limit on the left-hand side exists in the first place; regrouping equal terms is harmless for each fixed $N$, since it only reorders finitely many terms. The ergodic theorem guarantees that the limit exists for almost all $(a_k)_k \in \mathbb{N}^\infty$ (with respect to the product measure described above).
Secondly, you can distribute the $1/N$ into your sum and argue that
$$\lim_{N\to\infty}\frac{|\{k \leq N : a_k = n\}|}{N} = P(a_1 = n)$$
for each $n$, which follows from the strong law of large numbers. Indeed, if $X_k^{(n)}$ is the random variable taking the value $1$ if $a_k = n$ and $0$ otherwise, then $X_1^{(n)} + \cdots + X_N^{(n)}$ counts the number of times the value $n$ is observed in the first $N$ positions of the sequence $(a_k)_k$, and by the strong law of large numbers
$$\lim_{N\to\infty} \frac{X_1^{(n)} + \cdots + X_N^{(n)}}{N} = P(a_1 = n)$$
holds with probability $1$ (for each fixed $n$, and hence, since there are only countably many $n$, simultaneously for all $n$). Finally, you would want to argue that you can exchange the limit $N\to\infty$ with the infinite sum over $n$. This might be accomplished by considering a sequence of probability measures "approaching" $P$, say for instance $P^\ell(a_1 = n) = P(a_1 = n)$ if $n < \ell$, $P^\ell(a_1 = \ell) = P(a_1 \geq \ell)$, and $P^\ell(a_1 > \ell) = 0$, which lumps all of the tail mass of $P$ at $\ell$.
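One way the truncation step could be carried out, as a sketch under added assumptions: $f$ is bounded, say $0 \le f(n) \le M$; every $a_k$ lies in $\{1,2,\dots\}$; $p_N(n) := ||a_k=n||_{k\le N}/N \to p(n) := P(a_1=n)$ for every $n$; and $\sum_n p(n) = 1$. For any cutoff $\ell$,
$$\Big|\sum_{n=1}^\infty f(n)\,p_N(n)-\sum_{n=1}^\infty f(n)\,p(n)\Big| \le \sum_{n<\ell} f(n)\,|p_N(n)-p(n)| + M\sum_{n\ge\ell}p_N(n) + M\sum_{n\ge\ell}p(n).$$
The first sum on the right is finite, so it tends to $0$ as $N\to\infty$. Because $\sum_{n\ge1}p_N(n)=1$ for every $N$, the middle tail satisfies $\sum_{n\ge\ell}p_N(n) = 1-\sum_{n<\ell}p_N(n) \to 1-\sum_{n<\ell}p(n) = \sum_{n\ge\ell}p(n)$. Hence
$$\limsup_{N\to\infty}\Big|\sum_{n=1}^\infty f(n)\,p_N(n)-\sum_{n=1}^\infty f(n)\,p(n)\Big| \le 2M\sum_{n\ge\ell}p(n),$$
and the right-hand side tends to $0$ as $\ell\to\infty$. The hypothesis $\sum_n p(n)=1$ is what prevents probability mass from escaping to infinity; for unbounded $f$ this sketch does not apply as written.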
The strong law of large numbers itself is connected to the ergodic theorem through the shift-map argument outlined above, so these are all different ways of saying the same thing.