First, context: Let $X$ be a compact metric space, and $C(X)$ the set of continuous functions on $X$ equipped with the sup norm. Let $T:X\to X$ be continuous. Fix $x \in X$. Then one can show that, for each $n$, $g\mapsto S_g^n(x) =\displaystyle \frac1n\sum_{k=0}^{n-1}g(T^k(x))$ is a bounded linear functional on $C(X)$, and that there is a subsequence $n_j$ along which $S_g^{n_j}(x)$ converges for all $g\in C(X)$. Denote the limit by $S_g^\infty(x)$. This gives rise to the linear functional $L_x(g) = S_g^\infty(x)$, which is positive, so by the Riesz Representation Theorem, we have $L_x(g) = \int_X g\,d\mu$ for some Borel probability measure $\mu$.
Now we are basically at my question. The author of the book I'm using (Dynamical Systems by Brin & Stuck) next shows that $S_g^\infty (x) = S_g^\infty (Tx)$ to conclude that $\mu$ is $T$-invariant. However, I don't know why this is.
If we could work with characteristic functions, we would have the following:
$\mu(A) = \int_X {\mathcal{X}}_A d\mu = L_x(\mathcal{X}_A)= L_x(\mathcal{X}_A\circ T) = \int_X \mathcal{X}_A \circ T d\mu = \mu(T^{-1}(A)).$
However, we've been working with continuous functions. I don't think simple functions can be approximated by continuous functions in the sup norm. So I'm quite confused about how we get that $\mu$ is $T$-invariant. Moreover, I'm not sure I get why $T$ being continuous is necessary.
For more context, here is the statement of the theorem.
Let $X$ be a compact metric space and $T: X \to X$ a continuous map. Then there is a $T$-invariant Borel probability measure $\mu$ on $X$.
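It follows from looking at the averages that $$S^\infty_g(Tx)=\int_X(g\circ T)\,d\mu,$$ because $S_g^n(Tx)=S_{g\circ T}^n(x)$ and $g\circ T$ is again continuous (this is where continuity of $T$ is used). Combined with $S_g^\infty(x)=S_g^\infty(Tx)$, this gives $$\int_X g\,d\mu=\int_X(g\circ T)\,d\mu$$ for all continuous $g$. Since continuous functions are dense in $L^1(X,\mu)$ (or apply Lusin's theorem and then Tietze's theorem), you can replace $g$ by a characteristic function.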
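To spell out the "looking at the averages" step, here is a short derivation. It writes $n_j$ for the subsequence along which the averages converge and $\|g\|_\infty$ for the sup norm of $g$.

$$S_g^{n}(Tx)=\frac1n\sum_{k=0}^{n-1}g(T^{k+1}x)=\frac1n\sum_{k=0}^{n-1}(g\circ T)(T^{k}x)=S_{g\circ T}^{n}(x),$$

$$\left|S_g^{n}(Tx)-S_g^{n}(x)\right|=\frac{\left|g(T^{n}x)-g(x)\right|}{n}\le\frac{2\|g\|_\infty}{n}\longrightarrow 0.$$

Letting $n=n_j\to\infty$, the first line gives $S_g^\infty(Tx)=L_x(g\circ T)=\int_X (g\circ T)\,d\mu$, and the second gives $S_g^\infty(Tx)=S_g^\infty(x)=\int_X g\,d\mu$.

One way to make the passage to characteristic functions concrete is sketched below. It uses explicit cutoff functions $g_m$ (notation introduced here, not taken from the book) in place of the density argument. For a nonempty closed set $A\subseteq X$ and the metric $d$ on $X$, let

$$g_m(y)=\max\{0,\;1-m\,d(y,A)\},\qquad m\ge 1.$$

Each $g_m$ is continuous with $0\le g_m\le 1$, and $g_m\to\mathcal{X}_A$ pointwise because $A$ is closed. Dominated convergence then gives

$$\mu(A)=\lim_{m\to\infty}\int_X g_m\,d\mu=\lim_{m\to\infty}\int_X (g_m\circ T)\,d\mu=\int_X \mathcal{X}_A\circ T\,d\mu=\mu(T^{-1}(A)).$$

Closed sets are stable under finite intersections and generate the Borel $\sigma$-algebra, so the probability measures $\mu$ and $\mu\circ T^{-1}$ agree on every Borel set.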