I am reading a paper, and have the following question:
Suppose we have the discrete-time dynamical system: $$x^+ = T(x)$$
- The mapping $T:\mathbb{R}^n \rightarrow \mathbb{R}^n$ is a multivariate polynomial.
- $\mu$ is a nonnegative Borel measure that is invariant under $T$: $$\mu(T^{-1}(A)) = \mu(A) \ \ \ \cdots (1)$$ for all Borel measurable $A\subset \mathbb{R}^n$.
Next, the paper restricts attention to invariant measures with support included in some compact set $X\subset \mathbb{R}^n$. With this, $(1)$ reduces to
$$\int_X f\circ T \, d\mu = \int_X f \, d\mu \ \ \ \cdots (2)$$ for all $f\in C(X)$, where $C(X)$ is the space of continuous functions on $X$.
My question is: why does $(1)$ reduce to $(2)$? Could anyone please provide a detailed explanation?
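(1) is identity (2) with $f$ taken to be the indicator of a Borel set $A \subset X$, i.e. $f(x) = \begin{cases} 1 & x \in A \\ 0 & x \notin A\end{cases}$. Indeed for this $f$ we have $\int_X f \, d\mu = \mu(A)$ and $\int_X f \circ T \, d\mu = \mu\{x : T(x) \in A\} = \mu(T^{-1}(A))$. Note that an indicator is generally not continuous, so it does not belong to $C(X)$; recovering (1) from (2) for continuous $f$ alone takes an additional approximation argument.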
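In the other direction, (2) can be obtained from (1) by writing $f$ as a limit of simple functions. Specifically, if (1) holds, then (2) holds when $f$ is the indicator function of a Borel set. By linearity of the integral, (2) also holds for simple functions, which are finite linear combinations of indicator functions. As noted in the Wikipedia article, any nonnegative measurable function $f$ is the pointwise limit of an increasing sequence of such simple functions, so the monotone convergence theorem gives (2) for such $f$ as well. A general continuous $f$ is then handled by splitting it as $f = f^+ - f^-$ and applying the nonnegative case to each part, provided the integrals involved are finite.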
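As a sanity check, here is a small numerical sketch of both identities on a concrete example that is not taken from the paper: the logistic map $T(x) = 4x(1-x)$ on $X = [0,1]$, whose arcsine measure (density $1/(\pi\sqrt{x(1-x)})$) is a standard invariant measure. The test function `f`, the interval `[a, b]`, and the helper names are all illustrative assumptions.

```python
import math

def T(x):
    # Polynomial map (logistic map), chosen here only as an example.
    return 4.0 * x * (1.0 - x)

def cdf(x):
    # CDF of the arcsine measure mu on [0, 1]: mu([0, x]).
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

def integrate_mu(g, n=20000):
    # Approximates the integral of g against mu.
    # Substituting x = sin^2(pi*u/2) turns mu into the uniform measure
    # on u in [0, 1]; the u-integral is computed with the midpoint rule.
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        total += g(math.sin(0.5 * math.pi * u) ** 2)
    return total / n

def f(x):
    # An arbitrary continuous test function on [0, 1].
    return math.cos(3.0 * x) + x * x

# Identity (2): integral of f∘T equals integral of f.
lhs = integrate_mu(lambda x: f(T(x)))
rhs = integrate_mu(f)

# Identity (1) for the interval A = [a, b].
a, b = 0.2, 0.7
mu_A = cdf(b) - cdf(a)
# T^{-1}(A) is two intervals, mirror images of each other about x = 1/2;
# mu is symmetric about 1/2, so the left piece is counted twice.
left_lo = (1.0 - math.sqrt(1.0 - a)) / 2.0
left_hi = (1.0 - math.sqrt(1.0 - b)) / 2.0
mu_preimage = 2.0 * (cdf(left_hi) - cdf(left_lo))

print("(2):", lhs, "vs", rhs)
print("(1):", mu_preimage, "vs", mu_A)
```

The two numbers in each printed pair should agree up to quadrature and rounding error. The check of (1) uses an indicator (discontinuous) and the check of (2) uses a continuous `f`, which mirrors the two directions discussed above.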