In a set of lecture notes, I have the following result:
Theorem. Let $X_n$ be random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ with values in a Polish metric space $S$. Suppose $X = (X_n)_{n \geq 1}$ is a stationary sequence. Then $X$ is ergodic if and only if for any bounded Borel measurable function $g: S^p \to \mathbb{R}$ with $p \geq 1$ an arbitrary integer, $$\dfrac{1}{n}\sum_{m=0}^{n-1}g(X_{m+1}, \dots, X_{m+p}) \overset{a.s.}{\to} \mathbb{E}[g(X_1, \dots, X_p)]\text{.}$$
Note that $\overset{a.s.}{\to}$ denotes almost sure convergence as $n \to \infty$.
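To make the statement concrete, here is a minimal Python sketch (not from the lecture notes) that computes the time average in the theorem for $p = 2$. The helper names `time_average`, `iid_coin_flips`, `frozen_coin`, and `both_heads` are hypothetical and chosen only for illustration. It assumes two toy stationary sequences: i.i.d. fair coin flips, which are ergodic, and a single coin flip repeated forever, which is stationary but not ergodic because the event $\{X_1 = 1\}$ is invariant with probability $1/2$.

```python
import random


def time_average(xs, g, p):
    """Return (1/n) * sum_{m=0}^{n-1} g(xs[m], ..., xs[m+p-1]), with n = len(xs) - p + 1.

    With 0-based lists, xs[m] plays the role of X_{m+1} in the theorem.
    """
    n = len(xs) - p + 1
    return sum(g(*xs[m:m + p]) for m in range(n)) / n


def iid_coin_flips(n, rng):
    """Assumed ergodic example: n independent fair coin flips in {0, 1}."""
    return [rng.randint(0, 1) for _ in range(n)]


def frozen_coin(n, rng):
    """Assumed non-ergodic example: one fair coin flip repeated n times."""
    x = rng.randint(0, 1)
    return [x] * n


def both_heads(x, y):
    """Bounded test function g on S^2: indicator that both coordinates equal 1."""
    return float(x == 1 and y == 1)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    # For i.i.d. flips, E[g(X_1, X_2)] = 1/4, and the time average should be close to it.
    print(time_average(iid_coin_flips(n, rng), both_heads, 2))
    # For the frozen coin, E[g(X_1, X_2)] = 1/2, but the time average is exactly 0 or 1.
    print(time_average(frozen_coin(n, rng), both_heads, 2))
```

In the frozen-coin case the time average converges, but to a random limit rather than to the expectation, which is the kind of behaviour the theorem rules out for ergodic sequences.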
I have searched for this result in the 20-30 measure-theoretic probability books I own, as well as in An Introduction to Ergodic Theory by Walters, to no avail. Does anyone know of a textbook where I can find this result? I would strongly prefer a reference with a proof, but would also accept one without.
Edit: Adding definitions as requested.
The sequence $X$ above is ergodic if, for every invariant set $A \in \mathcal{F}$, $\mathbb{P}(A) \in \{0, 1\}$.
A set $A \in \mathcal{F}$ is called invariant with respect to $X$ if, for some $B \in \mathcal{B}(\mathbb{R}^{\infty})$ ($\mathcal{B}(\mathbb{R}^{\infty})$ denoting the Borel $\sigma$-algebra on $\mathbb{R}^{\infty}$), $A = \{(X_n, X_{n+1}, X_{n+2}, \dots)\} \in B$ for all $n \geq 1$.
[I suspect that $S^{\infty}$ should be used in place of $\mathbb{R}^{\infty}$ in the above definitions and that $\in$ should be $\subset$, but that's how they are presented in the lecture notes.]
Edit 2: I found this claim in some other sources, though not in great detail. It would be nice to find a textbook.
- Last sentence of http://www.columbia.edu/~ks20/6712-14/6712-14-Notes-Ergodic.pdf
- Appendix A of GARCH Models: Structure, Statistical Inference and Financial Applications uses the theorem above as the definition of an ergodic stationary process. This passage cites Billingsley (1995), which I assume is Probability and Measure - but I know that this theorem is not in there.
Consider the case $p=1$ (allowing for $p>1$ seems unnecessary). By Birkhoff's ergodic theorem (http://math.uchicago.edu/~may/REU2016/REUPapers/Ran.pdf, Theorem 7.1), for any bounded measurable $g$, the following convergence holds almost surely: $$\frac{1}{n}\sum_{i=1}^n g(X_i) \to \mathbb{E}[g(X_1) \mid \mathcal{I}],$$ where $\mathcal{I}$ is the $\sigma$-algebra of invariant sets. By the definition of conditional expectation, the identity $$\mathbb{E}[g(X_1) \mid \mathcal{I}] = \mathbb{E}[g(X_1)] \quad \text{a.s., for all bounded measurable } g$$ is equivalent to $\mathcal{I}$ being the trivial $\sigma$-algebra (every set in $\mathcal{I}$ has probability $0$ or $1$), which is in turn equivalent to ergodicity.
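One direction of the last equivalence can be written out explicitly. The following is only a sketch, under the assumption that every set in $\mathcal{I}$ has probability $0$ or $1$; the symbols $Y$ and $c_0$ are introduced here purely for illustration. Set $Y := \mathbb{E}[g(X_1) \mid \mathcal{I}]$, which is $\mathcal{I}$-measurable by definition. Then
$$
\begin{aligned}
&\{Y \le c\} \in \mathcal{I} \quad \text{for every } c \in \mathbb{R} \\
&\Rightarrow\ \mathbb{P}(Y \le c) \in \{0, 1\} \quad \text{for every } c \in \mathbb{R} \\
&\Rightarrow\ Y = c_0 \ \text{a.s. for some constant } c_0 \\
&\Rightarrow\ c_0 = \mathbb{E}[Y] = \mathbb{E}\big[\mathbb{E}[g(X_1) \mid \mathcal{I}]\big] = \mathbb{E}[g(X_1)].
\end{aligned}
$$
Combined with the Birkhoff limit, this gives $\frac{1}{n}\sum_{i=1}^n g(X_i) \to \mathbb{E}[g(X_1)]$ almost surely whenever $\mathcal{I}$ is trivial.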