I have a question about the proof of Khintchine's recurrence theorem, which states: Let $ (X,\mathcal{B}, \mu, T) $ be a measure preserving system and let $A \in \mathcal{B} $. For every $ \epsilon >0 $ the set $$ \{ n \in \mathbb{N} : \mu(A \cap T^{-n} A) > \mu(A)^2 - \epsilon \} $$ is syndetic.
At the beginning of the proof it says that we apply the uniform mean ergodic theorem to the indicator function $ \mathbf{1}_A$ and obtain $$ \lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M}^{N-1} \mu(A \cap T^{-n} A) = \left< (\mathbf{1}_A)_{inv}, \mathbf{1}_A \right> $$ where $ (\mathbf{1}_A)_{inv} $ is the projection of $\mathbf{1}_A $ onto the space of almost everywhere invariant functions in $L^2$, i.e. functions $f$ satisfying $U_T f = f $, where $U_T$ is the Koopman operator, $U_T f = f \circ T$. I don't see how the author uses the uniform mean ergodic theorem, which states that for every $f \in L^2 $ we have $$ \lim_{N-M\to \infty} \frac{1}{N-M} \sum_{n=M}^{N-1} U_T^n f = f_{inv} $$ in $L^2$ norm.
I think he does something like $$ \left< (\mathbf{1}_A)_{inv}, \mathbf{1}_A \right> = \left< \lim_{N-M\to \infty} \frac{1}{N-M} \sum_{n=M}^{N-1} U_T^n \mathbf{1}_A, \mathbf{1}_A \right> $$ $$= \lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M}^{N-1} \left< U_T^n \mathbf{1}_A, \mathbf{1}_A \right> \overset{??}{=} \lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M}^{N-1} \mu(A \cap T^{-n} A)$$
Proof: Recall that $L^2(X,\mathcal{B},\mu) = H_{inv} \oplus H_{erg}$, where $ H_{inv} $ is the space of invariant functions and $H_{erg}$ is the closure of the space of coboundary functions. Applying the uniform mean ergodic theorem to the indicator function $\mathbf{1}_A$ and then applying $ \left< \cdot , \mathbf{1}_A \right>$ gives $$ \lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M}^{N-1} \mu(A \cap T^{-n} A) = \left< ( \mathbf{1}_A)_{inv}, \mathbf{1}_A \right> $$

Notice that for the right-hand side we have $$ \left< (\mathbf{1}_A)_{inv}, \mathbf{1}_A\right> = \left< (\mathbf{1}_A)_{inv}, (\mathbf{1}_A)_{inv}\right> + \left< (\mathbf{1}_A)_{inv}, (\mathbf{1}_A)_{erg}\right> = \int (\mathbf{1}_A)_{inv}^2 d \mu $$ where we split the indicator function into its invariant and ergodic parts, which are orthogonal. Using Cauchy-Schwarz and the fact that $(\mathbf{1}_A)_{inv}$ has the same integral as $\mathbf{1}_A$ (the constant function $1$ is invariant, so $\left< (\mathbf{1}_A)_{inv}, 1 \right> = \left< \mathbf{1}_A, 1 \right>$), we get $$ \left< ( \mathbf{1}_A)_{inv}, \mathbf{1}_A \right> = \int (\mathbf{1}_A)_{inv}^2 d \mu \geq \left( \int (\mathbf{1}_A)_{inv} d \mu\right)^2 = \mu(A)^2 $$ hence $$ \lim_{N- M \to \infty} \frac{1}{N-M} \sum_{n=M}^{N-1} \mu(T^{-n}A \cap A) \geq \mu(A)^2 $$

Assume for contradiction that, for some $\epsilon > 0$, the set $$ \{ n \in \mathbb{N} : \mu(A \cap T^{-n} A) > \mu(A)^2 - \epsilon \} $$ is not syndetic. Then there exist arbitrarily long intervals of integers $[M,N)$ such that for all $n \in [M,N)$ we have $$ \mu(T^{-n} A \cap A) \leq \mu(A)^2 - \epsilon $$ The averages over these intervals are then at most $\mu(A)^2 - \epsilon$, which contradicts the last inequality.
$\newcommand{\one}{\mathbf{1}}\newcommand{\d}{\,\mathrm{d}}\newcommand{\p}{P}$EDIT: I began writing this answer before you posted this comment. Although you seem to have figured out your specific issue, I think the other corollaries of the mean ergodic theorem are interesting and relevant to you. I’m also grateful that you took the time to post a full proof!
For further reading, and for more detail on what I am about to write, I recommend the excellent and detailed introductory text "Operator Theoretic Aspects of Ergodic Theory".
If I understand your concerns correctly, you want to understand why:
Implies:
I introduce my own notation for the equalities because I think that that is much clearer. The $\langle\rangle$ notation can sometimes obfuscate what is really going on, in my opinion.
In fact, $(2)$ holds for all measurable $A$ if and only if $(1)$ holds for all $f\in L^1$, which is shown in the text (in the context of a measure preserving system, which as they define it also requires the whole space to have measure $1$, i.e. finiteness is assumed). Furthermore, all equalities are to be taken as equalities in $L^p$-space, that is, almost everywhere, rather than as literal pointwise equalities of functions.
I will prove and explain $(1)\implies(2)$; if you would like me to prove and explain $(2)\implies(1)$, I will do so. In the text you will find that $(1)$ and $(2)$ are equivalent to a whole host of other statements. In that vein, I will prove $(1)\implies(2)$ through the interesting detour $(1)\implies(3)\implies(4)\implies(2)$ (they are actually all equivalent, and all equivalent to “$T$ is ergodic”) where:
Let me denote by $\p_U$ the mean ergodic projection operator for the Koopman action $U$ and by $A_n$ the Cesàro average (not to be confused with the set $A$): $$A_n(f)=\frac{1}{n}\sum_{j=0}^{n-1}U_T^j(f),\,\p_U(f)=\lim_{n\to\infty}A_n(f)$$
For $p\in[1,\infty)$, $f\in L^p$, and $g\in L^{p(p-1)^{-1}}$ (read as $L^\infty$ when $p=1$), note that $f\cdot g\in L^1$ by Hölder’s inequality. By a continuity and density argument, to show $(3)$ it suffices to treat the case $g\in L^\infty$ and $f\in L^1$ (as $L^p\subseteq L^1$ here). Note that $P_U(g)$ will be in $L^\infty$ (by assertion $(1)$) and moreover in the fixed space of $U$. We have: $$\begin{align}A_n(f\cdot P_U(g))&=\frac{1}{n}\sum_{j=0}^{n-1}U^j_T(f)\underset{=P_U(g)}{\underbrace{U^j_T(P_U(g))}}\\&=P_U(g)\cdot A_n(f)\end{align}$$And, again using Hölder: $$\begin{align}\|A_n(f\cdot P_U(g))-P_U(f)\cdot P_U(g)\|_1&=\|P_U(g)\cdot(A_n(f)-P_U(f))\|_1\\&\le\|P_U(g)\|_\infty\cdot\|A_n(f)-P_U(f)\|_1\\&\to0\end{align}$$So we conclude the identity $P_U(f\cdot P_U(g))=P_U(f)\cdot P_U(g)$.
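The first line of that computation uses the fact that the Koopman operator is multiplicative. As a short sketch (assuming, as is standard, that products are taken pointwise almost everywhere, and writing $h$ for an arbitrary bounded function, which is my own placeholder symbol): $$U_T^j(f\cdot h)=(f\cdot h)\circ T^j=(f\circ T^j)\cdot(h\circ T^j)=U_T^j(f)\cdot U_T^j(h)$$ Taking $h=P_U(g)$, which lies in the fixed space, gives $U_T^j(h)=h$ for every $j$, so the factor $P_U(g)$ can be pulled out of the Cesàro sum.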
We are ready to show $(3)$ (assuming $(1)$):
Since $\mathbb{E}(\one_A)=\mu(A)$, we get $(4)$ immediately by letting $f=\one_A,g=\one_B$. Then $(2)$ is immediate with $A=B$.
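For completeness, here is a sketch of the step marked $??$ in the question, using only the definition $U_Tf=f\circ T$. Since $\one_A\circ T^n=\one_{T^{-n}A}$, we have $$\langle U_T^n\one_A,\one_A\rangle=\int\one_{T^{-n}A}\cdot\one_A\d\mu=\int\one_{A\cap T^{-n}A}\d\mu=\mu(A\cap T^{-n}A)$$ Moving the limit outside the inner product is justified by Cauchy-Schwarz: if $f_k\to f$ in $L^2$ norm, then $$|\langle f_k,\one_A\rangle-\langle f,\one_A\rangle|\le\|f_k-f\|_2\cdot\|\one_A\|_2\to0$$ Here $f_k$ stands for the averages $\frac{1}{N-M}\sum_{n=M}^{N-1}U_T^n\one_A$ (the index $k$ is just my shorthand for $N-M\to\infty$), and $f=(\one_A)_{inv}$.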
I hope this was interesting and useful.
Can you see why $(2)$ implies $T$ must be ergodic?