What exactly is the probabilistic/intuitive interpretation of weak mixing of all orders? Why $A_0,\varphi^{\ast n}A_1,\varphi^{\ast 2n}A_2,\ldots$?


$\newcommand{\dlim}{\operatorname{Dlim}}\newcommand{\fix}{\operatorname{fix}}\newcommand{\1}{\mathbf{1}}$I am asking about the interpretation of certain definitions from ergodic dynamics, drawing mostly on Chapter $9$ of this text.

First some preliminary definitions for those unfamiliar with the book's notation (the reader can skip most of this section):

A measure-preserving system is a tuple $(X,\Sigma,\mu;\varphi)$, written $(X;\varphi)$ for short, where $(X,\Sigma,\mu)$ is a probability space and $\varphi:X\to X$ is a measurable map with $\varphi_\ast\mu=\mu$. $\Sigma(X)$ denotes the "measure algebra", i.e. the quotient $\Sigma_{/\sim}$ where $A\sim B\iff\mu(A\triangle B)=0$. The "induced map" $\varphi^\ast:\Sigma(X)\to\Sigma(X)$ is defined by $[A]\mapsto[\varphi^{-1}A]$. The Koopman operator $T:=T_\varphi$, on any $L^p(X)$, is defined by $T:f\mapsto f\circ\varphi$; it is an isometry satisfying many properties of interest. $\fix(T)$ denotes the eigenspace for the eigenvalue $1$: $$\fix(T)=\{f\in L^p(X):Tf=f\}$$

A set $J\subseteq\Bbb N$ is called a subsequence if it has a countably infinite, strictly increasing enumeration $(j_n)_{n\in\Bbb N}$, so that $J=\{j_n:n\in\Bbb N\}$. The expression $\dlim_nx(n)=y$ means that there exists a subsequence $J$, with asymptotic natural density $\mathrm{d}(J)=1$, such that $\lim_{n\to\infty}x(j_n)=y$ (in whichever topology is relevant).
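To see concretely how $\dlim$ differs from $\lim$, here is a small numerical illustration (a toy example of my own, not from the book): the indicator of the perfect squares has no ordinary limit, but $\dlim_n x(n)=0$, because the squares have asymptotic density $0$, so the non-squares form a density-$1$ subsequence along which $x$ is identically $0$.

```python
import math

# Toy example: x(n) = 1 if n is a perfect square, else 0.  The ordinary
# limit of x(n) does not exist, but Dlim_n x(n) = 0: the squares have
# asymptotic density 0, so J = {non-squares} has density 1 and
# x(j_n) = 0 identically along J.

def x(n):
    r = math.isqrt(n)
    return 1 if r * r == n else 0

N = 10**6
# Density of {n <= N : x(n) = 1}; exactly floor(sqrt(N)) / N = 0.001 here.
square_density = sum(x(n) for n in range(1, N + 1)) / N
print(square_density)  # 0.001
```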

We define a system to be strongly mixing if: $$\tag{1}\forall A,B\in\Sigma(X):\lim_{n\to\infty}\mu(A\cap\varphi^{\ast n}B)=\mu(A)\mu(B)$$ And weakly mixing if: $$\tag{2}\forall A,B\in\Sigma(X):\dlim_n\mu(A\cap\varphi^{\ast n}B)=\mu(A)\mu(B)$$

A function $f\in L^p$ is called rigid if there is a subsequence $J$ such that $\lim_{n\to\infty}T^{j_n}f=f$ in $L^p$. The system is called mildly mixing if: $$\tag{3}\forall f\in L^p(X),\,f\text{ rigid}\implies \exists c\in\Bbb C:f=c\cdot\1$$That is, rigidity implies a.e. constancy.

The system is strongly/weakly mixing of order $k\in\Bbb N$ if: $$\forall\{A_m\}_{m=0}^{k-1}\subseteq\Sigma(X):\mu\left(\bigcap_{m=0}^{k-1}\varphi^{\ast mn}A_m\right)\to\prod_{m=0}^{k-1}\mu(A_m)$$where the limit as $n\to\infty$ is taken to be a $\lim$ or a $\dlim$, respectively. The system is strongly/weakly mixing of all orders if this holds for every $k\ge 2$. It turns out that weak mixing implies weak mixing of all orders, but it remains an open problem whether the analogue holds in the strong case.
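To make the order-$k$ condition concrete, here is a sketch of my own (the system, test sets, and time $n$ are arbitrary choices, not from the book): a Monte Carlo check of the order-$3$ quantity for the doubling map $x\mapsto 2x\bmod 1$, which is known to be strongly mixing of all orders.

```python
import random

# A sketch of the order-3 condition for the doubling map phi(x) = 2x (mod 1),
# which is strongly mixing of all orders.  We estimate
#   mu(A0 ∩ phi^{-n} A1 ∩ phi^{-2n} A2)
# by Monte Carlo for the (arbitrary) test sets A0 = [0, 1/2),
# A1 = [0, 1/3), A2 = [0, 1/4); the product of the measures is 1/24.

random.seed(1)

def frac(y):
    return y % 1.0

def estimate3(n, samples=300_000):
    hits = 0
    for _ in range(samples):
        x = random.random()
        if (x < 0.5                                  # x in A0
                and frac(2 ** n * x) < 1 / 3         # phi^n(x) in A1
                and frac(2 ** (2 * n) * x) < 0.25):  # phi^{2n}(x) in A2
            hits += 1
    return hits / samples

est3 = estimate3(n=8)
print(est3)  # close to 1/2 * 1/3 * 1/4 = 1/24 ≈ 0.0417
```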

We have the implications: strong$\implies$mild$\implies$weak (-ly mixing)$\implies$ergodic.

Strong mixing has a nice intuitive description: if you have some quantity of interest in a set $A$ and you pick any non-null region of space $B$, you can ask what the relative amount of $A$ found in $B$ after time $n$ is: $$\frac{\mu(A\cap\varphi^{\ast n}B)}{\mu(B)}$$Now, if the system were being thoroughly mixed, you would expect the contents of $A$ to become equidistributed over time, so that the relative amount present in any region $B$ converges to the relative amount of $A$ in all of $X$, i.e. $\frac{\mu(A\cap\varphi^{\ast n}B)}{\mu(B)}\to\mu(A)$, which is $(1)$. Weak mixing simply replaces this ordinary limit with a density limit.
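This intuition can be checked numerically; below is a sketch (my own choice of system and sets, not from the text) estimating $\mu(A\cap\varphi^{\ast n}B)$ for the doubling map $x\mapsto 2x\bmod 1$, a standard strongly mixing system.

```python
import random

# A sketch checking (1) for the doubling map phi(x) = 2x (mod 1) on [0,1),
# which preserves Lebesgue measure and is strongly mixing.  We estimate
# mu(A ∩ phi^{-n} B) by Monte Carlo for the (arbitrary) test sets
# A = [0, 1/2), B = [0, 1/3), and compare with mu(A) mu(B) = 1/6.

random.seed(0)

def estimate(n, samples=200_000):
    """Monte Carlo estimate of mu(A ∩ phi^{-n} B) for the doubling map."""
    hits = 0
    for _ in range(samples):
        x = random.random()
        # x lies in phi^{-n} B  iff  2^n x (mod 1) lies in B.
        if x < 0.5 and (2 ** n * x) % 1.0 < 1 / 3:
            hits += 1
    return hits / samples

est = estimate(n=10)
print(est)  # close to mu(A) mu(B) = 1/6 ≈ 0.1667
```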

Mild mixing is a head-scratcher. I know what it means formally, but on an intuitive level the best I can do is (taking $f$ to be the characteristic function of a set): "mild mixing means that no non-trivial set can fully return to itself along any sequence of times" - so any such set must "mix" into other regions of space. The proof that mild mixing implies weak mixing is bizarre and unintuitive (one shows that if $T$ has $\lambda=1$ as its only eigenvalue and $\dim\fix T=1$, then the system is weakly mixing) - indeed, mild mixing feels like it should be the weakest of them all. I cannot frame mild mixing in terms of a nice "mixing" property like $(1)$ or $(2)$, and would appreciate insight from any reader.
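For contrast, a standard source of rigid functions (sketched here with my own numerical choices): for an irrational rotation $\varphi(x)=x+\alpha\bmod 1$, any times $j_n$ with $j_n\alpha\to 0\pmod 1$ give $T^{j_n}f\to f$ in $L^p$ for every $f$, by continuity of translation. Taking $\alpha=(\sqrt5-1)/2$ and $j_n$ the Fibonacci numbers exhibits this, so rotations have many non-constant rigid functions and are not mildly mixing.

```python
import math

# Standard example of rigidity: for the irrational rotation
# phi(x) = x + alpha (mod 1) with alpha = (sqrt(5) - 1) / 2, the Fibonacci
# numbers F_k satisfy F_k * alpha -> 0 (mod 1), so T^{F_k} f -> f in L^p
# for *every* f.  Thus rotations have plenty of non-constant rigid
# functions, hence are not mildly mixing.

alpha = (math.sqrt(5) - 1) / 2

def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def dist_to_int(y):
    """Distance from y to the nearest integer."""
    f = y % 1.0
    return min(f, 1.0 - f)

# The distance of F_k * alpha from the integers shrinks like alpha^k.
dists = [dist_to_int(fib(k) * alpha) for k in (5, 10, 15, 20)]
print(dists)  # strictly decreasing toward 0
```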

Furthermore, the definition of strong/weak mixing of order $k$ is very bizarre. Why do we care about the staggered time intervals introduced by the term $\varphi^{\ast mn}$? That is, why is the quantity (for e.g. $k=11$): $$\frac{\mu(A_0\cap\varphi^{\ast n}A_1\cap\varphi^{\ast 2n}A_2\cap\cdots\cap\varphi^{\ast 10n}A_{10})}{\mu(A_1)\mu(A_2)\cdots\mu(A_{10})}\to\mu(A_0)$$of any interest or meaning? I cannot picture a good "mixing" concept behind knowing that the quantity in $A_0$ is well-distributed over the staggered time steps of, e.g., $10$ seconds to arrive in $A_1$, $20$ seconds to arrive in $A_2$, ..., $100$ seconds to arrive in $A_{10}$, in the way that I can for formula $(1)$. Can anyone provide a motivation or an intuition (or even an application!) for this concept?

Many thanks! I am studying the raw theory and don't really have a grasp of what all of this is good for...