I'm looking for the motivation behind Markov operators and a standard definition of them. Good sources online are hard to find, and I've seen several definitions:
Definition 1: One source says $P$ is an operator on the space $L^1(\mu)$ that
- preserves nonnegativity,
- satisfies $\int (Pf)(x)\, \mu(dx) = \int f(x)\, \mu(dx) \quad \text{for any } f\in L^1(\mu).$
I interpret this integral-preserving property as "the Markov transition maintains the same total probability" on the state space.
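To make this reading concrete, here is a minimal finite-state sketch of definition 1. It assumes $\mu$ is counting measure on three states and that $P$ pushes a function forward through a hypothetical row-stochastic matrix `K`; neither assumption comes from the source, and `K`, `f`, and `push_forward` are illustrative names only.

```python
# Hypothetical 3-state transition kernel: K[x][y] = probability of moving x -> y.
# Each row sums to 1.
K = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

def push_forward(f, K):
    """Apply (Pf)(y) = sum_x f(x) K[x][y], with mu taken to be counting measure."""
    n = len(K)
    return [sum(f[x] * K[x][y] for x in range(n)) for y in range(n)]

f = [0.2, 0.7, 0.1]      # a nonnegative function with total mass 1
Pf = push_forward(f, K)

print(Pf)                # entries stay nonnegative
print(sum(f), sum(Pf))   # the "integral" (here a plain sum) is unchanged
```

In this toy setting the sum is preserved because every row of `K` sums to 1, which matches the "total probability is maintained" interpretation; the identity also holds for signed `f`, not only for probability vectors.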
Definition 2: Another source defines a Markov operator as any operator $P$ on $L^2(X,\mu)$ that
- is a contraction,
- sends the constant function $1$ to itself ($P1 = 1$),
- preserves nonnegativity.
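A minimal finite-state sketch of definition 2, under assumptions that are not in the source: `K` is a hypothetical row-stochastic matrix, `mu` is a probability vector chosen to be invariant for it (`mu K = mu`), and the operator acts on functions by averaging, $(Pf)(x) = \sum_y K(x,y) f(y)$.

```python
from math import sqrt

# Hypothetical row-stochastic kernel and an invariant distribution for it
# (mu K = mu); both are illustrative assumptions.
K = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
mu = [0.25, 0.5, 0.25]

def markov_op(f, K):
    """(Pf)(x) = sum_y K[x][y] f(y), i.e. E[f(X_1) | X_0 = x]."""
    return [sum(k * v for k, v in zip(row, f)) for row in K]

def l2_norm(f, mu):
    """Norm of f in L^2(mu) on a finite state space."""
    return sqrt(sum(m * v * v for m, v in zip(mu, f)))

ones = [1.0, 1.0, 1.0]
f = [1.0, -2.0, 3.0]
Pf = markov_op(f, K)

print(markov_op(ones, K))                # constants are fixed: P1 = 1
print(l2_norm(Pf, mu), l2_norm(f, mu))   # contraction: first value <= second
```

In this toy example the contraction property relies on `mu` being invariant for `K` (it follows from Jensen's inequality applied row by row). The same invariance also gives $\sum_x \mu(x)(Pf)(x) = \sum_x \mu(x) f(x)$, which looks like the integral identity of definition 1; whether that link holds in general is part of what I am asking.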
Why is this now defined on $L^2$? In general neither of $L^1$ and $L^2$ contains the other, so there must be a fundamental difference between the two definitions. And is property 2 of definition 1 somehow equivalent to properties 1 and 2 of definition 2?
Related?: I believe Markov operators are closely related to the infinitesimal generator of a stochastic process, given by $$ Af(x) = \lim_{t\downarrow 0} \frac{\mathbb{E}[f(X_t) \mid X_0=x] - f(x)}{t}.$$
Is this a type of Markov operator? (The infinitesimal generator is also an operator on measurable functions.)
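To see what the limit looks like numerically, here is a small sketch for a finite-state continuous-time chain. Everything in it is an assumption for illustration: the rate matrix `Q` is hypothetical, the semigroup $P_t f(x) = \mathbb{E}[f(X_t) \mid X_0 = x]$ is taken to be $e^{tQ}$, and the exponential is approximated by a truncated Taylor series rather than a library routine.

```python
# Hypothetical rate matrix of a 3-state continuous-time chain:
# off-diagonal entries are jump rates, and each row sums to zero.
Q = [
    [-1.0, 1.0, 0.0],
    [0.5, -1.0, 0.5],
    [0.0, 1.0, -1.0],
]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(A, f):
    """Matrix-vector product: (Af)(x) = sum_y A[x][y] f(y)."""
    return [sum(a * v for a, v in zip(row, f)) for row in A]

def semigroup(Q, t, terms=30):
    """Approximate P_t = exp(tQ) by a truncated Taylor series."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (tQ)^k / k!
        term = matmul(term, [[t * q / k for q in row] for row in Q])
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def difference_quotient(Q, f, t):
    """(P_t f - f) / t, the finite-t version of the generator formula."""
    Ptf = apply(semigroup(Q, t), f)
    return [(a - b) / t for a, b in zip(Ptf, f)]

f = [1.0, 0.0, 2.0]
ones = [1.0, 1.0, 1.0]
Af = apply(Q, f)                           # generator applied directly
approx = difference_quotient(Q, f, 1e-4)   # close to Af for small t

print(Af, approx)
print(apply(semigroup(Q, 0.5), ones))      # P_t fixes constants
print(apply(Q, ones))                      # the generator sends constants to 0
```

In this toy case each $P_t$ fixes constants and keeps nonnegative functions nonnegative, while `Q` itself sends constants to zero and can turn a nonnegative `f` into one with negative entries. That at least suggests the generator is a different kind of object from the operators in the two definitions, though I would like to understand the general relationship.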
In what sense are these two definitions equivalent, and what's the intuition for why we need Markov operators? Furthermore, what's the point of defining them for all $L^1$ or $L^2$ functions?