Difference in probability distributions from two different kernels


I wonder: if the transition kernels of two Markov processes on the same state space are close, does closeness also hold for the probabilities of events that depend only on the first $n$ values of the process?

More formally, let $(E,\mathscr E)$ be a measurable space and let $(E^{n+1},\mathscr E^{n+1})$ be the product space, where $\mathscr E^{n+1}$ is the product $\sigma$-algebra. We say that $P$ is a stochastic kernel on $E$ if $$ P:E\times\mathscr E\to [0,1] $$ is such that $P(x,\cdot)$ is a probability measure on $(E,\mathscr E)$ for all $x\in E$ and $x\mapsto P(x,A)$ is a measurable function for all $A\in \mathscr E$. On the space $b\mathscr E$ of real-valued bounded measurable functions with the sup norm $\|f\| = \sup\limits_{x\in E}|f(x)|$ we define the operator $$ Pf(x) = \int\limits_E f(y)P(x,dy). $$ Its induced norm is given by $\|P\| = \sup\limits_{f\in b\mathscr E\setminus \{0\}}\frac{\|Pf\|}{\|f\|}.$ Furthermore, to any stochastic kernel $P$ we can assign the family of probability measures $(\mathsf P_x)_{x\in E}$ on $(E^{n+1},\mathscr E^{n+1})$, uniquely determined by $$ \mathsf P_x(A_0\times A_1\times \dots\times A_n) = 1_{A_0}(x)\int\limits_{A_1}P(x,dx_1)\int\limits_{A_2}P(x_1,dx_2)\dots \int\limits_{A_n}P(x_{n-1},dx_n). $$

Now consider another kernel $\tilde P$, which likewise defines an operator on $b\mathscr E$ and a family of probability measures $\tilde{\mathsf P}_x$ on $(E^{n+1},\mathscr E^{n+1})$. What is an upper bound on $$ \sup\limits_{x\in E}\sup\limits_{F\in \mathscr E^{n+1}}|\tilde{\mathsf P}_x(F) - \mathsf P_x(F)|? $$
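For intuition, on a finite state space $E=\{0,\dots,d-1\}$ a stochastic kernel is just a row-stochastic matrix, and the objects above become linear algebra. A minimal sketch (the random matrices and variable names are hypothetical, not from the question):

```python
import numpy as np

# On a finite E = {0, ..., d-1}, a stochastic kernel is a row-stochastic
# matrix: K[x, y] = K(x, {y}); each row is a probability distribution.
rng = np.random.default_rng(0)
d = 4
P = rng.random((d, d));  P /= P.sum(axis=1, keepdims=True)   # kernel P
Pt = rng.random((d, d)); Pt /= Pt.sum(axis=1, keepdims=True) # kernel P-tilde

# The operator on bounded functions f: E -> R (vectors) is
# (Pf)(x) = sum_y f(y) P(x, {y}), i.e. a matrix-vector product.
f = rng.standard_normal(d)
Pf = P @ f

# The induced sup-norm of the difference operator reduces to the maximal
# l1 distance between corresponding rows:
#   ||Pt - P|| = max_x sum_y |Pt(x, {y}) - P(x, {y})|.
op_norm_diff = np.abs(Pt - P).sum(axis=1).max()
```

The row-sum formula arises because the supremum over $\|f\|\le 1$ is attained at a $\pm 1$ sign vector matched to the signs of the worst row of $\tilde P - P$.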

By induction it is easy to prove that $$ \sup\limits_{x\in E}|\tilde{\mathsf P}_x(A_0\times A_1\times \dots\times A_n)-\mathsf P_x(A_0\times A_1\times \dots\times A_n)|\leq n\cdot\|\tilde P - P\|, $$ but I am not sure whether this result extends from measurable rectangles to arbitrary sets in $\mathscr E^{n+1}$.


BEST ANSWER

I hope I have a solution to the problem, so I post it here. I would be happy if you could comment on whether it is correct, or perhaps provide a shorter and neater one.

  1. First of all, I change the notation slightly and write $\mathsf P_x^n$ instead of the OP's $\mathsf P_x$ for the probability measure on the space $(E^{n+1},\mathcal E^{n+1})$, just to record the dependence on $n$ explicitly. Then for every measurable rectangle $B = A_1\times A_2\times\dots\times A_n\in \mathcal E^{n}$ and every set $A_0\in \mathcal E$ it holds that $$ \mathsf P_x^n(A_0\times B) = 1_{A_0}(x)\int\limits_{A_1}P(x,dx_1)\dots \int\limits_{A_n}P(x_{n-1},dx_n) = 1_{A_0}(x)\int\limits_E \mathsf P_{y}^{n-1}(B)P(x,dy). $$ By the uniqueness of the probability measure $\mathsf P_x^n$ the same identity holds for every $B\in \mathcal E^{n}$: $$ \mathsf P_x^n(A_0\times B) = 1_{A_0}(x)\int\limits_E \mathsf P_{y}^{n-1}(B)P(x,dy). \tag{1} $$

  2. For any set $C\in \mathcal E^{n+1} = \mathcal E\otimes\mathcal E^{n}$ we can show that $$ \mathsf P_x^n(C) = \int\limits_E \mathsf P_y^{n-1}(C_x)P(x,dy) \tag{2} $$ where $C_x = \{y\in E^{n}:(x,y)\in C\}\in\mathcal E^{n}$ is the section of $C$ at $x$. To prove it, we first verify $(2)$ for measurable rectangles $C = A\times B$ using $(1)$: in that case $C_x = B$ if $x\in A$ and $C_x=\emptyset$ otherwise. Following the advice of @tb, the result then extends to all $C\in \mathcal E^{n+1}$ by the $\pi$-$\lambda$ theorem.

  3. The inequality $\left|\tilde{\mathsf P}_x^n(C) - \mathsf P_x^n(C)\right|\leq n\|\tilde P-P\|$ can then be proved by induction. It clearly holds for $n=1$: $$ \left|\tilde{\mathsf P}^1_x(C) - \mathsf P^1_x(C)\right| = \left|\tilde P(x,C_x) - P(x,C_x)\right|\leq 1\cdot\|\tilde P - P\|. $$ If the same inequality holds for $n-1$, we have $$ \begin{align} \left|\tilde{\mathsf P}_x^n(C) - \mathsf P_x^n(C)\right| &= \left|\int\limits_E \tilde{\mathsf P}_y^{n-1}(C_x)\tilde P(x,dy)-\int\limits_E \mathsf P_y^{n-1}(C_x)P(x,dy)\right| \\ &\leq \left|\int\limits_E \left(\tilde{\mathsf P}_y^{n-1}(C_x) - \mathsf P_y^{n-1}(C_x)\right)\tilde P(x,dy)\right|+\left|\langle \tilde P(x,\cdot) - P(x,\cdot),\mathsf P_{(\cdot)}^{n-1}(C_x)\rangle\right| \\ &\leq (n-1)\|\tilde P - P\|+\|\tilde P - P\| = n\|\tilde P - P\|, \end{align} $$ where $\langle \mu,f\rangle = \int\limits_E f(y)\mu(dy)$ for every bounded measurable $f$ and every finite signed measure $\mu$; the second term is at most $\|\tilde P - P\|$ because $y\mapsto\mathsf P_y^{n-1}(C_x)$ is measurable and bounded by $1$. Since the right-hand side of the derived bound does not depend on $x$ or $C$, this proves the desired bound.
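The bound can be checked exhaustively on a small finite state space: there, $\sup_{F}|\tilde{\mathsf P}_x(F)-\mathsf P_x(F)|$ is the total variation distance between the two path laws, i.e. half the $\ell^1$ distance between the path distributions. A sketch with random kernels (all names hypothetical):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 4  # state-space size and number of transitions
P = rng.random((d, d));  P /= P.sum(axis=1, keepdims=True)
Pt = rng.random((d, d)); Pt /= Pt.sum(axis=1, keepdims=True)

op_norm_diff = np.abs(Pt - P).sum(axis=1).max()  # ||Pt - P||

def path_prob(K, path):
    # Probability under kernel K of the exact path (x0, x1, ..., xn).
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= K[a, b]
    return p

worst = 0.0
for x in range(d):
    # sup_F |Pt_x(F) - P_x(F)| equals half the l1 distance between
    # the two distributions on paths starting at x.
    tv = 0.5 * sum(abs(path_prob(Pt, (x,) + w) - path_prob(P, (x,) + w))
                   for w in itertools.product(range(d), repeat=n))
    worst = max(worst, tv)

assert worst <= n * op_norm_diff  # the bound proved above
```

Reducing the supremum over events to a sum over paths is what makes the check over all $F\in\mathscr E^{n+1}$ tractable.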

SECOND ANSWER

The notation is a bit cumbersome because of the nested integrals, but this solution relies only on very basic properties of integration and is direct (no induction).

Consider the difference $a(x_0,F)=\mathsf P'_{x_0}(F)-\mathsf P_{x_0}(F)$, writing $P'$ for the kernel $\tilde P$ of the question and identifying $F$ with its section at the fixed starting point $x_0$ (the first coordinate is deterministic). By uniqueness of measure it follows from the definition of $\mathsf P_{x_0}$ that $$\begin{align} a(x_0,F) =&\int_E\dots\int_E 1_F(x_1,\dots, x_n) P'(x_{n-1},dx_n)\dots P'(x_0,dx_1)\\ &- \int_E\dots\int_E 1_F(x_1,\dots, x_n) P(x_{n-1},dx_n)\dots P(x_0,dx_1). \end{align}$$

By introducing intermediate telescoping terms we can split this into a sum of $n$ terms $a(x_0,F)=\sum_{j=1}^n a_j(x_0,F)$ where $$\begin{align} a_j(x_0,F) =& \int_E\dots\int_E 1_F(x_1,\dots x_n) P'(x_{n-1},dx_n) \dots P'(x_{j-1},dx_j) P(x_{j-2},dx_{j-1})\dots P(x_0,dx_1)\\ &- \int_E\dots\int_E 1_F(x_1,\dots x_n) P'(x_{n-1},dx_n) \dots P'(x_j,dx_{j+1}) P(x_{j-1},dx_j)\dots P(x_0,dx_1) \end{align}$$

The innermost part (consisting of $n-j$ nested integrals) is common to both terms; to make things more readable we factor it into $$g_j(x_1,\dots, x_j) = \displaystyle\int_E\dots\int_E 1_F(x_1,\dots, x_n) P'(x_{n-1},dx_n)\dots P'(x_j,dx_{j+1}).$$ Then by linearity and $|\int f|\le\int |f|$: $$\begin{align} |a_j(x_0,F)|\le& \int_E\dots\int_E\left|\int_E g_j(x_1,\dots, x_j) \left(P'(x_{j-1},dx_j)-P(x_{j-1},dx_j)\right)\right| P(x_{j-2},dx_{j-1})\dots P(x_0,dx_1)\\ \le& \int_E\dots\int_E\|P'-P\| P(x_{j-2},dx_{j-1})\dots P(x_0,dx_1)\\ =& \|P'-P\|,\\ |a(x_0,F)|\le& n\cdot\|P'-P\|. \end{align}$$ (The bound of the inner integral by $\|P'-P\|$ comes from applying the operator $P'-P$ to the function $g_j(x_1,\dots, x_{j-1},\cdot)$, which is bounded by $1$ in sup norm.)
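The telescoping decomposition itself can be sanity-checked numerically. Writing $m_k$ for the path measure that uses $P$ for the first $k$ transitions and $P'$ afterwards, each term is $a_j = m_{j-1}(F) - m_j(F)$, the terms sum to $a(x_0,F)$, and each is bounded by $\|P'-P\|$. A sketch on a finite state space (random kernels and a random event $F$, names hypothetical):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 4
P = rng.random((d, d));  P /= P.sum(axis=1, keepdims=True)   # kernel P
Pp = rng.random((d, d)); Pp /= Pp.sum(axis=1, keepdims=True) # kernel P'
op_norm_diff = np.abs(Pp - P).sum(axis=1).max()              # ||P' - P||

x0 = 0
# A random event F depending on the first n steps after x0.
F = [w for w in itertools.product(range(d), repeat=n) if rng.random() < 0.5]

def m(k):
    # m_k(F): probability of F using P for transitions 1..k and P' after.
    total = 0.0
    for w in F:
        p, prev = 1.0, x0
        for i, b in enumerate(w, start=1):
            p *= (P if i <= k else Pp)[prev, b]
            prev = b
        total += p
    return total

a = m(0) - m(n)                                     # P'-law minus P-law of F
terms = [m(j - 1) - m(j) for j in range(1, n + 1)]  # telescoping terms a_j
assert np.isclose(sum(terms), a)                    # the a_j sum to a
assert all(abs(t) <= op_norm_diff + 1e-12 for t in terms)
```

Swapping one transition kernel at a time is exactly what makes each $|a_j|$ controllable by the single-step operator norm.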