TV Distance between Limiting Distribution and One Step Transition Probability Distribution of Markov Chains


Edit 05/01/2023: This question may seem contrived at first glance, but it is motivated by some reinforcement learning algorithms, such as the TD algorithm. Indeed, consider the Bellman equation for an MDP: $$ Q\left( s,a \right) =r\left( s,a \right) +\mathbb{E} \left[ Q\left( S',A' \right) \mid S'\sim \mathbb{P} \left( \cdot |s,a \right) ,A'\sim \pi \left( \cdot |S' \right) \right], $$ which suggests the Q-learning-style update: $$ Q\left( s,a \right) \gets Q\left( s,a \right) +\alpha \left( r\left( s,a \right) +\mathbb{E} \left[ Q\left( S',A' \right) \mid S'\sim \mathbb{P} \left( \cdot |s,a \right) ,A'\sim \pi \left( \cdot |S' \right) \right] -Q\left( s,a \right) \right). $$ According to "A Theoretical Analysis of Deep Q-Learning" (Jianqing Fan et al.), Assumption 5.3 essentially states the properties described below.
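For concreteness, the sampled (one-sample) version of the update above can be sketched in tabular form as below. The MDP, its dynamics, and the uniform policy are all hypothetical placeholders, and a discount factor $\gamma$ is added (the equation above omits it) so the toy iteration stays bounded:

```python
import random

# Hypothetical tiny MDP: 2 states, 2 actions (purely illustrative names).
n_states, n_actions = 2, 2
alpha, gamma = 0.1, 0.9  # step size; discount added for boundedness

def reward(s, a):
    # r(s, a): assumed known, here a placeholder in [0, 1]
    return float(s == a)

def step(s, a):
    # sample s' ~ P(.|s, a): placeholder dynamics (uniform)
    return random.randrange(n_states)

def policy(s_next):
    # sample a' ~ pi(.|s'): placeholder uniform policy
    return random.randrange(n_actions)

Q = [[0.0] * n_actions for _ in range(n_states)]
for _ in range(1000):
    s, a = random.randrange(n_states), random.randrange(n_actions)
    s_next = step(s, a)
    a_next = policy(s_next)
    # One-sample estimate of r(s,a) + gamma * E[Q(S', A')]
    target = reward(s, a) + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
```

The expectation in the update is replaced by a single sampled transition, which is exactly where the distribution of $S'$ (and hence the question below about one-step versus limiting distributions) enters the analysis.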
I wonder whether Assumption 5.3 in that paper is reasonable. Simplifying, I arrive at the following problem.


Consider the following distance problem between distributions on a well-defined Markov chain. Suppose the infinite Markov chain is ergodic, i.e., it is irreducible and all states $\{s_0, s_1, \cdots, s_T, \cdots\}$ are positive recurrent. Let $s_0 \sim \mu$, where $\mu$ is a stationary distribution on $\mathcal{S}$. I am interested in the following two distributions:
(1) One Step Transition Probability Distribution: $ \mathbb{E} \left[ \mathbb{P} \left( s_1\in \cdot |s_0 \right) |s_0\sim \mu \right] $
(2) Limiting Distribution: $ \lim_{t\rightarrow \infty} \mathbb{P} \left( s_t\in \cdot |s_0 \right) $
We already know that the limiting distribution exists and equals the unique stationary distribution. How can I bound the total variation distance $ D_{\mathrm{TV}}\left( \mathbb{E} \left[ \mathbb{P} \left( s_1\in \cdot |s_0 \right) \mid s_0\sim \mu \right] , \lim_{t\rightarrow \infty} \mathbb{P} \left( s_t\in \cdot |s_0 \right) \right) $?