Statement: A crude upper bound for the TV distance between $P^n(x,\cdot)$, the chain started from state $x$ and run for $n$ steps, and the stationary distribution $\pi$, which I found here (page 3, third line), is:
$$||P^n(x,\cdot)-\pi||_{TV}\leq \frac{\lambda^n}{2\sqrt{\pi(x)}} \tag{1}$$ where $\lambda$ is the second largest eigenvalue.
May I know why this is true?
$$||P^n(x,\cdot)-\pi||_{TV} := \frac{1}{2}\sum_{y\in \chi}|P^n(x,y)-\pi(y)|$$
and if I look at any vector $v$ defined on $\chi$, then $v = c_1v_1+c_2v_2+\cdots+c_kv_k$, where $v_1,\dots,v_k$ are the eigenvectors of $P$, so $P^n v = \lambda_1^nc_1v_1+\lambda_2^nc_2v_2+\cdots+\lambda_k^nc_kv_k$. Since $\lambda_1 = 1$, the $\lambda_2 = \lambda$ term dominates the rest of this sum, but how do I tie all this together to show $(1)$?
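For what it's worth, a small numerical experiment (an arbitrary symmetric, hence reversible, chain I made up) is consistent with this eigenvector picture:

```python
import numpy as np

# An arbitrary symmetric 3-state chain (symmetric => reversible, pi uniform).
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

lams, V = np.linalg.eigh(P)           # eigenvalues ascending; lams[-1] == 1
e_x = np.eye(3)[0]                    # start at state x = 0
c = V.T @ e_x                         # coordinates of e_x in the eigenbasis

for n in [1, 5, 10]:
    # P^n e_x rebuilt term by term: sum_i lambda_i^n * c_i * v_i
    recon = sum(lams[i]**n * c[i] * V[:, i] for i in range(3))
    assert np.allclose(recon, np.linalg.matrix_power(P, n) @ e_x)
    print(n, abs(lams[-2])**n)        # the lambda_2^n term that dominates
```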
preliminaries
1.) I am going to re-scale each side by 2 so that we are talking about L1 norms of vectors.
2.) While the original post doesn't say this, my read of the link is that this bound is for a reversible Markov chain. The fact that $\frac{1}{\sqrt{\pi(x)}}$ shows up also strongly suggests this.
3.) As usual, I assume a single communicating class in this finite-state, time-homogeneous chain.
4.) Recall that reversible chains are similar via $D^\frac{1}{2}$ to a symmetric matrix. See e.g. here: If $P$ is the transition matrix of a reversible Markov chain, why are its eigenvalues real?
5.) I assume there was a typo in the statement
$||P^n(x,\cdot)-\pi||_{TV}\leq \frac{\lambda^n}{2\sqrt{\pi(x)}} \tag{1}$
and in fact they meant
$||P^n(x,\cdot)-\pi||_{TV}\leq \frac{\vert\lambda\vert^n}{2\sqrt{\pi(x)}} \tag{1}$
and $\lambda$ isn't "the 2nd largest eigenvalue", but instead the SLEM -- the second largest eigenvalue in modulus. The stated bound using the "2nd largest eigenvalue" is in fact impossible: any bipartite Markov chain has $-1$ as an eigenvalue, so e.g. in the two-state flip chain the second largest eigenvalue is $-1$, and for odd powers $n$ the right-hand side of $(1)$ would be negative (see the sketch below). So I use the SLEM.
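To make the impossibility concrete, here is a minimal sketch with the two-state flip chain (chosen purely for illustration):

```python
import numpy as np

# Two-state flip chain: bipartite with period 2, eigenvalues 1 and -1.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
pi = np.array([0.5, 0.5])                  # stationary distribution

lam = np.linalg.eigvalsh(P)[-2]            # "2nd largest eigenvalue" = -1
n = 3                                      # any odd power
tv = 0.5 * np.abs(np.linalg.matrix_power(P, n)[0] - pi).sum()

print(tv)                                  # 0.5 -- a TV distance is never negative
print(lam**n / (2 * np.sqrt(pi[0])))       # about -0.707: an impossible "bound"
```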
notation
$\mathbf \pi$ is the steady-state distribution
$D = \text{diag}(\mathbf \pi)$, i.e. a diagonal matrix with $\pi_i$ as the $i$th diagonal component
$\mathbf e_x$ is the standard basis vector with a one in the $x$th spot
recall
Write $P = B + \mathbf 1 \mathbf \pi^T$, i.e. $B := P - \mathbf 1 \mathbf \pi^T$. Since $P\mathbf 1 = \mathbf 1$ and $\mathbf \pi^T P = \mathbf \pi^T$, the cross terms vanish ($B\mathbf 1\mathbf \pi^T = \mathbf 1\mathbf \pi^T B = \mathbf 0$), and $\big(\mathbf 1\mathbf \pi^T\big)^n = \mathbf 1\mathbf \pi^T$ because $\mathbf \pi^T\mathbf 1 = 1$. Hence
$P^n = \big(B+\mathbf 1\mathbf \pi^T\big)^n = B^n+\big(\mathbf 1\mathbf \pi^T\big)^n= B^n+\mathbf 1\mathbf \pi^T$
so $B$ is the "transient behavior matrix" and in particular
$B= D^\frac{-1}{2} C D^\frac{1}{2}, \qquad\text{hence}\qquad B^n= D^\frac{-1}{2} C^n D^\frac{1}{2}$
where $C$ is a real symmetric matrix -- in fact it is the same symmetric matrix that $P$ is similar to, except that the eigenvalue of one has been "zero'd out" (a numerical check of this follows below)
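As a sanity check, here is one concrete way to realize this $C$ numerically -- the construction $C := D^\frac{1}{2} P D^\frac{-1}{2} - \sqrt{\mathbf \pi}\,\sqrt{\mathbf \pi}^T$ is my own gloss on the similarity described above, applied to a made-up birth-death chain (reversible, with non-uniform $\mathbf \pi$ so the diagonal scaling actually matters):

```python
import numpy as np

# A made-up birth-death chain: tridiagonal, hence reversible.
P = np.array([[0.7, 0.3, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
pi = np.array([8, 12, 9]) / 29               # solves detailed balance
Dh, Dhi = np.diag(np.sqrt(pi)), np.diag(1 / np.sqrt(pi))

B = P - np.outer(np.ones(3), pi)             # transient behavior matrix
S = Dh @ P @ Dhi                             # symmetric iff P is reversible
C = S - np.outer(np.sqrt(pi), np.sqrt(pi))   # eigenvalue 1 "zero'd out"

assert np.allclose(C, C.T)                   # C is real symmetric
assert np.allclose(B, Dhi @ C @ Dh)          # B = D^(-1/2) C D^(1/2)
n = 4
assert np.allclose(np.linalg.matrix_power(P, n),          # P^n = B^n + 1 pi^T
                   np.linalg.matrix_power(B, n) + np.outer(np.ones(3), pi))
print("decomposition checks pass")
```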
re: quasilinearization
Later I'll introduce $\mathbf w$ as part of quasilinearizing the L1 norm. $\mathbf w$ is a vector whose entries are each $+1$ or $-1$; since we have $k$ states in this Markov chain (equivalently: a $k \times k$ matrix), there are $2^k$ possible candidates for $\mathbf w$, and the $\mathbf w$ selected is any one that maximizes the value of the underlying sum -- i.e. this quasilinearizes the absolute value that is taken component-wise in an L1 norm. The $\mathbf w$ is selected to be optimal at the time it is introduced, then fixed.
further note: Since I didn't see anything on quasilinearization on this site, I've added a bit more. For $z_i \in \mathbb R$, the absolute value function can be quasilinearized by considering
$$\big \vert z_i\big \vert = \max_{w_i \in \{-1,1\}} w_i \cdot z_i$$ so in the context of the L1 norm of some given vector $\mathbf z \in \mathbb R^k$ this reads
$$\big \Vert \mathbf z \big \Vert_1 = \sum_{i=1}^k \big \vert z_i\big \vert = \max_{\mathbf w \in \{-1,1\}^k} \mathbf z^T \mathbf w = \mathbf z^T \mathbf w$$
where the RHS of just $\mathbf z^T \mathbf w$ is convenient shorthand as it is understood that $\mathbf w$ has been selected in the manner described above.
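A brute-force sketch of this selection (enumerating all $2^k$ sign vectors; of course the maximizer is simply $\operatorname{sign}(\mathbf z)$, but the enumeration mirrors the definition above):

```python
import numpy as np
from itertools import product

z = np.array([0.7, -1.2, 0.4])               # an arbitrary test vector, k = 3

# Enumerate all 2^k candidates for w and maximize z^T w.
best = max(np.dot(z, np.array(w)) for w in product([-1, 1], repeat=len(z)))

assert np.isclose(best, np.abs(z).sum())     # the maximum equals the L1 norm
assert np.isclose(z @ np.sign(z), best)      # and w = sign(z) attains it
```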
main argument
$2\cdot \big \Vert P^n(x,\cdot)-\pi\big \Vert_{TV}$
$= \big \Vert \mathbf e_x^T P^n - \mathbf \pi^T\big \Vert_1$
$= \big \Vert \mathbf e_x^T (B + \mathbf 1 \mathbf \pi^T )^n - \mathbf \pi^T\big \Vert_1$
$= \big \Vert \mathbf e_x^T (B^n + \mathbf 1 \mathbf \pi^T ) - \mathbf \pi^T\big \Vert_1$
$= \big \Vert \mathbf e_x^T B^n + \mathbf e_x^T\mathbf 1 \mathbf \pi^T - \mathbf \pi^T\big \Vert_1$
$= \big \Vert \mathbf e_x^T B^n \big \Vert_1$
$= \big \Vert \mathbf e_x^T D^\frac{-1}{2} C^n D^\frac{1}{2} \big \Vert_1$
$= \frac{1}{\sqrt{\pi(x)}}\cdot \big \Vert \mathbf e_x^T C^n D^\frac{1}{2} \big \Vert_1$
$= \frac{1}{\sqrt{\pi(x)}}\cdot \mathbf e_x^T C^n D^\frac{1}{2} \mathbf w$
$= \frac{1}{\sqrt{\pi(x)}}\cdot \big(C^n\mathbf e_x\big)^T \big(D^\frac{1}{2} \mathbf w\big)$
$\leq \frac{1}{\sqrt{\pi(x)}}\cdot \big \Vert C^n\mathbf e_x\big \Vert_2 \cdot \big \Vert D^\frac{1}{2} \mathbf w \big \Vert_2 $
$= \frac{1}{\sqrt{\pi(x)}}\cdot \big \Vert C^n\mathbf e_x\big \Vert_2 \cdot \big(\sum_{i=1}^kw_i^2 \cdot d_{i,i}\big)^{\frac{1}{2}} $
$= \frac{1}{\sqrt{\pi(x)}}\cdot \big \Vert C^n\mathbf e_x\big \Vert_2 \cdot \big(\sum_{i=1}^k 1 \cdot \pi_i\big)^\frac{1}{2} $
$= \frac{1}{\sqrt{\pi(x)}}\cdot \big \Vert C^n\mathbf e_x\big \Vert_2 \cdot 1 $
$= \frac{1}{\sqrt{\pi(x)}}\cdot \big(c_{x,x}^{(2n)}\big)^\frac{1}{2}$
$\leq \frac{1}{\sqrt{\pi(x)}}\cdot \Big(\sigma_\text{max}\big(C^{2n}\big)\Big)^\frac{1}{2}$
$= \frac{1}{\sqrt{\pi(x)}}\cdot \Big(\big \vert \lambda\big \vert^{2n}\Big)^\frac{1}{2}$
$= \frac{1}{\sqrt{\pi(x)}}\cdot \big \vert \lambda\big \vert^{n}$
where the first inequality is Cauchy-Schwarz and the second comes from the fact that each diagonal element of a positive semi-definite matrix is at most its maximal eigenvalue (equivalently, $\mathbf e_x^T C^{2n} \mathbf e_x \leq \sigma_\text{max}\big(C^{2n}\big)$, which follows from the variational characterization of the operator 2-norm). Along the way I used $\big \Vert C^n \mathbf e_x \big \Vert_2^2 = \mathbf e_x^T C^{2n} \mathbf e_x = c_{x,x}^{(2n)}$, which holds by the symmetry of $C$, and $\sigma_\text{max}\big(C^{2n}\big) = \vert \lambda \vert^{2n}$, which holds because the eigenvalues of $C$ are exactly those of $P$ with the eigenvalue $1$ zeroed out, so the spectral radius of $C$ is the SLEM.
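Finally, a quick end-to-end numerical check of $(1)$ on the same made-up birth-death chain (illustrative only, not a proof):

```python
import numpy as np

# Same made-up birth-death chain as above: reversible, non-uniform pi.
P = np.array([[0.7, 0.3, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
pi = np.array([8, 12, 9]) / 29

# SLEM: second largest eigenvalue modulus (the largest modulus is the eigenvalue 1).
slem = np.sort(np.abs(np.linalg.eigvals(P)))[-2]

for x in range(3):
    for n in [1, 2, 5, 10, 11]:
        tv = 0.5 * np.abs(np.linalg.matrix_power(P, n)[x] - pi).sum()
        bound = slem**n / (2 * np.sqrt(pi[x]))
        assert tv <= bound + 1e-12, (x, n, tv, bound)
print("bound (1) holds for every tested x and n")
```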