"Distance" between distributions and convergence of Markov chains


For what follows, let us restrict ourselves to discussing discrete probability distributions.

When learning about relative entropy (also known as Kullback-Leibler divergence) $D(\cdot || \cdot)$, one typically learns that it does not define a metric, since $D(P||Q) \ne D(Q||P)$ for some distributions $P,Q$.
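A quick numerical check makes the asymmetry concrete (a minimal sketch; the two Bernoulli distributions below are arbitrary choices):

```python
import math

def kl(p, q):
    """Relative entropy D(P||Q), in nats, for two discrete distributions
    given as probability vectors over the same finite support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.5]   # fair coin
Q = [0.9, 0.1]   # biased coin

print(kl(P, Q))  # ~0.5108
print(kl(Q, P))  # ~0.3681, so D(P||Q) != D(Q||P)
```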

In general, I don't think there is any well-defined notion of distance between two probability distributions.

However, when learning about Markov chains, we learn about nice conditions under which the distribution $\pi_n$ of the chain at time $n$, started from an arbitrary initial distribution $\pi_0$, converges to the stationary distribution $\pi$ of the Markov chain (assuming it exists).

Since there is no well-defined metric on the set of probability distributions, what is going on here? How is this convergence defined? In precisely what sense is $\pi_n$ approaching $\pi$? Is there some (obvious) underlying topology that I am just not seeing? Can someone help reconcile what's going on here?

1 Answer

Yes, there are many possible topologies on the space of all probability distributions. You can see https://en.wikipedia.org/wiki/Convergence_of_measures for an overview. The most commonly encountered is the weak topology; one of several equivalent definitions is that a sequence of probability measures $P_n$ on, say, $\mathbb{R}$, converges to $P$ iff $\int f\,dP_n \to \int f\,dP$ for every bounded continuous function $f$. It's also called "convergence in law" or "convergence in distribution". It is discussed in most graduate-level probability books (e.g. Durrett, Resnick). The canonical source to learn more is Billingsley, Convergence of Probability Measures.
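To see what weak convergence does and does not require, consider the standard example $P_n = \delta_{1/n}$, the point mass at $1/n$. For any bounded continuous $f$ we have $\int f\,dP_n = f(1/n) \to f(0) = \int f\,d\delta_0$, so $P_n \to \delta_0$ weakly; yet $P_n(\{0\}) = 0$ for every $n$ while $\delta_0(\{0\}) = 1$, so the measures of individual sets need not converge.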

It is true, for instance, that under appropriate conditions, the $n$-step distribution of a Markov chain converges, in the weak topology, to the stationary distribution.
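Here is a minimal numerical sketch of that convergence (the 2-state transition matrix is an arbitrary choice; on a finite state space, weak convergence of probability vectors is just entrywise convergence):

```python
# Iterate pi_{n+1} = pi_n P and watch pi_n approach the stationary distribution.

P = [[0.9, 0.1],   # transition matrix: P[i][j] = Pr(next = j | current = i)
     [0.2, 0.8]]

def step(pi, P):
    """One step of the chain: pi_{n+1} = pi_n P."""
    return [sum(pi[i] * P[i][j] for i in range(len(pi)))
            for j in range(len(P[0]))]

pi = [1.0, 0.0]    # start in state 0 with probability 1
for _ in range(50):
    pi = step(pi, P)

print(pi)          # ~[0.6667, 0.3333]; the stationary pi solves pi = pi P,
                   # and here that gives pi = (2/3, 1/3)
```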

"In general, I don't think there is any well-defined notion of distance between two probability distributions."

Well, you are incorrect. The K-L divergence doesn't happen to be a metric, but there are plenty of genuine metrics on the space of probability distributions. For instance, there's the Lévy metric, which induces the weak topology. The total variation distance is also a metric, but it induces a strictly finer topology (convergence in total variation implies weak convergence, but not conversely), which for many purposes is less useful.
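For discrete distributions on a common support, as in the question, total variation has the simple formula $d_{TV}(P,Q) = \frac{1}{2}\sum_x |P(x) - Q(x)|$, and unlike K-L divergence it is symmetric and satisfies the triangle inequality. A minimal sketch, reusing the two coins from above:

```python
def tv_distance(p, q):
    """Total variation distance between two discrete distributions
    given as probability vectors over the same support."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Symmetric, unlike K-L divergence:
print(tv_distance([0.5, 0.5], [0.9, 0.1]))  # 0.4
print(tv_distance([0.9, 0.1], [0.5, 0.5]))  # 0.4
```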