Total variation does not take into account large distances.


Let $(X,\Sigma,\mu)$ be a measure space, where $\mu$ may be a signed measure, and define

$$|\mu |(E):=\sup\limits_{\pi }\sum _{A\in \pi }|\mu (A)|\qquad \forall E\in \Sigma $$ where the supremum is taken over all countable measurable partitions $\pi$ of the set $E$.
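To get a concrete feel for the definition, here is a minimal sketch (the set and the values of the measure are made up for illustration). For a signed measure given by point masses on a finite set, the supremum over partitions is attained by the partition of $E$ into singletons, since refining a partition can only increase $\sum_{A\in\pi}|\mu(A)|$ by the triangle inequality:

```python
# Hypothetical signed measure on a three-point set, given by point masses.
mu = {"a": 0.5, "b": -0.25, "c": 0.25}

def total_variation(mu, E):
    """|mu|(E) for a measure defined by point masses on the finite set E.

    The singleton partition attains the supremum in the definition, so the
    total variation is just the sum of the absolute point masses."""
    return sum(abs(mu[x]) for x in E)

print(total_variation(mu, {"a", "b", "c"}))  # 1.0
print(total_variation(mu, {"b", "c"}))       # 0.5, even though mu({b,c}) = 0
```

The second line shows why the supremum over partitions matters: the coarse partition $\{ \{b,c\} \}$ gives $|\mu(\{b,c\})| = 0$, while the singleton partition recovers the cancellation.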

I have two questions with respect to this total variation norm.

First, I am not so sure why for two probability measures $\mu$ and $\nu$ we are left with: $${\displaystyle |\mu -\nu |(X)=2\sup \left\{\,\left|\mu (A)-\nu (A)\right|:A\in \Sigma \,\right\}}$$

Second, I read that "total variation does not take into account large distances" but I don't really understand how to interpret this sentence.

For your first question: suppose $\mu$ and $\nu$ are probability measures on $(X,\Sigma)$ and $\pi$ is a partition of $X$. For each $A\in\pi$, we must have either $\mu(A)\geq\nu(A)$ or $\mu(A)<\nu(A)$. Let $\pi_{\geq}$ and $\pi_{<}$ denote the subsets of the partition with each of these properties.

Then define $$ P_\pi:=\bigcup_{A\in \pi_{\geq}}A\qquad N_\pi:=\bigcup_{A\in\pi_{<}}A. $$ A few things to note:

  1. We have $\mu(N_\pi)=1-\mu(P_\pi)$ and $\nu(N_\pi)=1-\nu(P_\pi)$.

  2. Note that $$ \begin{align*} \sum_{A\in\pi}\lvert\mu(A)-\nu(A)\rvert&=\sum_{A\in\pi_{\geq}}\lvert\mu(A)-\nu(A)\rvert+\sum_{A\in\pi_{<}}\lvert\mu(A)-\nu(A)\rvert\\ &=\sum_{A\in\pi_{\geq}}(\mu(A)-\nu(A))+\sum_{A\in\pi_<}(\nu(A)-\mu(A))\\ &=\sum_{A\in\pi_{\geq}}\mu(A)-\sum_{A\in\pi_{\geq}}\nu(A)+\sum_{A\in\pi_{<}}\nu(A)-\sum_{A\in\pi_{<}}\mu(A)\\ &=\mu(P_\pi)-\nu(P_\pi)+\nu(N_\pi)-\mu(N_\pi). \end{align*} $$ (The last step uses countable additivity of $\mu$ and $\nu$.)

  3. Now, note that $X$ is the disjoint union of $P_\pi$ and $N_\pi$, and therefore $$ \mu(P_\pi)-\mu(N_\pi)=\mu(P_\pi)-(1-\mu(P_\pi))=2\mu(P_\pi)-1 $$ and similarly $\nu(P_\pi)-\nu(N_\pi)=2\nu(P_\pi)-1$. So, we have $$ \sum_{A\in\pi}\lvert\mu(A)-\nu(A)\rvert=2(\mu(P_\pi)-\nu(P_\pi))=2\lvert\mu(P_\pi)-\nu(P_\pi)\rvert, $$ where the absolute value may be added because $\mu(A)\geq\nu(A)$ for every $A\in\pi_\geq$, hence $\mu(P_\pi)\geq\nu(P_\pi)$. So, we can now conclude that $$ \sup_{\pi}\sum_{A\in\pi}\lvert\mu(A)-\nu(A)\rvert=2\sup_{\pi}\lvert\mu(P_\pi)-\nu(P_\pi)\rvert. $$

What remains is to show that $$ \sup_\pi\lvert\mu(P_\pi)-\nu(P_\pi)\rvert=\sup_{A\in\Sigma}\lvert\mu(A)-\nu(A)\rvert. $$ On the one hand, $P_\pi\in\Sigma$ for all $\pi$, so that $$ \sup_{\pi}\lvert\mu(P_\pi)-\nu(P_\pi)\rvert\leq\sup_{A\in\Sigma}\lvert\mu(A)-\nu(A)\rvert. $$ On the other hand, for any $A\in\Sigma$ you can note that $X=A\,\dot\cup\,(X\setminus A)$ is a partition whose $P_\pi$ is either $A$ or $X\setminus A$, and $\lvert\mu(A)-\nu(A)\rvert=\lvert\mu(X\setminus A)-\nu(X\setminus A)\rvert$, which proves the other direction.
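The identity can also be checked numerically. Here is an illustrative sketch with made-up discrete probability measures on a three-point space: the singleton partition attains the supremum defining $|\mu-\nu|(X)$, and a brute-force maximum over all events $A$ confirms the factor of $2$:

```python
from itertools import chain, combinations

# Two hypothetical probability measures on X = {1, 2, 3}.
mu = {1: 0.2, 2: 0.5, 3: 0.3}
nu = {1: 0.4, 2: 0.1, 3: 0.5}
X = list(mu)

# Left-hand side: the singleton partition attains the supremum over
# partitions in the finite case, so |mu - nu|(X) is just this sum.
tv = sum(abs(mu[x] - nu[x]) for x in X)

# Right-hand side: brute-force the supremum over all events A (all subsets).
events = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
sup_diff = max(abs(sum(mu[x] for x in A) - sum(nu[x] for x in A))
               for A in events)

# tv and 2 * sup_diff agree (up to floating-point rounding); both are 0.8 here.
print(tv, 2 * sup_diff)
```

The event attaining the supremum here is $A=\{2\}$ (or its complement $\{1,3\}$), which is exactly the set $P_\pi$ from the argument above.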

For your second question about 'large distances', the interpretation depends a lot on context, but there is a definite asymmetry in how informative total variation is at its two extremes. Take $\mu$ and $\nu$ to be probability measures. Then the total variation distance is essentially about finding an event on which the two measures disagree as much as possible. Total variation distance $0$ tells you a lot: the distributions are essentially identical. But total variation distance $1$ only tells you that you can partition the event space into a part where $\mu$ lives and a part where $\nu$ does; whether that matters depends on what happens downstream of your probability distribution. The distance may, for instance, be latching on to some property that isn't actually relevant to the question you're trying to ask. In particular, total variation only sees *whether* the measures can be separated by an event, not *how far apart* in the underlying space their mass sits.
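The point is easiest to see with point masses. The sketch below (an illustration, not part of the original answer) compares total variation with the Wasserstein-1 distance, which for two point masses $\delta_a$ and $\delta_b$ is simply $|a-b|$: total variation is $1$ whenever $a\neq b$, no matter how close or far apart the points are, while Wasserstein sees the gap.

```python
def tv_point_masses(a, b):
    """Total variation distance between delta_a and delta_b.

    When a != b, the event {a} has measure 1 under delta_a and 0 under
    delta_b, so the sup over events is 1 regardless of |a - b|."""
    return 0.0 if a == b else 1.0

def wasserstein1_point_masses(a, b):
    """Wasserstein-1 distance between delta_a and delta_b: the cost of
    transporting a unit of mass from a to b."""
    return abs(a - b)

for b in (0.001, 1000.0):
    print(tv_point_masses(0.0, b), wasserstein1_point_masses(0.0, b))
# TV is 1.0 in both cases; Wasserstein-1 distinguishes 0.001 from 1000.0.
```

This is one precise sense in which "total variation does not take into account large distances": it is blind to the metric structure of the underlying space.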