Equality between expressions for total variation distance between two discrete probability distributions.

I found an excellent answer to a more general form of this question here: Two notions of total variation norms, but it is a bit more sophisticated than what I am looking for. My question:

Let $X,Y$ be positive integer valued random variables. Then $$d_{\mathrm{TV}}(X,Y) = \frac12\sum_{k=1}^\infty|\mathbb P(X=k)-\mathbb P(Y=k)| = \sup_{A\subset\mathbb R}|\mathbb P(X\in A) - \mathbb P(Y\in A)|. $$

I do not see how this equality holds. In particular the factor of $\frac12$ is throwing me off. How can we see that this sum is proportional to this supremum?

Best answer:

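Note that for any set $A$, $$P(X \in A) - P(Y \in A) = \sum_{k \in A} (P(X = k) - P(Y = k)).$$ Additionally, since $P(X \in A) = 1 - P(X \in A^c)$ and likewise for $Y$, $$P(X \in A) - P(Y \in A) = P(Y \in A^c) - P(X \in A^c) = \sum_{k \in A^c} (P(Y = k) - P(X = k)).$$ Both sums run only over the positive integers in $A$ and in $A^c$ respectively, since $X$ and $Y$ take no other values.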

Choosing $A = \{k : P(X = k) \ge P(Y = k)\}$ makes every term of the first sum nonnegative and every term of the second sum positive, so each term equals $|P(X = k) - P(Y = k)|$. Summing the above two equations therefore yields $$2(P(X \in A) - P(Y \in A)) = \sum_{k = 1}^\infty |P(X = k) - P(Y = k)|.$$ The left-hand side is nonnegative because the right-hand side is nonnegative, so you can take the absolute value of the left-hand side without changing the equality.
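As a quick sanity check of this identity, here is a small worked example. The distributions are made up for illustration and are not from the question: suppose $X$ takes the values $1,2,3$ with probabilities $0.5, 0.3, 0.2$ and $Y$ takes them with probabilities $0.2, 0.3, 0.5$. The rule above gives the set $\{1,2\}$, and $$2\bigl(P(X \in \{1,2\}) - P(Y \in \{1,2\})\bigr) = 2(0.8 - 0.5) = 0.6, \qquad \sum_{k=1}^{3} |P(X = k) - P(Y = k)| = 0.3 + 0 + 0.3 = 0.6,$$ so both sides agree and $d_{\mathrm{TV}}(X,Y) = 0.3$ in this example.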

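Finally, note that this choice of $A$ maximizes the expression in the supremum. For any set $B$, $$P(X \in B) - P(Y \in B) = \sum_{k \in B} (P(X = k) - P(Y = k)) \le \sum_{k \in A} (P(X = k) - P(Y = k)) = P(X \in A) - P(Y \in A),$$ because discarding the negative terms (those with $k \notin A$) and adding the remaining nonnegative terms of $A$ can only increase the sum. Applying the same bound to $B^c$ gives $P(Y \in B) - P(X \in B) = P(X \in B^c) - P(Y \in B^c) \le P(X \in A) - P(Y \in A)$. Hence $|P(X \in B) - P(Y \in B)| \le P(X \in A) - P(Y \in A)$ for every $B$, and the supremum equals $\frac12\sum_{k=1}^\infty |P(X = k) - P(Y = k)|$.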
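For readers who like to check such identities numerically, here is a minimal Python sketch, assuming a finite support so that the supremum can be found by brute force over all subsets. The function names `tv_half_l1` and `tv_sup` and the two probability vectors are hypothetical and chosen only for illustration; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations


def tv_half_l1(p, q):
    """Half the sum of |p_k - q_k| over a common finite support."""
    return sum(abs(pk - qk) for pk, qk in zip(p, q)) / 2


def tv_sup(p, q):
    """Brute-force maximum of |P(X in A) - P(Y in A)| over all subsets A."""
    n = len(p)
    best = Fraction(0)
    for r in range(n + 1):
        for A in combinations(range(n), r):
            diff = abs(sum(p[k] - q[k] for k in A))
            best = max(best, diff)
    return best


# Hypothetical probability mass functions on {1, 2, 3, 4} (indices 0..3 here).
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]

# The maximizing set from the answer: indices where p_k >= q_k.
A = [k for k in range(len(p)) if p[k] >= q[k]]
at_A = sum(p[k] - q[k] for k in A)

print(tv_half_l1(p, q), tv_sup(p, q), at_A)
```

All three printed values should coincide. The brute-force search is exponential in the support size, so it is only meant as a check on small examples, not as a way to compute the distance in practice.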