I was working through Theorem 20.3 Part (i) from Markov Chains and Mixing Times (Levin, Peres, Wilmer) and I am confused about the justification of an inequality. From the book (p. 283):
Proof of Theorem 20.3. (i), Step 1. Recall that $N_t$ is the Poisson random variable indicating the number of transitions in the continuous-time chain. We first prove $$ \left\|H_t(x, \cdot)-\pi\right\|_{\mathrm{TV}} \leq \mathbf{P}\left\{N_t<k\right\}+\left\|P^k(x, \cdot)-\pi\right\|_{\mathrm{TV}} . $$ Conditioning on the value of $N_t$ and applying the triangle inequality give $$ \left\|H_t(x, \cdot)-\pi\right\|_{\mathrm{TV}} \leq \sum_{j \geq 0} \mathbf{P}\left\{N_t=j\right\}\left\|P^j(x, \cdot)-\pi\right\|_{\mathrm{TV}} $$
My question: I feel like I am missing something simple, but why can one get this inequality by applying the triangle inequality alone? My thought was that, because of the infinite sum, you would need Fubini–Tonelli or something similar to exchange the sums. The following is my attempt at a proof:
By definition (p. 281), the heat kernel of the chain with transition matrix $P$ is
$$ \begin{aligned} H_t(x, y) & =\sum_{k=0}^{\infty} \frac{e^{-r t}(r t)^k}{k !} P^k(x, y) \end{aligned} $$
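(As a side note, this definition is easy to sanity-check numerically. Below is a minimal Python sketch with a hypothetical $3$-state matrix `P` (not from the book), truncating the infinite series at a finite number of terms; each row of the truncated $H_t$ should be, up to truncation error, a probability distribution.)

```python
import numpy as np
from math import exp, factorial

# Hypothetical 3-state transition matrix (illustration only).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])

def heat_kernel(P, t, r=1.0, terms=80):
    """Truncate H_t = sum_k e^{-rt} (rt)^k / k! * P^k at `terms` terms."""
    H = np.zeros_like(P)
    Pk = np.eye(P.shape[0])  # running power P^k, starting from P^0 = I
    for k in range(terms):
        H += exp(-r * t) * (r * t) ** k / factorial(k) * Pk
        Pk = Pk @ P
    return H

H = heat_kernel(P, t=2.0)
```

For this choice of `terms` the Poisson tail beyond the truncation is negligible, so the rows of `H` sum to $1$ to machine precision.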
Thus, writing $P^k_x$ for the distribution at time $k$ started from $x$ (and $H^t_x$ analogously),
$$ \begin{align} \begin{split} \|H^t_x - \pi\|_{TV}& =\left\| \left(\sum_{k=0}^{\infty} \frac{e^{-r t}(r t)^k}{k !} P_x^k\right) - \pi\right\|_{TV}\\ &=\left|\sum\limits_{\omega \in A}\left(\left(\sum_{k=0}^{\infty} \frac{e^{-r t}(r t)^k}{k !} P_x^k(\omega)\right) - \pi(\omega)\right)\right|\\ \end{split} \end{align} $$
Here $A$ denotes the event maximizing $|H^t_x(A) - \pi(A)|$, so this line is just the definition of the total variation distance.
Then by Fubini–Tonelli (and using that the Poisson weights sum to $1$, so $\pi(\omega) = \sum_{k=0}^\infty \frac{e^{-rt}(rt)^k}{k!}\,\pi(\omega)$), this equals
$$\left|\sum_{k=0}^{\infty} \frac{e^{-r t}(r t)^k}{k !} \sum\limits_{\omega \in A} \left(P_x^k(\omega) - \pi(\omega)\right)\right| $$
Then, by the triangle inequality, this is
$$ \begin{align} \begin{split} &\leq\sum_{k=0}^{\infty}\left| \frac{e^{-r t}(r t)^k}{k !} \sum\limits_{\omega \in A} \left(P_x^k(\omega) - \pi(\omega)\right)\right|\\ &=\sum_{k=0}^{\infty} \frac{e^{-r t}(r t)^k}{k !} \left|\sum\limits_{\omega \in A} \left(P_x^k(\omega) - \pi(\omega)\right)\right|\\ &\leq\sum_{k=0}^{\infty}\frac{e^{-r t}(r t)^k}{k !} \|P_x^k-\pi \|_{TV}\\ &=\sum_{k=0}^{\infty}\mathbf{P}\{N(rt)=k\} \|P_x^k-\pi \|_{TV},\\ \end{split} \end{align} $$
where $N(rt)$ is Poisson with mean $rt$. This is the same as the inequality in the book. Please let me know if I've made a mistake in my proof. If not, I don't see how one could get here without some justification for interchanging the sum from the TV distance with the infinite series.
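For what it's worth, the final inequality can also be checked numerically. Here is a minimal Python sketch; the $3$-state matrix, rate, and time are hypothetical, and the series is truncated at $K$ terms, which introduces a negligible error for this choice of $K$:

```python
import numpy as np
from math import exp, factorial

# Hypothetical 3-state chain; x is the starting state.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
r, t, x, K = 1.0, 1.5, 0, 80

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

def tv(mu, nu):
    """Total variation distance between two distributions."""
    return 0.5 * np.abs(mu - nu).sum()

lhs_row = np.zeros(3)  # row x of H_t, built term by term
rhs = 0.0              # sum_k P{N_t = k} * ||P^k(x,.) - pi||_TV
Pk = np.eye(3)
for k in range(K):
    w = exp(-r * t) * (r * t) ** k / factorial(k)
    lhs_row += w * Pk[x]
    rhs += w * tv(Pk[x], pi)
    Pk = Pk @ P

lhs = tv(lhs_row, pi)
# lhs should not exceed rhs, up to truncation error.
```

Of course this only illustrates the inequality for one chain; it is not a substitute for the Fubini–Tonelli argument.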
Based on the comments, your $A$ was meant to be the maximizing event in the definition of $\|H_x^t-\pi\|_{TV}$, which can be written as $A = \{\omega\in\mathcal{X} : H_x^t(\omega) \ge \pi(\omega)\}$ by Remark 4.3 in Markov Chains and Mixing Times (Levin, Peres, Wilmer). Also, as you observed, $$ \left|\sum_{\omega \in A}\left(P_x^k(\omega)-\pi(\omega)\right)\right| \le \|P_x^k - \pi\|_{TV} $$ holds for any $A \subseteq \mathcal{X}$, by the definition of $\|\cdot\|_{TV}$.
Hence it remains to check that Fubini's theorem applies, which it does provided $$ \sum_{\omega\in A}\sum_{k=0}^\infty \frac{e^{-rt}(rt)^k}{k!} \left|P_x^k(\omega)-\pi(\omega)\right| < \infty . $$ This holds because $\sum_{\omega\in A}H_x^t(\omega) \le 1$ and $\sum_{\omega \in A}\pi(\omega) \le 1$ (both are probability distributions over the state space $\mathcal{X}$ and $A \subseteq \mathcal{X}$), so $$ \sum_{\omega\in A}\sum_{k=0}^\infty \frac{e^{-rt}(rt)^k}{k!} \left|P_x^k(\omega)-\pi(\omega)\right| \le \sum_{\omega\in A}\sum_{k=0}^\infty \frac{e^{-rt}(rt)^k}{k!} \Big(P_x^k(\omega) + \pi(\omega)\Big) = \sum_{\omega\in A} \Big(H_x^t(\omega) + \pi(\omega)\Big) \le 2 . $$ So it seems that your proof is correct after all. Nice work! :)
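A quick numerical illustration of this absolute-summability bound (hypothetical $3$-state chain, truncated series; summing over all of $\mathcal{X} \supseteq A$ only makes the double sum larger, and it still stays below $2$):

```python
import numpy as np
from math import exp, factorial

# Hypothetical 3-state chain; x is the starting state.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
r, t, x, K = 1.0, 2.0, 0, 100

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

total = 0.0
Pk = np.eye(3)
for k in range(K):
    w = exp(-r * t) * (r * t) ** k / factorial(k)
    # Sum over ALL omega, which dominates the sum over any A.
    total += w * np.abs(Pk[x] - pi).sum()
    Pk = Pk @ P
# total is the double sum with A = X; it must be at most 2.
```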