I'm reading *Understanding Machine Learning* and several of the recent lemmas I've studied involve this inequality, for which I've searched but found no justification whatsoever. Could anyone point me in the right direction, please?
Let $X$ be a random variable and $x' \in \mathbb{R}$ be a scalar. For all $i = 0, 1, 2, \dots$ denote $t_i = f(i)$, where $f$ is monotonically increasing. Then, $$\mathbb{E}[|X-x'|] \leq \sum_{i=1}^{\infty} t_i \mathbb{P}[|X-x'| > t_{i-1}]$$
EDIT: I omitted the initial condition $t_0 = 0$. Sorry for the confusion.
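For what it's worth, here is a quick Monte Carlo sanity check in Python; the choices $f(i)=i$, $X\sim\mathrm{Uniform}[0,10]$, and $x'=3$ are arbitrary. It is consistent with the inequality (it proves nothing, of course):

```python
import random

# Sanity check of E[|X - x'|] <= sum_i t_i * P[|X - x'| > t_{i-1}]
# with t_i = f(i) = i (so t_0 = 0), X ~ Uniform[0, 10], x' = 3.
random.seed(0)
samples = [abs(random.uniform(0, 10) - 3) for _ in range(100_000)]

lhs = sum(samples) / len(samples)  # Monte Carlo estimate of E[|X - x'|]

# Truncate the infinite sum: |X - x'| <= 7 here, so terms with
# t_{i-1} >= 7 contribute zero probability.
rhs = 0.0
for i in range(1, 9):
    t_i, t_prev = float(i), float(i - 1)
    p = sum(1 for y in samples if y > t_prev) / len(samples)
    rhs += t_i * p

print(lhs <= rhs)  # the bound holds (lhs ~ 2.9, rhs noticeably larger)
```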
Assume $t_0=0$. Apply the following Claim with the random variable $|X-x'|$ in place of $y$, then take expectations. This actually yields the stronger bound $\mathbb{E}[|X-x'|]\le \sum_{i=1}^\infty (t_i-t_{i-1})\,\mathbb{P}[|X-x'|>t_{i-1}]$; since $t_{i-1}\ge 0$ implies $t_i-t_{i-1}\le t_i$, the inequality in the question follows.
Claim: For every $y\ge0$, $$y\le \sum_{i=1}^\infty (t_i-t_{i-1})I(y>t_{i-1}),\tag1$$ where $I(A)$ denotes the indicator of condition $A$, i.e., $I(A)$ is one if condition $A$ is true, zero otherwise.
Proof: Pick $y\ge 0$. If $y=0$, the RHS of (1) is nonnegative and there is nothing to prove, so assume $y>0$, and let $J$ be the largest $i$ such that $y>t_{i-1}$ (such a $J$ exists provided the $t_i$ increase to infinity, which I'll assume; by maximality of $J$, $y\le t_J$). The RHS of (1) then reduces to $\sum_{i=1}^J (t_i-t_{i-1})$, which telescopes to $t_J-t_0=t_J\ge y$, and we're done!
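Here is a quick pointwise check of (1) in Python, with $t_i=i^2$ as an arbitrary monotone choice satisfying $t_0=0$ and the infinite sum truncated once the indicators vanish:

```python
def rhs_of_claim(y, n_terms=10_000):
    """sum_{i=1}^{n} (t_i - t_{i-1}) * I(y > t_{i-1}) with t_i = i**2."""
    total = 0.0
    for i in range(1, n_terms + 1):
        t_prev, t_i = (i - 1) ** 2, i ** 2
        if y > t_prev:
            total += t_i - t_prev
        else:
            break  # once y <= t_{i-1}, all later indicators are zero too
    return total

# The claim y <= RHS of (1) should hold for every y >= 0:
for y in [0.0, 0.5, 1.0, 3.7, 100.0, 12345.6]:
    assert y <= rhs_of_claim(y)
print("claim holds for all tested y")
```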
Aside: This inequality can be interpreted geometrically in view of the identity $$\mathbb{E}(Y)=\int_{t=0}^\infty \mathbb{P}(Y>t)\,dt,\tag2$$ valid for any nonnegative random variable $Y$. The inequality $$\mathbb{E}(Y)\le\sum_{i=1}^\infty(t_i-t_{i-1})\,\mathbb{P}(Y>t_{i-1})$$ is obtained by bounding the integral in (2) by a sum of areas of rectangles: since $t\mapsto \mathbb{P}(Y>t)$ is nonincreasing, the integrand is at most $\mathbb{P}(Y>t_{i-1})$ on each interval $[t_{i-1},t_i]$, so the rectangle of height $\mathbb{P}(Y>t_{i-1})$ over $[t_{i-1},t_i]$ dominates the area under the curve there.
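The rectangle bound is easy to see numerically. A small Python illustration, with the arbitrary choices $Y\sim\mathrm{Exp}(1)$ (so $\mathbb{E}(Y)=1$ exactly and $\mathbb{P}(Y>t)=e^{-t}$) and $t_i=i/2$:

```python
import math

# Y ~ Exp(1): E(Y) = 1 exactly, and P(Y > t) = exp(-t).
# Rectangle bound sum_i (t_i - t_{i-1}) * P(Y > t_{i-1}) with t_i = i/2,
# truncated at 200 terms (the tail beyond t = 99.5 is negligible).
EY = 1.0
upper = sum(0.5 * math.exp(-0.5 * (i - 1)) for i in range(1, 200))
print(EY <= upper, round(upper, 2))  # True (upper ~ 1.27 > 1)
```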