According to Wikipedia, the total variation of a real-valued function $f$, defined on an interval $[a,b]\subset \mathbb{R}$, is the quantity $$V_b^a=\sup_{P\in\mathcal{P}}\sum_{i=0}^{n_P-1}\left | f(x_{i+1})-f(x_i)\right |$$ where $\mathcal{P}= \left \{P=\{x_0,\ldots, x_{n_P}\} \mid P \text{ is a partition of } [a,b]\right \}$.
According to my professor, the total variation is the quantity $$V_b^a=\limsup_{\delta(P)\to 0}\sum_{i=0}^{n_P-1}\left | f(x_{i+1})-f(x_i)\right |$$ where $\delta(P)=\max_k (x_k-x_{k-1})$ is the mesh size of $P$.
Why are the two definitions equivalent?
Let $V_1$ be the first quantity and $V_2$ the second. Clearly $V_1 \ge V_2$: every sum over a partition is at most $V_1$, because the $\sup$ is taken over all partitions, so the $\limsup$ of such sums as the mesh size goes to zero cannot exceed $V_1$.
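Suppose $P$ is a partition and let $\sigma(P) = \sum_{i=0}^{n_P-1}\left | f(x_{i+1})-f(x_i)\right| $. If $P'$ is a refinement of $P$ (that is, $P \subset P'$), the triangle inequality gives $\sigma(P) \le \sigma(P')$. Since any $P$ can be refined to a partition of arbitrarily small mesh size, there are partitions of arbitrarily small mesh size whose sums are at least $\sigma(P)$. Hence we must have $V_2 \ge \sigma(P)$ for any $P$.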
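The refinement step can be spelled out as follows. This is only a sketch: the point $c$ and the shorthand $S(\delta)$ are introduced here for illustration, and the reading of the $\limsup$ below is an assumption about how the second definition is meant. Inserting a single point $c \in (x_i, x_{i+1})$ into $P$ replaces one term of the sum by two, and by the triangle inequality
$$\left| f(x_{i+1})-f(x_i)\right| \le \left| f(x_{i+1})-f(c)\right| + \left| f(c)-f(x_i)\right|,$$
so the sum does not decrease. Induction on the number of added points gives $\sigma(P) \le \sigma(P')$ for every refinement $P'$ of $P$. Now read the $\limsup$ as
$$V_2 = \lim_{\delta \to 0^+} S(\delta), \qquad S(\delta) = \sup\left\{ \sigma(Q) \mid Q \text{ is a partition of } [a,b] \text{ with } \delta(Q) < \delta \right\}.$$
For each $\delta > 0$ there is a refinement $P'$ of $P$ with $\delta(P') < \delta$, so $S(\delta) \ge \sigma(P') \ge \sigma(P)$, and letting $\delta \to 0^+$ gives $V_2 \ge \sigma(P)$.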
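Now suppose $V_1 < \infty$, let $\epsilon>0$, and let $P$ be a partition such that $\sigma(P) > V_1 -\epsilon$. Then we have $V_2 \ge \sigma(P) > V_1 -\epsilon$. Since $\epsilon >0$ was arbitrary, we have $V_2 \ge V_1$, and so $V_1 =V_2$. (If $V_1 = \infty$, then $\sigma(P)$ can be made arbitrarily large, and the bound $V_2 \ge \sigma(P)$ forces $V_2 = \infty$ as well.)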
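As a sanity check on why the second definition uses $\limsup$ rather than a plain limit, here is a small worked example. The function is chosen purely for illustration and does not come from the question. Take $[a,b]=[0,1]$ and
$$f(x) = \begin{cases} 1, & x = \tfrac12, \\ 0, & x \ne \tfrac12. \end{cases}$$
If $\tfrac12 \notin P$, every term of the sum vanishes and $\sigma(P) = 0$. If $\tfrac12 = x_j \in P$, exactly two terms are nonzero and
$$\sigma(P) = \left| f(x_j) - f(x_{j-1}) \right| + \left| f(x_{j+1}) - f(x_j) \right| = 2.$$
Both kinds of partition exist with arbitrarily small mesh size, so the sums have no limit as $\delta(P) \to 0$, but their $\limsup$ is $2$, which agrees with $V_1 = \sup_P \sigma(P) = 2$.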