In my probability course, my professor derived the negative binomial distribution by computing the probability that the time of the $k$-th success, $T_k$, equals some value $n$. If $p$ is the probability of success, $X_i$ is the outcome of the $i$-th (independent) Bernoulli trial, and $S_n$ is the number of successes in the first $n$ trials, then
$$ \mathbb{P}\{T_k = n\} = \mathbb{P}\{X_n = 1\} \mathbb{P}\{S_{n-1} = k-1\} = {n - 1 \choose k - 1} p^k (1-p)^{n-k}, \qquad n \geq k $$
and he denotes this
$$ T_k \sim \text{NB}(k, p) $$
This all makes sense.
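To sanity-check the derivation numerically, here is a minimal Python sketch using only the standard library. It computes $\mathbb{P}\{T_k = n\}$ as the product $p \cdot \mathbb{P}\{S_{n-1} = k-1\}$, compares it with the closed form, and draws $T_k$ by simulating independent Bernoulli($p$) trials until the $k$-th success. The function names and the values $k = 3$, $p = 0.4$ are hypothetical, chosen only for illustration.

```python
import math
import random


def binom_pmf(m: int, j: int, p: float) -> float:
    """P{S_m = j}: probability of exactly j successes in m Bernoulli(p) trials."""
    if j < 0 or j > m:
        return 0.0
    return math.comb(m, j) * p**j * (1 - p) ** (m - j)


def time_of_kth_success_pmf(n: int, k: int, p: float) -> float:
    """P{T_k = n} = P{X_n = 1} * P{S_{n-1} = k-1}, using independence of trials."""
    if n < k:
        return 0.0
    return p * binom_pmf(n - 1, k - 1, p)


def closed_form(n: int, k: int, p: float) -> float:
    """C(n-1, k-1) p^k (1-p)^(n-k) for n >= k, and 0 otherwise."""
    if n < k:
        return 0.0
    return math.comb(n - 1, k - 1) * p**k * (1 - p) ** (n - k)


def simulate_T_k(k: int, p: float, rng: random.Random) -> int:
    """Run Bernoulli(p) trials until the k-th success; return that trial's index."""
    successes, n = 0, 0
    while successes < k:
        n += 1
        if rng.random() < p:  # success with probability p
            successes += 1
    return n


if __name__ == "__main__":
    k, p = 3, 0.4
    rng = random.Random(0)  # fixed seed so runs are reproducible
    draws = [simulate_T_k(k, p, rng) for _ in range(50_000)]
    for n in range(k, k + 6):
        empirical = draws.count(n) / len(draws)
        print(n, round(time_of_kth_success_pmf(n, k, p), 4), round(empirical, 4))
```

The product form relies on the trials being independent, so the event $\{X_n = 1\}$ does not depend on what happened in the first $n - 1$ trials.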
But when I read about the negative binomial distribution on Wikipedia, it is described as the distribution of the number of successes before a fixed number of failures, $r$, and is denoted $X \sim \text{NB}(r, p)$:
$$ \mathbb{P}\{X = k\} = {k + r - 1 \choose k} p^k (1 - p)^r $$
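A similar sketch for the formula just quoted, assuming $p$ is the success probability and $X$ counts the successes observed before the $r$-th failure. The names `successes_before_r_failures_pmf` and `simulate_X`, and the values $r = 4$, $p = 0.3$, are hypothetical. Under this reading the mean is $rp/(1-p)$, which gives the simulation something to be checked against.

```python
import math
import random


def successes_before_r_failures_pmf(k: int, r: int, p: float) -> float:
    """P{X = k} = C(k+r-1, k) p^k (1-p)^r, with p the success probability."""
    if k < 0:
        return 0.0
    return math.comb(k + r - 1, k) * p**k * (1 - p) ** r


def simulate_X(r: int, p: float, rng: random.Random) -> int:
    """Run Bernoulli(p) trials until the r-th failure; return the number of successes."""
    successes = failures = 0
    while failures < r:
        if rng.random() < p:  # success with probability p
            successes += 1
        else:
            failures += 1
    return successes


if __name__ == "__main__":
    r, p = 4, 0.3
    rng = random.Random(0)  # fixed seed so runs are reproducible
    draws = [simulate_X(r, p, rng) for _ in range(50_000)]
    for k in range(6):
        empirical = draws.count(k) / len(draws)
        print(k, round(successes_before_r_failures_pmf(k, r, p), 4), round(empirical, 4))
    # Sample mean versus the theoretical mean r p / (1 - p)
    print(sum(draws) / len(draws), r * p / (1 - p))
```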
Substituting $n = k + r$ (the total number of trials) into the Wikipedia formula, it looks almost the same as my professor's:
$$ \mathbb{P}\{X = k\} = {n - 1 \choose k} p^k (1 - p)^{n - k} $$
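One way to make "almost the same" concrete is a numerical check of a single reading, offered as a sketch rather than a definitive answer: treat $X + r$ as the trial on which the $r$-th failure occurs, and relabel a failure as a "success" with probability $1 - p$. Under that assumed relabeling, the professor's formula with $k \to r$ and $p \to 1 - p$ should agree term by term with the Wikipedia formula, because ${k + r - 1 \choose k} = {k + r - 1 \choose r - 1}$. The helper names below are hypothetical.

```python
import math


def time_of_kth_success_pmf(n: int, k: int, p: float) -> float:
    """Professor's form: P{T_k = n} = C(n-1, k-1) p^k (1-p)^(n-k) for n >= k."""
    if n < k:
        return 0.0
    return math.comb(n - 1, k - 1) * p**k * (1 - p) ** (n - k)


def successes_before_r_failures_pmf(k: int, r: int, p: float) -> float:
    """Wikipedia-style form quoted above: C(k+r-1, k) p^k (1-p)^r."""
    return math.comb(k + r - 1, k) * p**k * (1 - p) ** r


def relabeling_matches(r: int, p: float, k_max: int = 50) -> bool:
    """Check P{X = k} == P{T'_r = k + r} for k = 0..k_max, where T'_r is the
    trial of the r-th failure, i.e. the professor's T_r with success prob 1 - p."""
    return all(
        math.isclose(
            successes_before_r_failures_pmf(k, r, p),
            time_of_kth_success_pmf(k + r, r, 1 - p),
            rel_tol=1e-12,
        )
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    print(relabeling_matches(r=4, p=0.3))  # True
    # Without relabeling, the same k in both formulas gives different numbers:
    print(successes_before_r_failures_pmf(2, 4, 0.3), time_of_kth_success_pmf(6, 2, 0.3))
```

Plugging the same $k$ into both formulas without the relabeling gives different values, reflecting the ${n-1 \choose k}$ versus ${n-1 \choose k-1}$ coefficient.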
Anyway, do these rest on the same modeling assumptions? Why are they parameterized differently? Also, is there some connection between interpreting the random variable as a time versus as a count?