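I have trouble understanding the difference when I look at the alternative definition of almost sure convergence. I know the original definition, but it is the alternative one that makes it easier to compare with convergence in probability: $X_n\to X$ almost surely if and only if, for every $\epsilon>0$,
$$\lim_{m\to\infty}\Pr\Big(\sup_{n\ge m}|X_n-X|\ge\epsilon\Big)=0.$$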

Convergence in probability is defined as follows: for every $\epsilon>0$,
$$\lim_{n\to\infty}\Pr(|X_n-X|\ge\epsilon)=0.$$

So the only difference is the sup. But even with the sup in view, I struggle to see the difference. Can someone explain where my reasoning goes wrong?
For instance, for convergence in probability I would reason as follows. Fix $\epsilon>0$. Then for any $\epsilon_2>0$ there is an $N$ such that $\Pr(|X_n-X|\ge \epsilon)<\epsilon_2$ for all $n \ge N$. Here is my problem: since this holds for all $n\ge N$, why is it not equivalent to the alternative characterisation of a.s. convergence?
UPDATE: Is it correct to say that the difference is the following? Given $\epsilon$ and $\epsilon_2$, in the a.s. case you can use the same subset of the sample space for all $n \ge N$, whereas with only convergence in probability you may have to change the subset of the sample space for each $n \ge N$.
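The "moving subset" intuition in the update can be made concrete with a standard textbook example that is not part of the original post, the so-called typewriter sequence on $[0,1)$ with the uniform probability measure and $X=0$. The sketch below assumes that setup; the function names `interval`, `prob_exceeds` and `x` are hypothetical. Each $X_n$ is the indicator of an interval of length $1/k$, and within block $k$ these intervals sweep across all of $[0,1)$. So $\Pr(|X_n|\ge\epsilon)\to0$, yet every sample point is hit once per block.

```python
from fractions import Fraction

def interval(n):
    """Interval [a, b) of [0, 1) on which the hypothetical typewriter indicator X_n equals 1.

    Block k (k = 1, 2, ...) holds k consecutive indices whose intervals
    [0, 1/k), [1/k, 2/k), ..., [(k-1)/k, 1) sweep across [0, 1).
    """
    k = 1
    while n > k:  # skip whole blocks until n falls inside block k
        n -= k
        k += 1
    return Fraction(n - 1, k), Fraction(n, k)

def prob_exceeds(n):
    """Pr(|X_n - 0| >= eps) for any 0 < eps <= 1: the length of the interval."""
    a, b = interval(n)
    return b - a

def x(n, omega):
    """Value X_n(omega) of the indicator at the sample point omega."""
    a, b = interval(n)
    return 1 if a <= omega < b else 0

if __name__ == "__main__":
    # Probabilities shrink like 1/k: convergence in probability to 0.
    print([str(prob_exceeds(n)) for n in (1, 2, 4, 7, 11, 46)])
    # A fixed omega is hit once in every block, so X_n(omega) = 1 infinitely often.
    omega = Fraction(1, 3)
    print([n for n in range(1, 56) if x(n, omega) == 1])
```

Here the set where $|X_n|\ge\epsilon$ keeps moving, so for every $m$ the union over $n\ge m$ is all of $[0,1)$ and $\Pr(\sup_{n\ge m}|X_n|\ge\epsilon)=1$: the sequence converges in probability but not almost surely.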
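The core reason is that $$\color{red}{\sup_\color{black}{n}}\color{green}{\Pr}(|X_n-X|\geqslant\varepsilon)\qquad\text{and}\qquad \color{green}{\Pr}(\color{red}{\sup_\color{black}{n}}|X_n-X|\geqslant\varepsilon)$$ have little in common. The latter is always at least as large as the former, and it can be much larger. For instance, if $\Pr(X_n=1)=1-\Pr(X_n=0)=1/n$ for every $n\geqslant1$ and $(X_n)$ is independent, then $X_n\to X=0$ in probability and, for every $m\geqslant1$ and every positive $\varepsilon\leqslant1$, $$\sup_{n\geqslant m}\Pr(|X_n-X|\geqslant\varepsilon)=1/m,\qquad \Pr(\sup_{n\geqslant m}|X_n-X|\geqslant\varepsilon)=1.$$

The second identity uses independence: for $m\geqslant2$, $\Pr(\sup_{n\geqslant m}|X_n|<\varepsilon)=\prod_{n\geqslant m}(1-1/n)=\lim_{N\to\infty}(m-1)/N=0$, and for $m=1$ it is immediate since $X_1=1$. Hence $X_n\to0$ in probability but not almost surely.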
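Both quantities in the Bernoulli$(1/n)$ example can be checked exactly with rational arithmetic. Since an infinite supremum cannot be evaluated directly, this minimal sketch assumes a finite horizon $N$ and takes the sup over $m\le n\le N$; the function names `sup_of_probs` and `prob_of_sup` are hypothetical. The telescoping product gives $\Pr(\max_{m\le n\le N}X_n=1)=1-(m-1)/N$ for $m\ge2$, which tends to $1$ as $N\to\infty$, while the sup of the probabilities stays at $1/m$.

```python
from fractions import Fraction

def sup_of_probs(m, N):
    """max over m <= n <= N of Pr(|X_n - 0| >= eps) = Pr(X_n = 1) = 1/n."""
    return max(Fraction(1, n) for n in range(m, N + 1))

def prob_of_sup(m, N):
    """Pr(max over m <= n <= N of X_n = 1) for independent X_n ~ Bernoulli(1/n).

    By independence, Pr(all X_n = 0) is the product of (1 - 1/n),
    which telescopes to (m - 1)/N when m >= 2 (and is 0 when m = 1).
    """
    p_all_zero = Fraction(1)
    for n in range(m, N + 1):
        p_all_zero *= 1 - Fraction(1, n)
    return 1 - p_all_zero

if __name__ == "__main__":
    m = 10
    for N in (100, 1000, 10000):
        # The first column stays at 1/m; the second approaches 1 as N grows.
        print(N, sup_of_probs(m, N), float(prob_of_sup(m, N)))
```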