In statistics, a sequence of random variables $X_n$ is said to converge to a random variable $X$ in probability if, for every $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$. In particular, $X_n$ is said to converge to a constant $c$ if $P(|X_n - c| > \epsilon) \to 0$.
I can understand the case of a constant $c$: it means that $X_n$ becomes more and more concentrated around $c$ as $n \to \infty$. However, I don't quite understand what "converge to a random variable" means. It seems to me that $X$ would have to be degenerate for this to hold. But the textbook I have at hand does not elaborate on this, so I am confused. Can you explain it, perhaps with some simple examples?
You have to grasp the fact that $X, X_1, \dots, X_n, \dots$ are all tied to the same underlying sample space. Imagine these values are all "generated" at once, possibly by running a computer program.
Now choose some $\epsilon \gt 0$ and $\alpha \in [0,1)$. I will be able to select an $N$ for you such that if $n \gt N$, then $|X_n - X| \le \epsilon$ on at least $100\alpha$ percent of the times you execute your program.
How can I do this? Say you picked $\alpha = .999$. Since I know $P(|X_n - X| \gt \epsilon) \rightarrow 0$, I know that for large enough $n$, $P(|X_n - X| \gt \epsilon) \lt .001$. I will simply hand you such a large number $N$. (It is worth remembering at this point that $P(|X_n - X| \gt \epsilon)$, indexed by $n$, is a sequence of deterministic real numbers.) When you run your program, at least 99.9% of the time $|X_N - X| \le \epsilon$.
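To make the "hand you an $N$" step concrete, here is a sketch under an assumed setup of my own choosing (not from the original argument): take $X_n = X + Z/n$ with $Z$ standard normal, so that $p_n = P(|X_n - X| \gt \epsilon) = P(|Z| \gt \epsilon n)$ is an explicit deterministic sequence, and we can search for the first index where it drops below $.001$:

```python
import math

# Hypothetical concrete case (my choice, for illustration):
# X_n = X + Z/n with Z ~ N(0, 1), so
#   p_n = P(|X_n - X| > eps) = P(|Z| > eps * n),
# a deterministic sequence of real numbers indexed by n.

def p(n, eps=0.1):
    """Exact tail probability P(|Z| > eps*n) for Z ~ N(0, 1)."""
    return math.erfc(eps * n / math.sqrt(2))

# The N promised in the argument: first index with p_n < .001.
N = next(n for n in range(1, 10_000) if p(n) < 0.001)
print(N, p(N))  # from index N on, |X_n - X| <= eps at least 99.9% of runs
```

The point is only that the search is over ordinary numbers $p_1, p_2, \dots$, with no randomness involved in choosing $N$.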
In fact, you can shift your focus to another value $X_M$ with $M \gt N$, and at least 99.9% of the time $|X_M - X| \le \epsilon$ as well.
Here is an example program:
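This is a minimal sketch, assuming the illustrative setup $X_n = X + Z/n$ (my choice, not forced by anything above): each run of the program generates one sample point, producing $X$ and every $X_n$ together, and we estimate $P(|X_n - X| \gt \epsilon)$ by the fraction of runs where the event occurs:

```python
import random

# One "execution of the program" = one draw of the sample point (x, z).
# From that single point we read off X = x and X_n = x + z/n, so X and
# all the X_n live on the same sample space, as the argument requires.

def estimate_prob(n, eps=0.1, trials=100_000, seed=0):
    """Empirically estimate P(|X_n - X| > eps) over many runs."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        x = rng.gauss(0, 1)   # the limit random variable X
        z = rng.gauss(0, 1)   # auxiliary noise on the same sample point
        x_n = x + z / n       # X_n, tied to the same sample point as X
        if abs(x_n - x) > eps:
            exceed += 1
    return exceed / trials

for n in (1, 5, 20, 100):
    print(n, estimate_prob(n))
```

Running it shows the estimated probabilities shrinking toward $0$ as $n$ grows, even though $X$ itself is not degenerate: on each run $X$ takes a different value, and the $X_n$ simply track it ever more closely.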