In computer programming you can easily find people describing both a PRNG, like the Mersenne Twister, and a Markov / stochastic process as a "pseudorandom generator".
I honestly never liked this pseudo-definition, but I also never took it too seriously, since I was only interested in PRNGs alone.
Now I would like to understand the practical difference between the two. One apparently relevant difference is that a PRNG is typically implemented using one dimension only, whereas a stochastic process is usually represented by an N-dimensional matrix, so I assume that stochastic processes are naturally multidimensional.
You asked for a non-wiki description of Markov chains, so here it is. I hope it will help you $\ddot\smile$
Technically:
A Markov chain is a sequence of random variables $(X_i)_{i \in \mathbb{N}}$, $X_i : \Omega \to S$, such that for all $n \in \mathbb{N}$ and all $x_0 \in S, x_1 \in S,\ldots,x_n \in S$
$$ \mathbb{P}(X_n = x_n \mid X_0 = x_0, X_1 = x_1,\ldots,X_{n-1} = x_{n-1}) = \mathbb{P}(X_n = x_n \mid X_{n-1} = x_{n-1}).$$
What is a function:
A function is a mapping from a set called the domain to a set called the codomain, with the property that every element of the domain is mapped to exactly one element of the codomain. In particular, given a function $f : \mathbb{R} \to \mathbb{R}$, it doesn't matter how many times we calculate $f(x)$ for some $x \in \mathbb{R}$; each time we will get the same result. In other words, functions are, by their definition, deterministic. For example, things like `rand()` or `scanf(...)` from programming languages are not functions according to the mathematical definition.
Nondeterminism in mathematics:
This poses a problem, as to how express such things in a mathematical language. The answer is, we fake such behavior by introducing an additional, artificial dependence on "the state of the world". For example,
`get_int` could be modeled by
$$\mathtt{get\_int} : W \to \mathbb{N} \times W,$$
where $W$ is the set of possible states of the world, and to "read" three natural numbers from the user we need to thread that state through all the calls:
\begin{align} n_1, w_1 = \mathtt{get\_int}(w_0), \\ n_2, w_2 = \mathtt{get\_int}(w_1), \\ n_3, w_3 = \mathtt{get\_int}(w_2). \end{align}
Note, that every time we would call $\mathtt{get\_int}(w_1)$ we would get the same pair $(n_2,w_2)$. However, writing things in this way is cumbersome. There are some workarounds (e.g. monads), but there is another approach. Instead of threading the world state, we could precompute all the $w_i$'s (this is theoretical, so it does not cost us anything), and let the
`get_int` know only which one to use (the index is written in the subscript, while $\omega$ represents all the data needed to calculate $w_i$ from $i$):
\begin{align} n_1 = \mathtt{get\_int}_0(\omega), \\ n_2 = \mathtt{get\_int}_1(\omega), \\ n_3 = \mathtt{get\_int}_2(\omega). \end{align}
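To make the world-threading idea concrete, here is a small sketch where the "world" is nothing more than the seed of a toy linear congruential generator (the constants and the `get_int` helper are illustrative assumptions, not a real API):

```ruby
# The "world" is just an integer seed; get_int is a genuine function:
# the same world always yields the same (number, new world) pair.
def get_int(w)
  w2 = (1103515245 * w + 12345) % (2**31)  # one step of a toy LCG
  [w2 % 100, w2]                           # a number in 0..99, plus the new world
end

w0 = 42
n1, w1 = get_int(w0)  # thread the world state through every call
n2, w2 = get_int(w1)
n3, w3 = get_int(w2)
```

Calling `get_int(w1)` twice really does return the same pair, which is exactly the determinism discussed above.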
This gets us to randomness, as we can assume some probability distributions on the set of all $\omega$'s, which we will denote $\Omega$.
It's like there are multiple possible worlds in which everything is fixed (future, past, everything). For example, there is a world $\omega_1$ where the user of your program typed in $2, 3, 5$ and your last coin throw resulted in tails, and there is another possible world $\omega_2$ where the user typed in $2, 2, 2$ while your last coin throw ended differently, and many, many other worlds. You don't know which world this is, but if you did, you could predict (i.e. calculate) the results of all the coin throws, all the lottery number sequences, and so on.
Observe that if you were to split the possible worlds into two sets $A$ and $B$, where the next fair coin throw comes up heads and tails respectively, then you can say that the actual world is in set $A$ with probability $\frac{1}{2}$.
What is a random variable:
A random variable is like a guinea pig: neither a pig, nor from Guinea. A random variable is a function $X : \Omega \to S$ for some set $S$. Usually $S = \mathbb{R}$, but in the case of Markov chains it is some finite set, commonly $\{0,1,\ldots,n-1\}$ for some $n$. However, we usually hide this explicit dependence on $\omega$, and we write $X + Y$ to denote $\omega \mapsto X(\omega) + Y(\omega)$. In consequence,
$$\mathbb{P}(X = 5) \text{ means } \mathbb{P}(\{\omega \in \Omega \mid X(\omega) = 5\}),$$
that is, it is a probability that this is the world in which the random variable (i.e. function that represents it) has value $5$. Now, the conditional probability
$$\mathbb{P}(X = 5 \mid Y = 6)$$
could be rephrased as "suppose that we are in one of the worlds where $Y$ assumes the value $6$; what is the probability that in such a world $X$ has the value $5$?" This is quite a broad topic, but I won't bore you with specific formulas; let's jump into the definition of Markov chains.
Markov chains:
So, a Markov chain is a sequence $(X_i)_{i \in \mathbb{N}}$ of random variables (represented as functions $\Omega \to S$), such that for all $n \in \mathbb{N}$ and all $x_0 \in S, x_1 \in S,\ldots,x_n \in S$
$$ \mathbb{P}(X_n = x_n \mid X_0 = x_0, X_1 = x_1,\ldots,X_{n-1} = x_{n-1}) = \mathbb{P}(X_n = x_n \mid X_{n-1} = x_{n-1}).$$
What does the formula mean? The left side is the probability that $X_n$ takes the value $x_n$ given the entire history $X_0 = x_0, \ldots, X_{n-1} = x_{n-1}$, while the right side is the same probability conditioned only on the most recent value $X_{n-1} = x_{n-1}$. That is, the probability of $X_7 = 0$ does not depend on the value of $X_3$, only on $X_6$. This is great, because to get $X_{n}$ we don't need to calculate all of $X_0$, $X_1$ up to $X_{n-1}$; we need to keep only the last one.
An example and a non-example:
Let $\Omega$ be the set of all infinite binary sequences, $2^\mathbb{N}$ (i.e. infinite series of coin throws), each equally probable. One needs to write much more to make this formal, but let's just say that $$\mathbb{P}(\{\omega \in \Omega \mid \text{first $k$ throws are fixed to }\alpha_0,\alpha_1,\ldots,\alpha_{k-1}\}) = \frac{1}{2^k}.$$
Now define $X_0(\omega) = 0$, and $$X_i(\omega) = \begin{cases}0 & \text{if the $i$-th throw was heads},\\(X_{i-1}(\omega)+1) \bmod 3 &\text{otherwise}.\end{cases}$$
The above sequence is a Markov chain. In this case it's easy to tell, because the definition of $X_i$ depends only on $X_{i-1}$ and an independent part of $\omega$.
However, it doesn't have to be so in general: the definition may depend on other states and the sequence might still be a Markov chain. One such example is
$$X_i(\omega) = \text{number of heads in the first $i$ coin throws} \bmod 2.$$
A non-example could be
$$X_i(\omega) = \begin{cases} 1 & \text{ if both $(i+1)$-th and $(i+2)$-th throws were heads},\\ 0 & \text{ otherwise}. \end{cases}$$
Operational intuition:
The first example has three states, namely $S = \{0,1,2\}$, and can be represented by a diagram that works similarly to an automaton, except that the next state is picked "at random" with the appropriate probabilities (the sum of the outgoing probabilities has to be $1$):
\begin{align} \mathbb{P}(X_i = 0 \mid X_{i-1} = 0) &= \frac{1}{2}\\ \mathbb{P}(X_i = 0 \mid X_{i-1} = 1) &= \frac{1}{2}\\ \mathbb{P}(X_i = 0 \mid X_{i-1} = 2) &= 1\\ \mathbb{P}(X_i = 1 \mid X_{i-1} = 0) &= \frac{1}{2}\\ \mathbb{P}(X_i = 1 \mid X_{i-1} = 1) &= 0\\ \mathbb{P}(X_i = 1 \mid X_{i-1} = 2) &= 0\\ \mathbb{P}(X_i = 2 \mid X_{i-1} = 0) &= 0\\ \mathbb{P}(X_i = 2 \mid X_{i-1} = 1) &= \frac{1}{2}\\ \mathbb{P}(X_i = 2 \mid X_{i-1} = 2) &= 0\\ \end{align}
while the initial distribution is
\begin{align} \mathbb{P}(X_0 = 0) &= 1 \\ \mathbb{P}(X_0 = 1) &= 0 \\ \mathbb{P}(X_0 = 2) &= 0 \\ \end{align}
This information can be neatly summed up in a transition matrix and an initial distribution vector:
$$ P = \left[\begin{array}{ccc}\frac{1}{2}&\frac{1}{2}&0\\\frac{1}{2}&0&\frac{1}{2}\\1&0&0\end{array}\right],\quad\quad \pi_0 = \left[\begin{array}{c}1\\0\\0\end{array}\right], $$
and in such a case
$$\left[\begin{array}{c}\mathbb{P}(X_n = 0)\\\mathbb{P}(X_n = 1)\\\mathbb{P}(X_n = 2)\end{array}\right] = \left(P^\top\right)^n \pi_0.$$
In practice you don't need all the mathematical theory, and $P$ and $\pi_0$ are enough to specify a Markov chain. Often, the $P$ matrix is implicit, i.e. its entries are calculated on-the-fly, especially if the space of states is large.
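As a sanity check, the evolution of the distribution can be computed with Ruby's standard `matrix` library (the variable names here are ad hoc; note that with $P$ written row-by-row as above, pushing the distribution forward one step multiplies by the transpose of $P$):

```ruby
require "matrix"

# Transition matrix of the example: row i holds the distribution of the
# next state given that the current state is i.
p_mat = Matrix[[0.5, 0.5, 0.0],
               [0.5, 0.0, 0.5],
               [1.0, 0.0, 0.0]]
pi0 = Vector[1.0, 0.0, 0.0]  # start in state 0 with probability 1

# Distribution over the states after 10 steps.
pi_n = (p_mat.transpose ** 10) * pi0
puts pi_n.to_a.inspect
```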
Now, observe that to generate one particular run of a Markov chain, that is, a sequence $x_0,x_1,x_2,\ldots$ of elements of $S$ such that $x_i = X_i(\omega)$ for some particular $\omega \in \Omega$, you need some amount of entropy each time you have to make a decision (here you need one bit each time you are in state $0$ or $1$, and no bits when you are in state $2$). So, in general, to generate an infinite sequence you might need an infinite amount of entropy.
Some short code:
A short program (in Ruby) to generate such a run of the considered Markov chain (using pseudorandom numbers, not real entropy) would be:
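Something along these lines (a sketch; `rand(2)` stands in for the coin throw from the definition of $X_i$ above):

```ruby
# One run of the chain: X_0 = 0, then X_i = 0 on heads,
# and X_i = (X_{i-1} + 1) mod 3 on tails.
x = 0
run = [x]
20.times do
  x = rand(2).zero? ? 0 : (x + 1) % 3  # rand(2) plays the role of the coin
  run << x
end
puts run.join(" ")
```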
A bit more interesting example:
One nice example is using Markov chains to generate sentences in English (kind of). This is not directly related to what you want, but it should give you some insight into how Markov chains work and how they can be used.
Suppose you have some text, then you can calculate the probability that the next word is $x_{n}$ given that the current one is $x_{n-1}$. For example "I like math and I know you like physics." would get you
\begin{align} \mathbb{P}(X_{n} = \text{«I»} \mid X_{n-1} = \text{«I»}) &= 0\\ \vdots\\ \mathbb{P}(X_{n} = \text{«math»} \mid X_{n-1} = \text{«like»}) &= \frac{1}{2}\\ \vdots\\ \mathbb{P}(X_{n} = \text{«like»} \mid X_{n-1} = \text{«I»}) &= \frac{1}{2}\\ \vdots \end{align}
which could then be used to produce sentences like "I like physics." or "I like math and I know you like math and I like math and I like physics." and many similar.
A simple implementation of the above could look like this (the input should be one sentence per line):
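A possible sketch (the helper names `build_chain` and `generate` are made up for this illustration):

```ruby
# Build a first-order word chain from lines of text, then random-walk it.
def build_chain(lines)
  chain = Hash.new { |h, k| h[k] = [] }
  lines.each do |line|
    words = line.split
    next if words.empty?
    chain[:start] << words.first                 # sentence openers
    words.each_cons(2) { |a, b| chain[a] << b }  # word -> successor
    chain[words.last] << :stop                   # sentence enders
  end
  chain
end

def generate(chain)
  sentence = []
  word = chain[:start].sample  # sampling a list that keeps duplicates
  until word == :stop          # reproduces the empirical probabilities
    sentence << word
    word = chain[word].sample
  end
  sentence.join(" ")
end

chain = build_chain($stdin.tty? ? [] : $stdin.each_line)
puts generate(chain) unless chain[:start].empty?
```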
To run it use
`cat one_sentence_per_line_file.txt | ruby script.rb` or
`ruby script.rb < one_sentence_per_line_file.txt`.
Conclusion:
This post came out long (too long) and it only scratched the surface. Markov chains are great, but this topic is too broad for such a post. I hope the above will be enough to get you started $\ddot\smile$