Practical differences between a PRNG and a Markov chain


In computer programming you can easily find people describing both a PRNG, like the Mersenne Twister, and a Markov / stochastic process as a "pseudo random generator".

I honestly never liked this pseudo-definition, but I also never took it too seriously, since I was only interested in PRNGs.

Now I would like to understand the practical difference between the two. What looks like a relevant difference is that a PRNG is typically implemented with a single dimension of state, while a stochastic process is usually represented by an N-dimensional matrix, so I assume that stochastic processes are naturally multidimensional.

Best answer:

You asked for some non-wiki description of Markov chains, so here it is. I hope it will help you $\ddot\smile$

Technically:

A Markov chain is a sequence of random variables $(X_i)_{i \in \mathbb{N}}$, $X_i : \Omega \to S$, such that for all $n \in \mathbb{N}$ and all $x_0, x_1, \ldots, x_n \in S$

$$ \mathbb{P}(X_n = x_n \mid X_0 = x_0, X_1 = x_1,\ldots,X_{n-1} = x_{n-1}) = \mathbb{P}(X_n = x_n \mid X_{n-1} = x_{n-1}).$$

What is a function:

A function is a mapping from a set called the domain to a set called the codomain, with the property that every element of the domain is mapped to exactly one element of the codomain. In particular, given a function $f : \mathbb{R} \to \mathbb{R}$, it doesn't matter how many times we calculate $f(x)$ for some $x \in \mathbb{R}$; each time we will get the same result. In other words, functions are, by definition, deterministic. For example, things like rand() or scanf(...) from programming languages are not functions according to the mathematical definition.
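To make this concrete, here is a small Ruby sketch: rand is not a function (same "input", different results), but a PRNG becomes a genuine function as soon as its hidden state is fixed, which is exactly what seeding does.

```ruby
# Two calls to rand with the same (empty) argument list give
# different results -- so rand is not a mathematical function.
a = rand
b = rand

# But the PRNG underneath *is* deterministic once the hidden
# state is fixed: seeding twice with the same value reproduces
# the same output.
srand(42)   # fix the internal state
x = rand
srand(42)   # reset to the same state
y = rand
# x == y always: same state in, same number out
```

This is the whole trick used below: determinism is recovered by making the hidden state explicit.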

Nondeterminism in mathematics:

This poses a problem: how do we express such things in mathematical language? The answer is that we fake such behavior by introducing an additional, artificial dependence on "the state of the world". For example, get_int could be modeled by

$$\mathtt{get\_int} : W \to \mathbb{N} \times W,$$

where $W$ is the set of possible states of the world, and to "read" three natural numbers from the user we need to thread that state through all the calls:

\begin{align} n_1, w_1 = \mathtt{get\_int}(w_0), \\ n_2, w_2 = \mathtt{get\_int}(w_1), \\ n_3, w_3 = \mathtt{get\_int}(w_2). \end{align}

Note that every time we call $\mathtt{get\_int}(w_1)$ we get the same pair $(n_2,w_2)$. However, writing things this way is cumbersome. There are workarounds (e.g. monads), but there is another approach: instead of threading the world state, we could precompute all the $w_i$'s (this is theoretical, so it does not cost us anything) and let $\mathtt{get\_int}$ know only which one to use (the index is written in the subscript, while $\omega$ represents all the data needed to calculate $w_i$ from $i$):

\begin{align} n_1 = \mathtt{get\_int}_0(\omega), \\ n_2 = \mathtt{get\_int}_1(\omega), \\ n_3 = \mathtt{get\_int}_2(\omega). \end{align}
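The state-threading version can be sketched in Ruby with a simple linear congruential generator standing in for get_int (the constants are the classic Numerical Recipes LCG parameters; the whole function is an illustrative stand-in, not anything from the text above):

```ruby
# A pure "get_int": takes a world state w, returns [number, new state].
# Calling it twice with the same w always yields the same pair.
def get_int(w)
  w2 = (1664525 * w + 1013904223) % 2**32  # one LCG step
  [w2 >> 16, w2]                           # the "random" number and the new world
end

w0 = 12345
n1, w1 = get_int(w0)
n2, w2 = get_int(w1)
n3, w3 = get_int(w2)
# Re-calling get_int(w1) reproduces (n2, w2) exactly.
```

This is also exactly how a PRNG works internally: a deterministic function of an explicit state.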

This gets us to randomness, as we can assume some probability distribution on the set of all $\omega$'s, which we will denote $\Omega$.

It's like there are multiple possible worlds in which everything is fixed (future, past, everything). For example, there is a world $\omega_1$ where the user of your program typed in $2, 3, 5$ and your last coin throw came up tails, and there is another possible world $\omega_2$ where the user typed in $2, 2, 2$ while your last coin throw ended the other way, and many, many other worlds. You don't know which world this is, but if you did, you could predict (i.e. calculate) the results of all the coin throws, all the lottery number sequences, and so on.

Observe that if you were to split the possible worlds into two sets $A$ and $B$, where the next fair coin throw comes up heads and tails respectively, then you can say that the actual world is in set $A$ with probability $\frac{1}{2}$.

What is a random variable:

A random variable is like a guinea pig: neither a pig, nor from Guinea. A random variable is a function $X : \Omega \to S$ for some set $S$. Usually $S = \mathbb{R}$, but in the case of Markov chains it is any finite set, commonly $\{0,1,\ldots,n-1\}$ for some $n$. Usually we hide this explicit dependence on $\omega$, and we write $X + Y$ to denote $\omega \mapsto X(\omega) + Y(\omega)$. In consequence,

$$\mathbb{P}(X = 5) \text{ means } \mathbb{P}(\{\omega \in \Omega \mid X(\omega) = 5\}),$$

that is, the probability that this is a world in which the random variable (i.e. the function that represents it) has value $5$. Now, the conditional probability

$$\mathbb{P}(X = 5 \mid Y = 6)$$

could be rephrased as: "suppose we are in one of the worlds where $Y$ assumes the value $6$; what is the probability that in such a world $X$ has the value $5$?" This is quite a broad topic, but I won't bore you with specific formulas; let's jump into the definition of Markov chains.
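Before that, note that when $\Omega$ is finite this "possible worlds" reading is a literal computation. A Ruby sketch, where (purely as an assumption for this illustration) $\Omega$ is the set of eight equally likely outcomes of three coin throws, $X$ counts heads and $Y$ is the first throw:

```ruby
# Omega: all 2^3 = 8 worlds, each a triple of coin throws (1 = heads).
omega = [0, 1].product([0, 1], [0, 1])

x = ->(w) { w.sum }   # X = number of heads in world w
y = ->(w) { w[0] }    # Y = result of the first throw

# P(X = 2 | Y = 1): restrict to the worlds where Y = 1,
# then measure the fraction of them where X = 2.
worlds_y1 = omega.select { |w| y.(w) == 1 }
p_cond = worlds_y1.count { |w| x.(w) == 2 }.fdiv(worlds_y1.size)
# Given the first throw is heads, X = 2 iff exactly one of the
# remaining two throws is heads => p_cond == 0.5
```
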

Markov chains:

So, a Markov chain is a sequence $(X_i)_{i \in \mathbb{N}}$ of random variables (represented as functions $\Omega \to S$) such that for all $n \in \mathbb{N}$ and all $x_0, x_1, \ldots, x_n \in S$

$$ \mathbb{P}(X_n = x_n \mid X_0 = x_0, X_1 = x_1,\ldots,X_{n-1} = x_{n-1}) = \mathbb{P}(X_n = x_n \mid X_{n-1} = x_{n-1}).$$

What does the formula mean? The left side is

the probability that $X_n$ has value $x_n$ assuming that this is one of the worlds in which $X_0 = x_0$, $X_1 = x_1, \ldots, X_{n-1} = x_{n-1}$,

while the right side is

the probability that $X_n$ has value $x_n$ assuming that this is one of the worlds in which $X_{n-1} = x_{n-1}$.

That is, the probability of $X_7 = 0$ does not depend on the value of $X_3$, only on the value of $X_6$. This is great, because to get $X_n$ we don't need to remember all of $X_0, X_1, \ldots, X_{n-1}$; we only need to keep the last one.

An example and a non-example:

Let $\Omega$ be the set of all infinite binary sequences, $2^\mathbb{N}$ (i.e. infinite sequences of coin throws), each equally probable. One needs to write much more to make this formal, but let's just say that $$\mathbb{P}(\{\omega \in \Omega \mid \text{first $k$ throws are fixed to }\alpha_0,\alpha_1,\ldots,\alpha_{k-1}\}) = \frac{1}{2^k}.$$

Now define $X_0(\omega) = 0$, and $$X_i(\omega) = \begin{cases}0 & \text{if the $i$-th throw was heads},\\(X_{i-1}(\omega)+1) \bmod 3 &\text{otherwise}.\end{cases}$$

The above sequence is a Markov chain. In this case it's easy to tell, because the definition of $X_i$ depends only on $X_{i-1}$ and an independent part of $\omega$.
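One run of this chain is completely determined by $\omega$. A Ruby sketch, with a hand-picked finite prefix of the throw sequence standing in for $\omega$:

```ruby
# One particular world: a finite prefix of the infinite throw sequence.
throws = [:tails, :tails, :tails, :heads, :tails]

# X_0 = 0; X_i = 0 on heads, (X_{i-1} + 1) mod 3 on tails.
states = [0]
throws.each do |t|
  states << (t == :heads ? 0 : (states.last + 1) % 3)
end
# states == [0, 1, 2, 0, 0, 1]
```

Running the same throw sequence through this loop again would reproduce exactly the same run, which is the "everything is fixed inside one world" idea from above.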

However, this doesn't need to be so in general: the definition may depend on other states and the sequence may still be a Markov chain. One such example is

$$X_i(\omega) = \text{number of heads in the first $i$ coin throws} \bmod 2.$$
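Why is this still a Markov chain? Because it satisfies the recurrence $X_i = (X_{i-1} + t_i) \bmod 2$, where $t_i$ is the $i$-th throw, so the next state depends only on the previous one. A quick Ruby check on one fixed throw sequence:

```ruby
throws = [1, 0, 1, 1, 0, 1]  # 1 = heads, 0 = tails

# Direct definition: parity of heads among the first i throws.
direct = (0..throws.size).map { |i| throws.first(i).sum % 2 }

# Recurrence: X_i = (X_{i-1} + t_i) mod 2, using only the last state.
rec = throws.inject([0]) { |xs, t| xs << (xs.last + t) % 2 }

# direct == rec on every prefix
```
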

A non-example could be

$$X_i(\omega) = \begin{cases} 1 & \text{ if both $(i+1)$-th and $(i+2)$-th throws were heads},\\ 0 & \text{ otherwise}. \end{cases}$$

Operational intuition:

The first example can be represented by the following diagram:

(diagram: transition graph of the first example, with states 0, 1, 2)

We have three states, namely $S = \{0,1,2\}$, and it works similarly to an automaton, except that the next state is picked "at random" with appropriate probabilities (the outgoing probabilities from each state must sum to $1$):

\begin{align} \mathbb{P}(X_i = 0 \mid X_{i-1} = 0) &= \frac{1}{2}\\ \mathbb{P}(X_i = 0 \mid X_{i-1} = 1) &= \frac{1}{2}\\ \mathbb{P}(X_i = 0 \mid X_{i-1} = 2) &= 1\\ \mathbb{P}(X_i = 1 \mid X_{i-1} = 0) &= \frac{1}{2}\\ \mathbb{P}(X_i = 1 \mid X_{i-1} = 1) &= 0\\ \mathbb{P}(X_i = 1 \mid X_{i-1} = 2) &= 0\\ \mathbb{P}(X_i = 2 \mid X_{i-1} = 0) &= 0\\ \mathbb{P}(X_i = 2 \mid X_{i-1} = 1) &= \frac{1}{2}\\ \mathbb{P}(X_i = 2 \mid X_{i-1} = 2) &= 0\\ \end{align}

while the initial distribution is

\begin{align} \mathbb{P}(X_0 = 0) &= 1 \\ \mathbb{P}(X_0 = 1) &= 0 \\ \mathbb{P}(X_0 = 2) &= 0 \\ \end{align}

This information can be neatly summed up in a transition matrix (row $i$ lists the probabilities of moving out of state $i$) and an initial distribution vector:

$$ P = \left[\begin{array}{ccc}\frac{1}{2}&\frac{1}{2}&0\\\frac{1}{2}&0&\frac{1}{2}\\1&0&0\end{array}\right],\quad\quad \pi_0 = \left[\begin{array}{ccc}1&0&0\end{array}\right], $$

and in such case

$$\left[\begin{array}{ccc}\mathbb{P}(X_n = 0)&\mathbb{P}(X_n = 1)&\mathbb{P}(X_n = 2)\end{array}\right] = \pi_0 P^n.$$

In practice you don't need all the mathematical theory; $P$ and $\pi_0$ are enough to specify a Markov chain. Often the matrix $P$ is implicit, i.e. its entries are calculated on the fly, especially if the space of states is large.
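With Ruby's standard matrix library this computation is one multiplication per step; the sketch below uses the row-vector convention $\pi_n = \pi_0 P^n$ for the chain above:

```ruby
require 'matrix'

# Transition matrix: row i holds the probabilities of moving from state i.
pmat = Matrix[[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0]]

pi0 = Matrix.row_vector([1.0, 0.0, 0.0])  # start in state 0

pi1 = pi0 * pmat      # distribution after one step:  [0.5, 0.5,  0.0 ]
pi2 = pi0 * pmat**2   # distribution after two steps: [0.5, 0.25, 0.25]
```
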

Now, observe that to generate one particular run of a Markov chain, that is, a sequence $x_0,x_1,x_2,\ldots$ with $x_i \in S$ such that $x_i = X_i(\omega)$ for some particular $\omega \in \Omega$, you need some amount of entropy each time you have to make a decision (here you need one bit each time you are in state $0$ or $1$, and no bits when you are in state $2$). So, in general, to generate an infinite sequence you might need an infinite amount of entropy.

Some short code:

A short Ruby program that generates such a run of the considered Markov chain (using pseudorandom numbers, not real entropy):

#!/usr/bin/ruby

# One step of the chain: pick the next state at random
# according to the transition probabilities.
def step(state)
  case state
  when 0
    rand < 0.5 ? 0 : 1
  when 1
    rand < 0.5 ? 0 : 2
  when 2
    0   # state 2 always goes back to state 0
  end
end

state = 0
loop do
  puts state
  state = step(state)
end

A bit more interesting example:

One nice example is using Markov chains to generate sentences in English (kind of). This is not directly related to what you want, but it should give you some insight into how Markov chains work and how they can be used.

Suppose you have some text; then you can calculate the probability that the next word is $x_{n}$ given that the current one is $x_{n-1}$. For example, "I like math and I know you like physics." would get you

\begin{align} \mathbb{P}(X_{n} = \text{«I»} \mid X_{n-1} = \text{«I»}) &= 0\\ \vdots\\ \mathbb{P}(X_{n} = \text{«math»} \mid X_{n-1} = \text{«like»}) &= \frac{1}{2}\\ \vdots\\ \mathbb{P}(X_{n} = \text{«like»} \mid X_{n-1} = \text{«I»}) &= \frac{1}{2}\\ \vdots \end{align}

which could then be used to produce sentences like "I like physics." or "I like math and I know you like math and I like math and I like physics." and many similar.

A simple implementation of the above could look like this (the input should be one sentence per line):

#!/usr/bin/ruby

# P maps each state (a word, :begin, or :end) to the list of its
# observed successors; repetitions in the list encode the probabilities.
P = Hash.new
P[:begin] = []

while line = $stdin.gets do
  line.chomp!
  state = :begin
  line.scan(/\w+|\s+|./).each do |lexem|
    next if lexem =~ /\s+/   # skip whitespace
    P[state] << lexem        # record lexem as a successor of state
    state = lexem
    P[state] ||= []
  end
  P[state] << :end           # the sentence may end here
end

# One step: pick a uniformly random successor of the current state.
def step(state)
  tab = P[state]
  if tab.nil? || tab.empty?
    :end
  else
    tab.sample
  end
end

state = :begin
loop do
  state = step(state)
  break if state == :end
  print state, " "
end
print "\n"

To run it, use cat one_sentence_per_line_file.txt | ruby script.rb or ruby script.rb < one_sentence_per_line_file.txt.

Conclusion:

This post came out long (too long), and it only scratches the surface. Markov chains are great, but the topic is too broad for such a post. I hope the above is enough to get you started $\ddot\smile$