Expected number of unique states visited in Markov process


Is there a way to calculate the expected number of unique states visited in a Markov chain after n steps?

For example, say I have 4 states {A,B,C,D} with transition probabilities 1/4 across the board; i.e., the transition matrix is uniformly 1/4. The initial condition is {0,1,0,0}. What is the expected number of states visited after 3 steps?

This is a simplified example. The problem I'm working on is much more complicated (many more states and steps, and heterogeneity in transition probabilities). So I'm looking for a general solution.

I thought that I could do something like this:

E[unique states visited] = P(visited A) + P(visited B) + P(visited C) + P(visited D)

= 1-P(haven't visited A) + 1-P(haven't visited B) + 1-P(haven't visited C) + 1-P(haven't visited D)

= 1-(1-P(A,step 1))(1-P(A,step 2))(1-P(A,step 3)) + ...

But this gives me the wrong answer; I'm guessing it's because the events of visiting A at step 1 and visiting A at step 2 are not independent, so the probabilities can't simply be multiplied.
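For reference, here is a quick Monte Carlo sanity check of the uniform example (a sketch; it assumes states are indexed 0-3 with B = 1, and counts the starting state as visited):

```python
import random

def mc_unique_states(n_steps, start=1, n_states=4, trials=200_000, seed=1):
    """Monte Carlo estimate of E[# unique states visited].

    Every transition is uniform over the n_states states (probability
    1/4 each in the example); the starting state counts as visited.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        visited = {start}
        for _ in range(n_steps):
            state = rng.randrange(n_states)  # uniform row: next state ~ U{0,...,3}
            visited.add(state)
        total += len(visited)
    return total / trials

print(mc_unique_states(3))  # roughly 2.73
```

The simulated value of about 2.73 is the number the analytical approaches below should reproduce.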

---

For this problem in particular, at least, we can find the solution.

Let $p(k)$ be the probability that you transition to a state you haven't visited yet, given that there are $k$ states left that you haven't visited yet.

In the case of this problem, $p(k) = \frac{k}{4}$.


For the sake of choosing a convention, I'll suppose that you are considered to have already visited the state you start on.

Let $E(k, t)$ be the expected number of states you will have visited given that there are $k$ states that you haven't visited yet, and you have $t$ transitions remaining. The base case is that $E(k, 0) = 4-k$.

Now, let's consider the recursive case. If your current number of unvisited states is $k$, you will with probability $\frac{k}{4}$ transition to a new state, so that $k\mapsto k-1$. Otherwise, with probability $1-\frac{k}{4}$, you will visit a state that you have already visited.

So $E(k, t) = \frac{k}{4} E(k-1, t-1) + \frac{4-k}{4} E(k, t-1).\qquad(t>0)$

If you solve this recurrence relation for $k=3$ initial unvisited states and $t=3$ steps, you get $175/64 \approx 2.73$ as the expected number of states visited. If you consider the initial state to be unvisited as well ($k=4$), the number drops to $148/64 = 2.3125$.
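As a sketch, the recurrence can be evaluated with a short memoized recursion (assuming the uniform chain; `N` is the total number of states, here 4):

```python
from fractions import Fraction
from functools import lru_cache

N = 4  # total number of states in the uniform chain

@lru_cache(maxsize=None)
def E(k, t):
    """Expected number of visited states after t more transitions,
    given that k states are still unvisited."""
    if t == 0:
        return Fraction(N - k)
    p_new = Fraction(k, N)  # probability of moving to an unvisited state
    return p_new * E(k - 1, t - 1) + (1 - p_new) * E(k, t - 1)

print(E(3, 3))  # 175/64
print(E(4, 3))  # 37/16, i.e. 148/64
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare against the closed-form values above.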


For Markov models with more complicated probabilities, I can only think of a brute-force solution: starting from a Markov chain with $n$ states, create a new Markov chain where each new state represents a subset of states that you've visited so far, along with the state you're currently on. You can compute the transition probabilities of this new chain from the probabilities of the old chain; then you can compute the expected number of visited states in the old chain from the expected final state of this new chain.
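A sketch of that brute-force construction in Python (assuming the chain is given as a row-stochastic nested list `T`, with `T[s][t]` the probability of moving from `s` to `t`): each augmented state is a pair (set of visited states, current state), and a probability distribution over those pairs is pushed forward for $m$ steps.

```python
from fractions import Fraction

def expected_visited(T, start, m):
    """Expected number of distinct states a walk visits in m steps.

    Tracks a distribution over augmented states (visited set, current
    state). T is row-stochastic: T[s][t] = P(s -> t).
    """
    dist = {(frozenset([start]), start): Fraction(1)}
    for _ in range(m):
        new_dist = {}
        for (visited, s), p in dist.items():
            for t, prob in enumerate(T[s]):
                if prob:
                    key = (visited | {t}, t)
                    new_dist[key] = new_dist.get(key, Fraction(0)) + p * prob
        dist = new_dist
    return sum(p * len(visited) for (visited, _), p in dist.items())

q = Fraction(1, 4)
T = [[q] * 4 for _ in range(4)]
print(expected_visited(T, 1, 3))  # 175/64
```

Note that the augmented chain has up to $n\cdot 2^n$ states, so this is only feasible for small $n$.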

---

Let us consider some special cases where it is easy to determine the correct value:

  1. Atomic starting distribution (one 1 and the rest 0).
  2. After 0 steps the expected number of states visited should always be 1.
  3. After 1 step the expected number of states visited should be $1 + (1 - \text{probability of staying put})$.

Assuming $\bf v$ is our initial state vector, $\bf P$ is our transition probability matrix, $\bf 1$ is the column vector of ones, and the product sign denotes the Schur/Hadamard (element-wise) product, a method which fulfills these is to take $${\bf 1}^T \left( {\bf 1}- \prod_{i=0}^n\left({\bf 1}-{\bf P}^i{\bf v}\right) \right)$$

Left-multiplying by ${\bf 1}^T$ is just a matrix way of summing the entries of a vector. I think this does something similar to what you meant in the question.

For our example:

$${\bf P} = \left[\begin{array}{cccc}0.25&0.25&0.25&0.25\\0.25&0.25&0.25&0.25\\0.25&0.25&0.25&0.25\\0.25&0.25&0.25&0.25\end{array}\right], {\bf 1} = \left[\begin{array}{c}1\\1\\1\\1\end{array}\right], {\bf v} = \left[\begin{array}{c}0\\1\\0\\0\end{array}\right]$$

  • After 0 steps ($n=0$): $1.00$; we always start somewhere.
  • After 1 step ($n=1$): $1.75$; the $0.25$ missing to reach $2.00$ is the $25\%$ chance of staying put.
  • After 3 steps ($n=3$): $2.7344 \approx 175/64$, matching the answer by @user326210.
  • After 16 steps ($n=16$): $3.9699 < 4$, reasonable since we can't visit more states than there exist!

The first, second, and last of these pass the sanity checks above, and the third agrees with the more theory-focused answer by @user326210.

This is of course by no means a proof, more of a practical contribution.
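The formula is straightforward to evaluate numerically. A numpy sketch (for this symmetric $\bf P$ the row/column convention of ${\bf P}^i{\bf v}$ makes no difference):

```python
import numpy as np

def hadamard_estimate(P, v, n):
    """Evaluate 1^T (1 - prod_{i=0}^n (1 - P^i v)), elementwise product."""
    prod = np.ones(len(v))   # running Hadamard product
    x = v.astype(float)      # P^0 v = v
    for _ in range(n + 1):
        prod *= 1.0 - x
        x = P @ x            # advance: P^i v -> P^{i+1} v
    return float((1.0 - prod).sum())

P = np.full((4, 4), 0.25)
v = np.array([0.0, 1.0, 0.0, 0.0])
for n in (0, 1, 3, 16):
    print(n, round(hadamard_estimate(P, v, n), 4))
# reproduces the values listed above: 1.0, 1.75, 2.7344, 3.9699
```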

---

Given a Markov chain with transition matrix ${\bf{T}}$, what is the expected number of states visited by a random walk during $m$ steps starting in state $s$?

Say

$$X_i= \begin{cases} 1, & \text{if the random walk touches state } i \\ 0, & \text{otherwise} \end{cases} $$

The quantity we are looking for is the expectation

\begin{equation} \left<\sum_{i=1}^n X_i\right> = \sum_{i=1}^n\left< X_i\right> \tag{1} \end{equation}

where $n$ is the number of states. Clearly

$$ \left< X_i\right> = \text{probability that the random walk touches state }i $$

We can compute that probability as follows: Define a modified transition matrix ${\bf{U}}$, which is equal to ${\bf{T}}$, except that state $i$ only transitions into itself, namely

$$ U_{jk}= \begin{cases} \delta_{j,i}, & \text{if } k=i \\ T_{jk}, & \text{otherwise} \end{cases} $$

Now all the paths that touch state $i$ get absorbed there. Starting in state $s$, compute the occupancy after $m$ steps of this modified chain:

$$ \vec a = {\bf{U}}^m \vec x $$

where

$$ x_j = \delta_{s,j} $$

Then the answer is the final occupancy of state $i$

$$ \left< X_i\right> = a_i $$

Do this for all $i$ and compute the sum in Eqn (1).
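A numpy sketch of this procedure (using the row-stochastic convention, so making state $i$ absorbing means overwriting row $i$, and the column-vector formula ${\bf U}^m \vec x$ becomes row-vector propagation; the uniform example reproduces $175/64$):

```python
import numpy as np

def expected_unique_states(T, s, m):
    """Sum over i of P(the walk from s touches state i within m steps).

    T is row-stochastic (T[j, k] = P(j -> k)), so the distribution is a
    row vector a, propagated by a @ U.
    """
    n = len(T)
    x = np.zeros(n)
    x[s] = 1.0              # x_j = delta_{s,j}
    total = 0.0
    for i in range(n):
        U = T.copy()
        U[i, :] = 0.0       # make state i absorbing:
        U[i, i] = 1.0       # it now transitions only into itself
        a = x.copy()
        for _ in range(m):
            a = a @ U       # occupancy after one more step
        total += a[i]       # <X_i> = mass absorbed at i
    return total

T = np.full((4, 4), 0.25)
print(expected_unique_states(T, 1, 3))  # 2.734375 = 175/64
```

Each of the $n$ passes costs $m$ matrix-vector products, so this stays tractable for large chains, unlike the subset-tracking construction.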

Notes:

  1. Obviously $\left< X_s\right> = 1$, so you can save yourself $1/n$ of the work.

  2. If you want the average result starting from an arbitrary starting state, then change the starting vector $\vec x$ to $x_j=1/n$ for all $j$.