Number of sequences from one state to another with fixed transition counts / empirical matrix

44 Views Asked by At

I think the answer to this might be well known but I can't find it and I am also bad at combinatorics.

First, note that if we have an i.i.d. random variable with finite sample space $A$ we can obtain the empirical distribution or empirical vector $c \in \mathbb{N}^A$ (also known as the type) of a sequence $a=(a_0,...,a_n)$ of samples by counting the number of occurrences of each element of $A$ in the sequence: $$c_i(a):=\sum_{t=0}^n \delta_i(a_t)$$ where $$\delta_i(x):=\begin{cases} 1 &\text{ if } x=i,\\ 0 &\text{else.}\end{cases} $$ Conversely given an empirical vector $c$ the number of sample sequences that produce it is equal to the multinomial coefficient $$\frac{(n+1)!}{\prod_{i \in \{1,...,|A|\}} c_i!}.$$

I am looking for how to calculate the corresponding number in the case of a given empirical matrix. The empirical matrix counts the occurring transitions (pairs of successive elements) in a sequence. So how many sequences are there for a given count of the occurring transitions in the sequence?

More specifically I want to know how many such sequences there are that end in a particular element of $A$. This will probably follow from the above.

More formally:

Let $A$ be a finite set and consider finite sequences $a=(a_0,a_1,...,a_n) \in A^{n+1}$ starting in $a_0$ and ending in $a_n$. Also, let $C:A^{n+1} \rightarrow \mathbb{N}$ be a matrix valued function counting the number of transitions occurring in a given sequence $a \in A^{n+1}$ so that:

$$C_{ji}(a):=\sum_{t =1}^n \delta_j(a_t) \delta_i(a_{t-1})$$

For a fixed $|A| \times |A|$ matrix $B$ of transition counts (i.e. non-negative integers that sum to the total number $n$ of transitions in sequences of length $n+1$ i.e. $\sum_{i,j} B_{ji}=n$) and fixed initial and final elements $b_0,b_n \in A$ is there an expression for the number of sequences $a \in A^{n+1}$ such that

  • $C(a)=B$ and
  • $a_0= b_0$ and $a_n = b_n$?

I first thought I could just use the multinomial coefficient but then realized I am overcounting because not every permutation of the transitions actually produces a valid trajectory.