Find the number of accepted matches for a DFA defined by a regular expression

225 Views Asked by At

I was interested in finding out the number of strings of length = MAX that would match a general regular expression.

This is following on from this question Find the nearest integer solution to a linear equation

Where the regular expression was fairly simple ((a|b)c)+

In that case, no solution exists for a matching a string of length 5, but does for a string of length 4 or 6.

How can I generalize the approach in the answer to that question to any regular expression?

1

There are 1 best solutions below

2
On BEST ANSWER

Recall that any DFA can be represented as a graph with $n$ nodes, edges labeled by characters, a designated start node, and a set of designated "accept" nodes. Construct an $n\times n$ matrix $M$, where $M_{ij}$ is the number of distinct characters that take the DFA from node $i$ to node $j$. Then the number of accepted strings of length $m$ is given by the matrix multiplication $s^{T}\cdot M^m \cdot t$, where $s_i$ is $1$ if $i$ is the start node (and $0$ otherwise), and $t_i$ is $1$ is $i$ is an "accept" node (and $0$ otherwise). Find the Jordan normal form of $M$; i.e., write $M=P^{-1}JP$, where $P$ is invertible and $J$ is almost-diagonal. Then $M^m=P^{-1}J^m P$, and the number you seek is given by $s^{T}\cdot P^{-1}\cdot J^m \cdot P \cdot t$. This can be written explicitly as a polynomial in the eigenvalues of $J$ (with terms of the form $\lambda^m$ and possibly $\lambda^{m-1}$).