Find the nearest integer solution to a linear equation

425 Views Asked by At

I am interested in calculating the maximum number of strings of a fixed length that can be generated by a regular expression.

If I have the regular expression ((a|b)c)+

How would I find out how many strings I can build from that not exceeding a number, say 5?

In this particular case, I can't build a string that fits the regex that is exactly 5 characters, but I can find strings that are 4 characters, or 6.

I think the linear equation would be something like

((a*x + b*y)+c)*z = I

with restrictions:
c=1
x>=0, y>= 0, z >=0
if(x = 0, y > 0)
if(y = 0, x >0

If this solution is not an integer, is there any way to find the nearest integer solution, or do I just need to solve the equation for different I values?

1

There are 1 best solutions below

9
On BEST ANSWER

First make the observation that each new pair of letters allow a doubling of the number of combinations.

You can write it as a linear least squares optimization.

$$\min_v \|\bf Mv\|$$ I make $\bf M$ unnecessarily large so pattern gets easier to spot: $${\bf M} = \left[\begin{array}{rrrrrrrrrrrrrrrrrrr} 0&2&0&-1&0&0&0&0&0&0&0&0&0\\ 0&0&0&2&0&-1&0&0&0&0&0&0&0\\ 0&0&0&0&0&2&0&-1&0&0&0&0&0\\ 0&0&0&0&0&0&0&2&0&-1&0&0&0\\ 0&0&0&0&0&0&0&0&0&2&0&-1&0 \end{array}\right]$$

But we also need to enforce a boundary condition: for example that there are 2 2-letter solutions. We can do this by adding a term

$$+\epsilon {\bf C}\|{\bf v}-[0,2,0,\cdots]^T\|_2^2$$

Where $\bf C$ is the diagonal matrix 0 everywhere except but at ${\bf C}_{22} = 1$. This will force second position of $\bf v$ to be 2.

Now the actual answer would be to sum up the solution vector since we want the sum of all combinations of 1,2,3,4,5 length. But that is easily done when we have solved the system.


EDIT: Variations: if we had for example $(a|b)+$ we would have:

$${\bf M} = \left[\begin{array}{rrrrrrrrrrrrrrrrrrr} 2&-1&0&0&0&0&0\\ 0&2&-1&0&0&0&0\\ 0&0&2&-1&0&0&0\\ 0&0&0&2&-1&0&0\\ 0&0&0&0&2&-1&0 \end{array}\right]$$

Because then each new letter could take any of the 2 new values. No need to hop over every second letter with a "0" in the equation system which would have been forced to be "c" in the previous expression.

If we had $((a|b|c)d)+$ we would have:

$${\bf M} = \left[\begin{array}{rrrrrrrrrrrrrrrrrrr} 0&3&0&-1&0&0&0&0&0&0&0&0&0\\ 0&0&0&3&0&-1&0&0&0&0&0&0&0\\ 0&0&0&0&0&3&0&-1&0&0&0&0&0\\ 0&0&0&0&0&0&0&3&0&-1&0&0&0\\ 0&0&0&0&0&0&0&0&0&3&0&-1&0 \end{array}\right]$$

Because there would be 3 new cases to multiply with for each new occurrence of $((a|b|c)d)$