Find regular expression that defines a given language

Question


125 Views Asked by Bumbble Comm At 26 Mar 2026 - 10:12

I want to find a regular expression that defines a language L over the alphabet {0,1} with the following condition: every word contains exactly two 000 substrings. For example, 1010001010001111 is a valid word, but 101000011 is not (even though its run 0000 contains two overlapping occurrences of 000: the first three zeros and the last three).

If I'm not mistaken, the language of words over {0,1} with exactly one 000 substring is defined by (1 + 01 + 001)* 000 (1 + 10 + 100)*. Call this expression X.

Would the solution to the first problem be X 1 X? Or am I missing something? Any constructive input would be greatly appreciated.


**Bumbble Comm** · Answer 1 · 2015-05-28 16:07:21

First I try a definition of what it means for a word to have $k$ substrings $s$:

A word $w$ contains a substring $s$ if $w=usv$ for some words $u$ and $v$; otherwise it contains no ($0$) substrings $s$.

If $uv$ does not contain a substring $s$ then $usv$ contains $1$ substring $s$.

If $uv$ contains $k$ substrings $s$ then $usv$ contains $k+1$ substrings $s$.
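For $s=0^3$ this definition can be turned into a direct count. Removing one copy of $000$ shortens exactly one run of zeros by three, so a run of $n$ zeros contributes $\lfloor n/3 \rfloor$ substrings. The sketch below is one reading of the definition, not code from the answer; the name `count_000` is made up for illustration.

```ruby
# Count substrings "000" in the sense of the recursive definition above:
# each maximal run of n zeros contributes n / 3 (integer division).
# count_000 is a hypothetical helper name, not part of the original script.
def count_000(word)
  word.scan(/0+/).sum { |run| run.length / 3 }
end

puts count_000("1010001010001111") # two separate runs of three zeros
puts count_000("101000011")        # a single run of four zeros
```

Under this reading the first example word from the question has two substrings and the second has only one, which matches the question's claim that the second word is not in the language.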

Then I try to specify the language with no substring $s=0^3$: $$ L_0 = (1|01|001)^*(\epsilon|0|00)(1|10|100)^* $$
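A quick way to gain confidence in $L_0$ is to compare it with a direct check on every short word. The following is a minimal sketch, assuming Ruby's `Regexp` syntax with `\A` and `\z` as whole-word anchors and writing $(\epsilon|0|00)$ as `(0|00)?`; the helper `binary_words` and the length bound 12 are illustrative choices.

```ruby
# L_0 in Ruby regexp syntax; (0|00)? stands for (epsilon|0|00).
L0 = /\A(1|01|001)*(0|00)?(1|10|100)*\z/

# Enumerate every binary word of length 0..max_len (hypothetical helper).
def binary_words(max_len)
  (0..max_len).flat_map do |n|
    %w[0 1].repeated_permutation(n).map(&:join)
  end
end

# Keep only the words on which the regexp and the direct check disagree.
mismatches = binary_words(12).reject do |w|
  L0.match?(w) == !w.include?("000")
end

if mismatches.empty?
  puts "L0 agrees with the direct check"
else
  puts "disagreements: #{mismatches.first(5).inspect}"
end
```

An exhaustive check over short words is still not a proof, but it rules out mistakes that show up in small cases.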

Now I try to add the substring at all places of $L_0$ and come up with a combined expression:

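$$ L_1 = (1|01|001)^*(0^3|0^4|0^5)(1|10|100)^* $$ This contains exactly the words with $1$ substring $0^3$: a run of three, four, or five zeros counts once, because removing one $0^3$ from it leaves at most two zeros.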

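Finally I try to add another $s$ at every place:
\begin{align} L_2 = & ((1|01|001)^*(0^6|0^7|0^8)(1|10|100)^*)|\\ & ((1|01|001)^*(0^3|0^4|0^5)(1|10|100)^*1(0^3|0^4|0^5)(1|10|100)^*) \end{align}
This contains exactly the words with $2$ substrings $0^3$. The first alternative places both copies in one run of six to eight zeros; the second places them in two separate runs with at least one $1$ between them.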
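To see $L_2$ at work on the words from the question, it can be transcribed into a Ruby regexp. This is a sketch under the same conventions as the formulas above; the constant names `PRE`, `POST`, `ONE`, and `TWO` are introduced here only to keep the pattern readable.

```ruby
PRE  = "(1|01|001)*"       # no 000; empty or ending in 1
POST = "(1|10|100)*"       # no 000; empty or starting with 1
ONE  = "(0{3}|0{4}|0{5})"  # a run of zeros that counts once
TWO  = "(0{6}|0{7}|0{8})"  # a run of zeros that counts twice

# Whole-word match for L_2: one long run, or two runs separated by a 1.
L2 = /\A(#{PRE}#{TWO}#{POST}|#{PRE}#{ONE}#{POST}1#{ONE}#{POST})\z/

%w[1010001010001111 101000011 000000 0001000 000000000].each do |w|
  puts "#{w}: #{L2.match?(w)}"
end
```

The word with a single run of four zeros is rejected, and so is the run of nine zeros, which would count three times.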

Testing:

I wrote some Ruby code (link) to test the above regular expressions.

It generates random strings and checks if the above regular expressions give the same results as some alternative test methods.
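The linked script is not reproduced here, so the following is only a sketch of what such a randomized comparison might look like, not the original `2sub.rb`. It assumes the run-based count as the alternative test method, writes the runs compactly as `0{3,5}` and `0{6,8}`, and uses a fixed seed and a smaller sample size than the run shown below; `count_000` and `disagreements` are hypothetical names.

```ruby
PRE  = "(1|01|001)*"
POST = "(1|10|100)*"
L1 = /\A#{PRE}0{3,5}#{POST}\z/
L2 = /\A(#{PRE}0{6,8}#{POST}|#{PRE}0{3,5}#{POST}10{3,5}#{POST})\z/

# Alternative test method: a run of n zeros contributes n / 3 substrings.
def count_000(word)
  word.scan(/0+/).sum { |run| run.length / 3 }
end

# Draw random binary words and return those on which the regexp and
# the count-based check give different answers.
def disagreements(regexp, expected_count, samples:, max_len:, rng:)
  words = Array.new(samples) do
    Array.new(rng.rand(0..max_len)) { rng.rand(2) }.join
  end
  words.reject { |w| regexp.match?(w) == (count_000(w) == expected_count) }
end

rng = Random.new(42) # fixed seed so a run can be repeated
{ 1 => L1, 2 => L2 }.each do |k, regexp|
  bad = disagreements(regexp, k, samples: 20_000, max_len: 50, rng: rng)
  puts "k=#{k}: #{bad.size} disagreements"
end
```

Any word returned by `disagreements` is a concrete counterexample that can be inspected by hand.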

$ ruby 2sub.rb
test (l0, m0) with 100000 samples up to length 50 ..
done.
test (l1, m1) with 100000 samples up to length 50 ..
done.
test (l2, m2) with 100000 samples up to length 50 ..
done.

This is not a proof, but gives some confidence that the results hold.

Try it. If you change tiny bits of the regexps you will notice the differences:

(..)
test (l2, m2) with 100000 samples up to length 50 ..
(..)
99944 [000001100001] [false true]!
99946 [011111011001100111110001000001] [false true]!
99947 [00010010010000010010100101] [false true]!
99950 [00101100011010100110101001101100010] [false true]!
99952 [0101100010100101101110001001011100] [false true]!
(..)
