Regular expression to generate language of natural numbers

1.3k Views Asked by At

Given the alphabet $\Sigma = \text{{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}}$ I'm trying to find a regular expression that generates $L =\mathbb{N}$.

I can't remember the actual page I found this problem on, but their solution looks like this :

$$0 + \text{{1, 2, 3, 4, 5, 6, 7, 8, 9}}[0...9]^{*}$$

I'm not sure if I understand it correctly, so I'm looking for a confirmation or some sort of explanation.

As I see it, $1$ particular value at a time is taken from $\text{{1, 2, 3, 4, 5, 6, 7, 8, 9}}$ then any and each combination of digits from the interval $[0....9]$ is appended to it, then a $0$ may be appended, as to obtain the numbers that end in $0$.

I'm not sure of this reasoning and also I'm dubious about the fact that $[0...9]^*$ can generate sequences of identical digits preceded/followed by other digits, for instance the combination $1200009$.

Can anyone help alleviate my confusion ?

2

There are 2 best solutions below

4
On

What you are asking for is a regular expression that defines a unique decimal expansion for each natural number.

Let's use the POSIX standard for regular expressions. In that notation, the expression in your question is $0|([1\mbox{-}9][0\mbox{-}9]*)$ and it works likes this: it singles out zero, i.e., $0$, as a special case, and it then captures any non-zero number as a non-zero digit ($[1\mbox{-}9]$) followed by an arbitrary sequence of digits ($[0\mbox{-}9]*$).

The POSIX notation uses $|$ for your $+$ and uses square brackets as a short-hand for sets of symbols. So $0|([1\mbox{-}9][0\mbox{-}9]*)$ is the regular expression denoting the language that you might describe in a more maths-like notation as :

$$\{0\} \cup \{1,2,3,4,5,6,7,8,9\} \times \{0,1,2,3,4,5,6,7,8,9\}^*$$

4
On

You should really check the notation which is not the standard computer sciences notation. From what I understand :

  • $+$ is a disjunction operator with low priority. Hence, a matching string must match either 0 or {1,2,3,4,5,6,7,8,9}[0...9]*.
  • {1,2,3,4,5,6,7,8,9} is matched by any symbol listed in the set, i.e. any decimal digit that is not $0$.
  • [0...9] is matched by any digits between 0 and 9 i.e. any decimal digit.
  • * means the previous pattern can be repeated arbitrarily following itself, including to have 0 successions of the pattern at all (that is, the empty string matches). Thus [0...9]* would mean an arbitrary sequence of decimal figures, including no figures at all.

To sum up:

  • 0 matches
  • An arbitrary lengthy sequence of decimal digits matches, upon the condition that the firtst decimal digit exists and is not 0.

I'd like to point out that you're generating the decimal notation of numbers of $\mathbb N$ without needless leading 0. The simple [0..9][0...9]* also generates notations for $\mathbb N$, although each number has in it infinitely many different notations by having arbitrarily many pointless leading zeroes.