set of all reg exp vs set of all languages

4.3k Views Asked by At

$\Sigma$ = {a,b,c,d,e} V = {A,B,C,D,E,F,G,H}

According to the notes:

The set of all regular expressions over $\Sigma$ is infinite and countable.

The set of all languages over $\Sigma$ is infinite and uncountable.

I don't get why reg exp is countable whereas language is uncountable. I guess i can't seem to be grasping the subtle difference between them. What's the difference?

1

There are 1 best solutions below

4
On BEST ANSWER

A language is just a set of strings. A language $L$ over $\Sigma$ is a subset of $\Sigma^\ast$, which is the set of all strings made from letters in $\Sigma$. $L$ contains some of the strings in $\Sigma^\ast$, but not necessarily all; it might contain, or not contain, any string of letters from $\Sigma$. $\Sigma^\ast$ itself is a countable set, and a countable set has an uncountable number of possible subsets, by Cantor's theorem. So the family of all possible languages over $\Sigma$ is uncountable.

However, a regular expression is an expression involving the operators $\cup,\cdot,\ast$ that represents a certain language. For example, the regular expression $(\mathtt{0}^\ast\cup\mathtt{11^\ast01})\mathtt{01}$ represents the language $\{ \mathtt{01}, \mathtt{001}, \mathtt{0001}, \mathtt{00001}, \mathtt{10101}, \mathtt{000001}, \mathtt{110101}, \mathtt{0000001}, \mathtt{1110101}, \ldots \}$.

The language that a regular expression represents is always a regular language. And a fairly simple argument shows that the collection of all regular expressions over $\Sigma$ is countable: every regular expression is finite, so has some length $\ell$. Let $R_\ell$ be the set of regular expressions of length $\ell$. Every set $R_\ell$ is finite. (It has no more than $N^\ell$ elements, where $N$ is the size of $\Sigma$ plus the other symbols $\cup,\cdot,\ast, ), (,$ etc.) The set of all regular expressions is just the union of all the $R_\ell$, and is therefore a countable union of finite sets and so countable.

(This provides one possible proof that there are languages that are not represented by regular expressions, and which are therefore not regular—indeed that most languages are not regular. One simple example is the language of all strings of $\mathtt{x}$es that have length $p^2$ for some integer $p$.)