I found some examples of infinite regular languages having non-regular subsets. Is it true in general?
Every infinite regular language has a non-regular subset?
4.6k Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail AtThere are 3 best solutions below
On
Yes, it is true in general. Suppose $L$ is an infinite regular language. Since $L$ is infinite, it must have uncountably many subsets. However, there are only countably many subsets of $L$ that are regular languages. Therefore, there must exist a subset of $L$ that is not a regular language.
On
Let $L$ be any infinite language. Because the number of subsets of $L$ is uncountable, while the number of regular languages is countable, there must be subsets of $L$ that are not regular. The same reasoning shows that there are subsets of $L$ that are not context-free, that are not recursive (decidable), and that are not recursively enumerable (recognizable), since all of these families of languages are countable.
If an explicit example is required, it is straightforward to construct a subset of $L$ that violates the pumping lemma for context-free languages, and hence that is not context-free (or regular). We start with two empty sets, $A_{1}=B_{1}=\varnothing$. We then repeat the following steps for $i\in \mathbb{N}$:
- Choose $s \in L \cap B_{i}^{c}$ that is longer than the longest string in $A_{i}$.
- Let $A_{i+1}=A_{i} \cup \{s \}$.
- Let $B_{i+1}=B_{i} \cup \{ uv^2 x y^2 z : s=uvxyz \wedge |vy|\ge 1\}$.
Because $L$ is infinite and $B_{i}$ is always finite, there is always a string available to choose in step 1. The set $A_{\infty}=\bigcup_{i=1}^{\infty}A_{i}$ is a subset of $L$ that violates the pumping lemma: it contains arbitrarily long strings, none of which can be "pumped" without producing strings in the forbidden set $B_{\infty}=\bigcup_{i=1}^{\infty}B_{i}$, which is disjoint from $A_{\infty}$ by construction. We conclude that $A_{\infty} \subseteq L$ is not context-free.
Suppose $L$ is a regular language. Since $L$ is infinite, there is a sequence $(w_n)$ of words in $L$ such that for all $n$ the length of $w_{n+1}$ is greater than the square of the length of $w_n$. Let $L'=\{w_n:n\geq1\}$. The generating function of $L'$ is the series $\sum_{n\geq0}t^{\operatorname{length}(w_n)}$, and this is not a rational function. It follows that $L'$, a sublanguage of $L$, is not regular.
Non-rationality of the series can be checked by hand (the Taylor coefficients of rational function satisfy a linear recursion with constant coefficients, and mine has gaps way too large for this to work) or using some big tool like the Ostrowski-Hadamard gap theorem