Show a set of numbers, binary representation of which is a regular language, but the ternary representation is not.

1.6k Views Asked by At

If $A$ is a set of natural numbers and $k$ is a natural number greater than 1. We define: $B_k(A) = \{w|w $ is the representation of $x$ in base $k$ for $x\in A\}$.

Note that we do not allow leading $0$s. Find such a set $A$ that $B_2(A)$ is a regular language but $B_3(A)$ is not.

My insights: I wanted to try the usual approach using the pumping lemma. The problem is, the numbers that we choose to put into the set, should have a pattern in base $3$ that we can use when applying the pumping lemma and they also should have a pattern in base $2$ so that we know $B_2(A)$ is regular. For example I tried multiples of $6$ but I found no patterns.

2

There are 2 best solutions below

3
On BEST ANSWER

$\def\<#1>{[#1]_3}$Let $\<z>$ be the number represented by the base-3 numeral $z$. Since the set of numbers $A = \{2^n - 1 \mid n \in \mathbb{N}\}$ is represented in binary by the regular expression $1^*$, if we prove that

$$L = B_3(A) = \{ z \in \{0,1,2\}^* \mid \exists n \in \mathbb{N} \,.\,\<z> = 2^n-1\} \cap \{1,2\}\cdot \{0,1,2\}^*$$

is not regular we are done.

We are going to use the pumping lemma for regular languages. Suppose $L$ is regular and $z$ is a word in $L$ longer than the pumping length $p$ of $L$. As such, it can be written as $uvw$ so that $|uv| \leq p$, $|v| \geq 1$, and for $i \in \mathbb{N}$, $uv^iw \in L$.

We write:

$$\<uvw> = \<u> 3^{|w|}3^{|v|} + \<v>3^{|w|} + \<w> \enspace,$$

and also:

$$\<uv^iw> = \<u> 3^{|w|}3^{i|v|} + \<v>3^{|w|}\Big(\sum_{0 \leq j < i} 3^{j|v|}\Big) + \<w> \enspace.$$

Subtraction and some algebra yield, for $i \geq 1$,

$$\<uv^iw> - \<uvw> = 3^{|w|+|v|} \,\frac{3^{(i-1)|v|} - 1}{3^{|v|}-1}\Big(\<u>(3^{|v|}-1) + \<v>\Big) \enspace.$$

Suppose $\<z> = 2^n-1$; for $m > n$, $2^m - 1 - (2^n - 1) = 2^n(2^{m-n} -1)$. Therefore all the differences $\<uv^iw> - \<uvw>$ for $i \geq 1$ must be multiples of $2^n = \<z> + 1$.

However, when $\frac{3^{(i-1)|v|} - 1}{3^{|v|}-1}$ is odd (which happens for alternating values of $i$ because it is the summation of terms that are all odd) the largest power of $2$ that is a factor of $\<uv^iw> - \<uvw>$ is bounded by the value of

$$ \<u> (3^{|v|} -1) + \<v> \enspace, $$

which is less than $3^p$. On the other hand, $z$ has at least $p+1$ ternary digits and no leading zeros, which means that $\<z>+1$ is greater than $3^p$. We have reached a contradiction and we have to abandon the assumption that $L$ is regular.

0
On

Here is an argument that doesn't use the pumping lemma in a conventional way. Instead it draws on a simpler version of the ideas surrounding Parikh's theorem. Here's a lemma:

Let $L$ be a regular language and let $S = \{ |w| : w \in L\}$ be the set of possible lengths of words belonging to $L$. Then there exists positive integers $m$, $n$ and a (possibly empty) set $A$ of residues modulo $n$ such that, for all $a > m$, $a \in S$ iff the remainder of $a$ modulo $n$ belongs to $A$.

To see this, note that we can replace every letter in the alphabet with '$a$' without affecting the word lengths, so it suffices to look at an arbitrary DFA for a language over the singleton alphabet $\{a\}$. Then every state has exactly one out-arrow, so (assuming the DFA has the minimal number of states) it consists of a directed path that leads into a cycle (this will be familiar to anyone who's studied Pollard's $\rho$-algorithm or Floyd's algorithm for cycle detection).

The parameter $m$ may be taken as the length of the initial path segment, and $n$ as the cycle length. The set $A$ is naturally determined by which states in the cycle are accepting states.

Aside: In fact this even holds more generally for context-free languages (it's a standard exercise to show that any CFL on a singleton alphabet is also regular).

From the lemma, we can see that $S$ always has a natural density $\lim_{N\to \infty} \frac1N | S \cap [1,N] |$, in fact it is equal to $|A|/n$. In other words, there is a rational number $q$ such that the asymptotic proportion of integers belonging to $S$ is $q$.

Punchline: Now with all that machinery aside (which is not specific to this problem), we have a very easy way to see that $\{4^n - 1\}$ is not regular in base $3$ (sorry, just realized I had to adjust the example slightly). That's because the length of $4^n - 1$ in base $3$ is asymptotically $n (\log 4) / (\log 3)$, so the set of lengths will have natural density equal to $(\log 4) / (\log 3)$. This number is irrational (by unique factorization), which contradicts the above observation. It took a lot of effort to get this far but the final conclusion is very light on calculation and makes for a pretty simple "big picture" summary.