Isn't this the most compact binary representation of all numbers?


Here is the transformation: $$\begin{align*} &1\to(0)\\ &2\to(1)\\ &3\to(10)\\ &4\to((1))\\ &5\to(100)\\ &6\to(11)\\ &7\to(1000)\\ &8\to((10))\\ &9\to((1)0)\\ &10\to(101)\\ &11\to(10000)\\ &12\to(1(1))\\ &13\to(100000)\\ &14\to(1001)\\ &15\to(110)\\ &16\to(((1)))\\ &17\to(1000000)\\ &18\to((1)1)\\ &19\to(10000000)\\ \end{align*}$$ and so on ... The idea is to use the $i$-th position to record the exponent of the $i$-th prime in the factorization (highest prime first); exponents greater than $1$ are themselves encoded recursively, with parentheses marking the beginning and end of each number. Since primes are the building blocks of all numbers, I expect this representation to carry maximal information in the minimal number of digits, compared to any other binary representation.
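The rule can be made precise and checked with a short script. This is my reading of the scheme (the function names are mine); it reproduces the table above:

```python
def is_prime(k):
    """Trial-division primality test (fine for the small k used here)."""
    return k >= 2 and all(k % d for d in range(2, int(k**0.5) + 1))

def encode(n):
    """Encode n as in the question: the digits inside the parentheses are
    the exponents of the primes dividing n (highest prime first), and any
    exponent greater than 1 is itself encoded recursively."""
    if n == 1:
        return "(0)"
    exps, p, m = [], 2, n
    while m > 1:
        if is_prime(p):
            e = 0
            while m % p == 0:
                m //= p
                e += 1
            exps.append(e)          # exponent of the current prime
        p += 1
    digits = []
    for e in reversed(exps):        # highest prime first
        digits.append(str(e) if e <= 1 else encode(e))
    return "(" + "".join(digits) + ")"

print(encode(12))   # (1(1))  = 2^2 * 3^1
print(encode(16))   # (((1))) = 2^4, with the exponent 4 encoded recursively
```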

  1. Can you prove me right or wrong?
  2. Is this being already used?

P.S. "Assuming that $\psi$ and $\phi$ are two uniquely decodable codes of the natural numbers, $\psi$ is more compact than $\phi$ if the average number of digits used by $\psi$ to represent $N$ natural numbers chosen uniformly at random between $1$ and $M$ (with $M>N$), as $N\to\infty$, is less than that of $\phi$."

There are 3 answers below.

Accepted answer:

Take $N$ large. Then the number of integers in $[1,N]$ which have a prime factor greater than $\sqrt N$ is asymptotic to $N \log 2$ (see http://en.wikipedia.org/wiki/Dickman_function). These numbers all require at least $\pi(\sqrt{N}) \sim 2 \sqrt{N}/\log N$ bits, since a number divisible by a prime $p$ needs a digit in position $\pi(p)$. Therefore the average number of bits used will be asymptotically at least as high as $(2\log 2) \sqrt{N} / \log N$. This is grossly inefficient on average, compared to the $\log N / \log 2 + O(1)$ bits that the standard binary representation uses.

Update: actually, the contribution from the primes by themselves is already large enough to upset the average. The total number of bits consumed by the $\approx N/\log N$ primes is about $\tfrac12 (N/\log N)^2$, so the average bit usage coming from just the primes is asymptotic to $N/(2\log^2 N)$. It's unrealistic to expect such a wildly irregular encoding which has exponentially-bad worst case to beat the greedy binary encoding on average.
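The inefficiency is easy to confirm empirically. The sketch below reimplements the question's encoding and compares average lengths over $[1,N]$; every symbol, parentheses included, is counted as one digit, which is generous to the scheme since its alphabet really has four symbols, not two:

```python
def is_prime(k):
    return k >= 2 and all(k % d for d in range(2, int(k**0.5) + 1))

def encode(n):
    """Prime-exponent encoding from the question (exponents > 1 recurse)."""
    if n == 1:
        return "(0)"
    exps, p, m = [], 2, n
    while m > 1:
        if is_prime(p):
            e = 0
            while m % p == 0:
                m, e = m // p, e + 1
            exps.append(e)
        p += 1
    return "(" + "".join(str(e) if e <= 1 else encode(e)
                         for e in reversed(exps)) + ")"

N = 1000
avg_enc = sum(len(encode(n)) for n in range(1, N + 1)) / N
avg_bin = sum(n.bit_length() for n in range(1, N + 1)) / N
print(avg_enc, avg_bin)   # the prime encoding is far longer on average
```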

Answer:

It depends on what you mean by a binary representation of all numbers and by most compact. One interpretation is a mapping $\varphi$ from the natural numbers to binary strings such that given a binary string encoding possibly more than one number, there is at most one way of decoding it. For example, suppose we encode the numbers $0,1,2,3,\ldots$ using the binary strings $1,01,001,0001,\ldots$. Then $0101001$ can be decoded as $1,1,2$, and $010$ has no decoding at all; but no string is ambiguous (has more than one decoding). Such a mapping $\varphi$ is known as a uniquely decodable code of the natural numbers.
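A decoder for the code $1,01,001,\ldots$ (the number $n$ is encoded as $n$ zeros followed by a one) takes only a few lines; this is just an illustration of unique decodability, not part of the argument:

```python
def decode(s):
    """Decode a concatenation of codewords from the code 1, 01, 001, ...,
    where n is encoded as n zeros followed by a single 1.
    Raises ValueError if the string has no decoding."""
    out, zeros = [], 0
    for c in s:
        if c == "0":
            zeros += 1
        else:                      # a "1" completes a codeword
            out.append(zeros)
            zeros = 0
    if zeros:                      # trailing zeros: no codeword completed
        raise ValueError("no decoding")
    return out

print(decode("0101001"))   # [1, 1, 2] -- the unique decoding
```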

Defining most compact is slightly more complicated. For example, suppose that we replace $1,01,001,0001,\ldots$ by $01,1,001,0001,\ldots$; which is more compact? And among $1,01,001,0001,00001,000001,0000001,\ldots$ and $1,010,011,00100,00101,00110,00111,\ldots$, which is more compact? The former encodes $n$ using $n+1$ bits, the latter using roughly $2\log n$ bits. One possible definition is the following:

Suppose $\varphi,\psi$ are two uniquely decodable codes of the natural numbers whose codeword lengths are non-decreasing (so $|\varphi(0)| \leq |\varphi(1)| \leq |\varphi(2)| \leq \cdots$). We say that $\psi$ is more compact than $\varphi$ if $\lim_{n\to\infty} |\psi(n)|-|\varphi(n)| = -\infty$.

In other words, $\psi$ is more compact than $\varphi$ if for every $C$, large codewords of $\psi$ are shorter by at least $C$ than their counterparts in $\varphi$. For example, the code $1,010,011,\ldots$ is more compact than the code $1,01,001,\ldots$ under this definition.
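Concretely, the second code above is the Elias gamma code applied to $n+1$. A quick sketch (names are mine) shows its codeword length growing like $2\log_2 n$, against $n+1$ for the first code, so the length difference indeed tends to $-\infty$:

```python
def unary_code(n):
    """The code 1, 01, 001, ...: n zeros followed by a 1 (n + 1 symbols)."""
    return "0" * n + "1"

def gamma_code(n):
    """The code 1, 010, 011, 00100, ...: Elias gamma code of n + 1,
    i.e. floor(log2(n+1)) zeros followed by the binary form of n + 1."""
    b = bin(n + 1)[2:]
    return "0" * (len(b) - 1) + b

# Codeword lengths: n + 1 versus roughly 2*log2(n), so for every C the
# gamma codewords are eventually shorter by more than C.
for n in [10, 100, 1000]:
    print(n, len(unary_code(n)), len(gamma_code(n)))
```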

Under these definitions, it is not too difficult to show that there is no most compact code (see for example here, though one can come up with simpler proofs). In other words, given a uniquely decodable code of the natural numbers $\varphi$, one can construct another uniquely decodable code of the natural numbers $\psi$ which is more compact than $\varphi$.

In fact, the situation is much worse: given any sequence $\varphi_1,\varphi_2,\ldots$ of codes, one can construct a code $\psi$ which is more compact than all of them at the same time! If you allow uncountable "sequences", then the existence of a scale of codes becomes independent of ZFC, just like the Continuum Hypothesis.

Answer:

Primes are rare, but numbers with large prime factors are common. If $n$ is around $2^{100}$, it will take about 100 bits to represent it in the usual base-2 way. It will take more than 100 digits to represent it in your encoding if it is divisible by any prime exceeding the 100th prime, since that prime alone occupies a position beyond the 100th digit; and most numbers in the neighborhood of $2^{100}$ are divisible by some prime exceeding the 100th prime. However you choose to define "on average", on average your representation loses out to ordinary base-2.
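This is cheap to check: a number has all its prime factors at most the 100th prime (541) exactly when repeatedly dividing out the primes up to 541 leaves 1, and essentially no number near $2^{100}$ passes that test. A small sketch (the sample window of 1000 integers is my arbitrary choice):

```python
def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, ok in enumerate(sieve) if ok]

SMALL_PRIMES = primes_up_to(541)     # 541 is the 100th prime

def is_541_smooth(n):
    """True iff every prime factor of n is at most 541."""
    for p in SMALL_PRIMES:
        while n % p == 0:
            n //= p
    return n == 1

start = 2**100 + 1                   # sample window near 2^100 (my choice)
count = sum(is_541_smooth(n) for n in range(start, start + 1000))
print(count)   # almost certainly 0: 541-smooth numbers near 2^100 are rare
```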