Looking for formulas/equations, by which I can replace $1.8\cdot 10^{19}$ numbers.

46 Views Asked by At

I have a series of numbers between $281474976710655$ and $18446744073709552000$. I want to write one or more formulas/equations/etc. that can generate most of the numbers in the range.

For example, I can describe some of the numbers by Factorial, some by Fibonacci, some by power, etc. I'm looking for the most dominant and efficient formulas/equations that can replace all or most of these numbers.

1

There are 1 best solutions below

0
On

Still no formula suggestions I'm afraid, but I'd like to have one more try at explaining the problem with this approach to storing the numbers more compactly. (Previous attempt here.)

I'll work through what happens with factorials, but I'll stick to small numbers to keep it manageable. You might like to try redoing the calculation for a different size of number, or a different "storage formula".

Let's just consider numbers which can be stored as a pair of characters: either two decimal digits, or one digit and the factorial sign. This means we're using $11$ symbols instead of $10$.

Using just digits, we can write the integers from $0$ to $99$, i.e. $100$ different values.

Using one digit and the factorial sign, we can also represent $5!,6!,7!,8!$ and $9!$. (The lower factorials are below $100$ so already counted.) So we can represent $105$ different values altogether.

This is inefficient though, because some values have two representations and some combinations go unused. For example, $!6$ isn't used to represent anything, and $24$ is always stored as $4!$ instead.

But with $11$ different symbols, we ought to be able to represent $11^2=121$ different values, not just $105$.

So let's just use the extra symbol as an extra digit rather than as a factorial sign, and work in base $11$. Now we're able to represent $121$ different values using two characters.

If the numbers are being stored in a computer, this is still inefficient. Distinguishing $11$ different digits needs four bits to represent each one—but that's actually enough bits to distinguish $2^4=16$ different digits, We're wasting five of the possibilities for those bits.

So let's have $16$ different digits. We may as well call them $0123456789ABCDEF$. Now we're working in hexadecimal. Two digits will fit in one byte of memory, and can store any of $16^2=256$ different values. This is more than double the $105$ that we started with, and there's no waste at all. Every possible pair of digits represents a different number. Not only that, but the format is one that computers are set up to handle, and the numbers are all in the same format, making them easier to process.

Now, thinking about the short formulas. I'll admit that they do look like a more compact way of recording the information when written on paper. $7!$ is indeed shorter than $5040$ and $2^{16}$ is shorter than $65536$. If that was the whole story, they would save space. So how come they dont?

As humans, we know hundreds of different symbols when we see them, and instantly recognise the format. For us, adding $!$ or ^ to the available characters doesn't amount to writing down any additional information. We don't have to make special arrangements to ensure that when we write a $2$, we could have written something else. The information is stored in the shape of the written symbol on the paper, with huge amounts of redundancy built in, and we can write pretty well anything. Any character from any language we know or don't know, any mathematical symbol, anything that our eyes can make out.

But in a computer it's different: each new symbol adds extra information to store. Maybe you now need to store each digit as an ASCII or Unicode character instead of as four bits, maybe most entries need marking to say "this one isn't a factorial", maybe you've got to insert a separator between entries because they're different lengths . . . And this extra information will always defeat any attempt to store $n$ bits' worth of information in less than $n$ bits.

If you find a formula that will work for all the numbers, it'll use at least as much storage as putting them all into hexadecimal would. If you find a format that's nice and short, it will only store a small subset of the numbers.

The most efficient approach is to find the most compact representation that will work for every entry, and use that.