Mapping a vector to a unique number using statistical coefficients and vice versa


While working with sequences of random integers of various lengths, I saw that Wolfram Alpha reports a number of statistical indicators for such a sequence.


These indicators include, for example, the mean, median, coefficient of variation, skewness, and kurtosis.

I became interested in the following: is it possible, using one or a group of these coefficients, to characterize an arbitrary sequence by a unique number inherent only in this sequence? And, most importantly, is it possible, using this number or numbers, to restore this sequence exactly?

In this sense, the sequences (1,2,3) and (1,3,2), for example, should be treated as different. But, as Wolfram Alpha shows, their statistics are absolutely identical, so the statistics cannot tell the two sequences apart.
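The same observation can be reproduced outside Wolfram Alpha. Below is a minimal Python sketch using only the standard `statistics` module; the helper name `summary` is made up for illustration. Each statistic it computes is a symmetric function of the values, so reordering the sequence cannot change it.

```python
import statistics

def summary(seq):
    """Return a few order-independent summary statistics of seq."""
    mean = statistics.mean(seq)
    stdev = statistics.stdev(seq)
    return {
        "mean": mean,
        "median": statistics.median(seq),
        "stdev": stdev,
        "coefficient_of_variation": stdev / mean,
    }

a = (1, 2, 3)
b = (1, 3, 2)
print(summary(a) == summary(b))  # do the summaries coincide?
print(a == b)                    # are the sequences themselves equal?
```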

At the same time, the statistics for the cumulative sums of these sequences will be different, but I do not know whether the original sequences can be restored from them.
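One part of this can be checked directly: the cumulative sums themselves (as opposed to their summary statistics) do determine the sequence, because taking consecutive differences undoes the summation. The Python sketch below shows that round trip with illustrative helper names. Whether a handful of summary statistics of the partial sums is enough is a separate question that this sketch does not settle.

```python
from itertools import accumulate

def cumulative_sums(seq):
    """Partial sums s_k = x_1 + ... + x_k."""
    return list(accumulate(seq))

def restore(sums):
    """Undo cumulative_sums by taking consecutive differences."""
    return [s - prev for prev, s in zip([0] + sums[:-1], sums)]

for seq in ([1, 2, 3], [1, 3, 2]):
    sums = cumulative_sums(seq)
    print(seq, "->", sums, "->", restore(sums))
```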

My attempts: I tried to use hash functions, but I found their machinery difficult to apply to this task. In addition, they suffer from collisions, where several source sequences can correspond to one hash value. I also tried to use Gödel numbers, which allow you to associate a mathematical function or object with a certain number. But for long sequences and, as a consequence, large primes, the resulting number becomes catastrophically huge. Due to the limitations of some computing systems, the least significant digits of the number are lost, the numbers are difficult to visualize and, as a result, it is extremely difficult to restore the original sequence.
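To make the growth concrete, here is a small Python sketch of one common Gödel-style encoding: the product of the i-th prime raised to the power x_i + 1. The function names are illustrative, and the scheme as written assumes non-negative integers. Python integers are arbitrary precision, so the round trip is exact here; the loss of low-order digits described above would presumably appear only once such a number is forced into a fixed-precision format.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (fine for short sequences)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1

def godel_encode(seq):
    """Encode non-negative integers as the product of p_i ** (x_i + 1)."""
    number = 1
    for p, x in zip(primes(), seq):
        number *= p ** (x + 1)
    return number

def godel_decode(number):
    """Recover the sequence by dividing out each prime in turn."""
    seq = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        seq.append(exponent - 1)
    return seq

print(godel_encode([1, 2, 3]), godel_encode([1, 3, 2]))  # order matters
long_seq = list(range(50))
print(godel_encode(long_seq).bit_length(), "bits for 50 small integers")
print(godel_decode(godel_encode(long_seq)) == long_seq)
```

The encoding distinguishes (1,2,3) from (1,3,2), but the size of the number grows quickly with both the length of the sequence and the magnitude of its entries.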

It seems to me that this is not the easiest problem, and it would be interesting to discuss it with specialists in statistics and number theory. Even if there is a rigorous theory behind this problem (which I don't know yet), it would be interesting to immediately test it with the tools of Mathematica.

Best answer

I see two possible questions here:

  1. Explicitly asked:

I became interested in the following: is it possible, using one or a group of these coefficients, to characterize an arbitrary sequence by a unique number inherent only in this sequence? And, most importantly, is it possible, using this number or numbers, to restore this sequence exactly?

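The answer is no. As the question itself observes, (1,2,3) and (1,3,2) produce identical statistics, so these coefficients cannot distinguish the two sequences, let alone restore them.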

  2. Implicitly:

It seems to me that this is not the easiest problem, and it would be interesting to discuss it with specialists in statistics and number theory. Even if there is a rigorous theory behind this problem (which I don't know yet), it would be interesting to immediately test it with the tools of Mathematica.

In my opinion, the relevant subject area is information theory.

The landmark event establishing the discipline of information theory and bringing it to immediate worldwide attention was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication" in the Bell System Technical Journal in July and October 1948. Citation: C. E. Shannon, "A mathematical theory of communication," in The Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, July 1948, doi: 10.1002/j.1538-7305.1948.tb01338.x.

Finally, if an entity has maximum entropy, then the entity is its own shortest representation.
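A rough way to see this in practice is to hand data to a general-purpose compressor. The Python sketch below uses the standard `zlib` module as a stand-in, and a seeded pseudo-random generator to approximate high-entropy input; both choices are assumptions made for illustration, and a compressor is only a proxy for entropy, not a measurement of it.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = bytes(rng.getrandbits(8) for _ in range(4096))  # close to maximum entropy
regular = bytes([1, 2, 3, 4]) * 1024                    # same length, highly repetitive

for name, data in (("noisy", noisy), ("regular", regular)):
    packed = zlib.compress(data, 9)
    assert zlib.decompress(packed) == data  # lossless in both cases
    print(name, len(data), "->", len(packed))
```

The repetitive input shrinks dramatically, while the noisy input does not get meaningfully shorter: there is no redundancy left for the compressor to remove.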

For curiosity value only:

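(* 17 random integers drawn from a wide range *)
sample = RandomInteger[{-2^32, 2^63 - 1}, 17];
(* Base64 alphabet: a character's position minus 1 is its base-64 digit *)
code = Characters[
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ"<>
  "abcdefghijklmnopqrstuvwxyz"<>
  "0123456789+/"];
(* Drop the first two and last two characters of the Compress output,
   then read the remaining characters as one base-64 integer *)
goedelized = FromDigits[Flatten[(Position[code, #1] & ) /@
       Characters[Compress[sample]][[3 ;; -3]]] - 1, 64]
(* Rebuild the string from the integer's base-64 digits and decompress it;
   this assumes the dropped characters were the "1:" prefix and "==" padding *)
sample == Uncompress[StringJoin["1:",
    code[[IntegerDigits[goedelized, 64] + 1]], "=="]]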

Run it and see what happens.
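For readers without Mathematica, the same idea can be sketched in Python. This is not a translation of the snippet above: `json` and `zlib` stand in for `Compress`, the compressed bytes are read directly as one big integer instead of going through Base64 digits, and the names `to_number` and `from_number` are made up for illustration.

```python
import json
import random
import zlib

def to_number(seq):
    """Serialize and compress seq, then read the bytes as one big integer."""
    blob = zlib.compress(json.dumps(seq).encode("ascii"))
    # The first byte of a zlib stream is non-zero, so no leading byte is lost.
    return int.from_bytes(blob, "big")

def from_number(number):
    """Turn the integer back into bytes, decompress, and parse the list."""
    blob = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return json.loads(zlib.decompress(blob))

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.randint(-2**32, 2**63 - 1) for _ in range(17)]

number = to_number(sample)
print(number.bit_length(), "bits in the single number")
print(from_number(number) == sample)
```

The round trip is exact, but the single number is simply the data written in another notation: for random input it needs about as many digits as the sequence itself.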