Can a "squashed" sequence of consecutive integers be unambiguously separated?

561 Views Asked by At

Cross-posted to MO.

I have thought of an interesting math problem while looking through the code golf website. It is concerning the challenge Decipher a squashed sequence.

Here is a quick rundown of what a squashed sequence is:

Take a sequence of ascending consecutive positive integers (of which there are at least two numbers), and concatenate them into one string. The concatenated result is a squashed sequence.

The challenge asks you to reverse the process. Essentially, given a squashed sequence, you have to return the list of integers that formed the squashed sequence.

For example, given the squashed sequence "$1011121314$", the output would be $[10,11,12,13,14]$.


My question is: Does there exist a squashed sequence which can be formed by more than one set of numbers (at least two numbers in each set)? If so, give an example of such a squashed sequence. Otherwise, why not?

2

There are 2 best solutions below

0
On

I wrote some c++ code that attempts to find an ambiguous string by brute force. The code works for multiple bases (such as base-10, which the original problem uses, or base-2 binary). Based on my runs, if ambiguous strings exist, they must rely on sequences that contain an integer greater than $10^7$. This is true for any base between 2 and 10, inclusive.

This provides some evidence that suggests that the answer to the question might be yes.

Below is the code with some example output.

#include <algorithm>
#include <cstdint>
#include <fstream>
#include <vector>

#ifndef BASE
#define BASE 10
#endif

#define VERBOSE false

int get_num_digits(uint64_t n) {
    int d = 0;
    while (n) {
        n /= BASE;
        ++d;
    }
    return d;
}

class AmbiguityFinder {
public:
    AmbiguityFinder(uint64_t n) {
        printf("Searching for ambiguities under n=%lu in base %d...\n",
                n, BASE);
        for (uint64_t k = 1; k <= n; ++k) {
            int start_index = digits_.size();
            start_indices_.push_back(start_index);
            uint64_t x = k;
            int n_digits_in_k = get_num_digits(k);
            while (x) {
                digits_.push_back(x % BASE);
                x /= BASE;
            }
            std::reverse(digits_.begin() + start_index, digits_.end());
            if (VERBOSE) printf("\nk=%lu %s\n", k, repr().c_str());
            if (ambiguity_check(n_digits_in_k - 1, n_digits_in_k)) return;
        }
        printf("No ambiguities found!\n");
        printf("If ambiguities are possible, the sequences must contain ");
        printf("integers >= %lu\n", n);
    }

    // Attempts to match the suffix of <digits_> with a sequence whose
    // last number has at most <n_digits_in_last_num> digits
    bool ambiguity_check(
            int n_digits_in_last_num,
            int n_digits_in_other_last_num) const
    {
        if (n_digits_in_last_num == 0) {
            return false;
        }
        int n_digits = digits_.size();
        int start_index = n_digits - n_digits_in_last_num;
        uint64_t last_num = extract_num(start_index, n_digits);
        if (!last_num) return false;
        if (VERBOSE) {
            printf("%s(n_digits_in_last_num=%d) last_num=%lu\n",
                    __func__, n_digits_in_last_num, last_num);
        }
        int end_index = n_digits - n_digits_in_last_num;
        if (ambiguity_check_helper(
                    last_num - 1, end_index, n_digits_in_other_last_num)) {
            return true;
        }
        return ambiguity_check(n_digits_in_last_num - 1,
                n_digits_in_other_last_num);
    }

    // Attempts to match a substring of <digits_> ending with
    // <digits_[end_index]> with a sequence whose last number is <expected_num>
    bool ambiguity_check_helper(
            uint64_t expected_num,
            int end_index,
            int n_digits_in_other_last_num) const
    {
        if (expected_num == 0) return false;
        int n_digits = get_num_digits(expected_num);
        int start_index = end_index - n_digits;
        if (VERBOSE) {
            printf("  %s(%lu, %d) n_digits=%d start_index=%d\n",
                    __func__, expected_num, end_index, n_digits, start_index);
        }
        if (start_index < 0) {
            if (VERBOSE) {
                printf("  %s(%lu, %d) false - start_index < 0\n",
                        __func__, expected_num, end_index);
            }
            return false;
        }
        uint64_t last_num = extract_num(start_index, end_index);
        if (expected_num != last_num) {
            if (VERBOSE) {
                printf("  %s(%lu, %d) false - expected_num=%lu last_num=%lu\n",
                        __func__, expected_num, end_index, expected_num,
                        last_num);
            }
            return false;
        }
        auto start_it = start_indices_.begin();
        auto end_it = start_indices_.end() - 1;
        if (std::binary_search(start_it, end_it, start_index)) {
            printf("Ambiguity found! The last number in the following ");
            printf("sequence can end with %d or %d digits\n",
                    n_digits_in_other_last_num, n_digits);
            printf("%s\n", repr(start_index, end_index).c_str());
            return true;
        }
        return ambiguity_check_helper(
                expected_num-1, start_index, n_digits_in_other_last_num);
    }

    // returns zero if leading digit is zero
    uint64_t extract_num(int start_index, int end_index) const {
        int i = start_index;
        uint64_t num = digits_[i++];
        if (!num) return 0;
        for (; i < end_index; ++i) {
            num *= BASE;
            num += digits_[i];
        }
        return num;
    }

    std::string repr() const { return repr(0, digits_.size()); }

    std::string repr(int start_index, int end_index) const {
        char buffer[end_index - start_index + 1];
        char* c = buffer;
        for (int i = start_index; i < end_index; ++i) {
            *(c++) = '0' + digits_[i];
        }
        *c = 0;
        return buffer;
    }

private:
    std::vector<uint8_t> digits_;
    std::vector<int> start_indices_;
};


int main(int argc, char *argv[])
{
    int n = atoi(argv[1]);
    AmbiguityFinder finder(n);
    return 0;
}

Compilation + output:

$ g++ test.cpp -std=c++17 -O3; ./a.out 10000000
Searching for ambiguities under n=10000000 in base 10...
No ambiguities found!
If ambiguities are possible, the sequences must contain integers >= 10000000

$ g++ test.cpp -std=c++17 -O3 -DBASE=2; ./a.out 10000000
Searching for ambiguities under n=10000000 in base 2...
No ambiguities found!
If ambiguities are possible, the sequences must contain integers >= 10000000
0
On

A condition on digit lengths

Let the initial integers of the two sequences be $N$ and $M$ where we shall suppose that $N<M$. We shall prove, by contradiction, that $M$ has at least twice as many digits as $N$. Let $d(X)$ denote the number of digits of $X$ and let $D(X)$ denote the sum of the digits of $X$ modulo $9$.

If $N=10^k-1$.

Then the $d(M)+1$th digit of $$M\oplus M+1\oplus ...=N\oplus N+1\oplus ...$$ is $0$, a contradiction since this is the leading digit of $M+1$.

Otherwise

Then we can write $N=S\oplus 10^k-1$ where $S\not\equiv 9\mod 10$ and where $10^k-1$ may be empty. Then, as in the previous case, the $d(M)$th digit of $$M\oplus M+1\oplus ...=N\oplus (S+1)10^k\oplus ...$$ must occur strictly within the digits of $N\oplus(S+1)$. Hence there are integers $A$ and $B$ such that $N=A\oplus B$ and $M=A\oplus B\oplus A$.

Omitting the cases where $A=10^k-1$ and/or $ B=10^l-1$ and/or $ B=10^l-2$, we have $$A\oplus B\oplus A\oplus A\oplus B\oplus (A+1)\oplus ...=A\oplus B\oplus A\oplus (B+1)\oplus A\oplus (B+2)\oplus ...$$ Considering the digit sum of the first $3d(A)+2d(B)$ digits gives $$3D(A)+2D(B)\equiv3D(A)+2D(B)+1\mod 9,$$ a contradiction.

The omitted cases are easily dealt with, completing the proof.