asymptotics of the square indicator sum for subsequences of digits

400 Views Asked by At

Introduce the function $g(n)$ defined for positive integer arguments $n$ as the count of squares of positive integers among the numbers that can be formed by taking some subsequence of the digits of the binary representation of $n$ and interpreting this subsequence as a binary number. All subsequences contribute.

For example, $5 = (101)_2$ and the subsequences of digits are $$ (1)_2 = 1, (0)_2 = 0, (1)_2 = 1, (10)_2 = 2, (11)_2 = 3, (01)_1 = 1, (101)_2 = 5$$ and this includes the square "one" three times, so $g(5) = 3.$

When $n = 9 = (1001)_2$, we obtain the following subsequences: $$ (1)_2 = 1, (0)_2 = 0, (0)_2 = 0, (1)_2 = 1, (10)_2 = 2, (10)_2 = 2, (11)_2 = 3, (00)_2 = 0, (01)_2 = 1, (01)_2 = 1$$ and $$ (100)_2 = 4, (101)_2 = 5, (101)_2 = 5, (001)_2 = 1, (1001)_2 = 9$$ and this includes seven squares, so $g(9) = 7.$

The behavior of $g(n)$ is highly erratic, ranging from linear for $n=2^k$ to logarithmic for $n=2^k-1,$ which motivates us to ask the following question. What is the first term of the asymptotic expansion of the average order of $g(n)$? I.e. find the asymptotics of $$ \frac{1}{n} \sum_{k=1}^n g(k).$$

It might be useful to establish and prove closed form expressions for special values of $g(n)$, e.g. we have $g(2^k) = 2^{k-1}$ and $g(2^k-1) = k$ as pointed out earlier (prove these). Here are the first few terms of $g(n):$ $$1, 1, 2, 2, 3, 2, 3, 4, 7, 4, 5, 4, 4, 3, 4, 8, 15, 9, 12, 8, 9, 6, 7, 8, 11$$

This problem is not suited to numeric experimentation because $g(n)$ fluctuates so rapidly. For those who want to try anyway here is some C code. (I was going to memoize this the same way I memoized the Maple routine, but never got around to it. This is why the code looks as it does.) C Code for the square indicator sum over binary digit subsequences


#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <readline/readline.h>

unsigned long long isqrt(unsigned long long n)
{
  if(!n) return 0;

  unsigned long long a = 1, b = n, mid;

  do {
    mid = (a+b)/2;
    if(mid*mid>n){
      b = mid;
    }
    else{
      a = mid;
    }
  } while(b-a>1);

  return a;
}

unsigned long long g(unsigned long long n)
{
  int bits[256], len = 0;
  while(n>0){
    bits[len++] = n%2;
    n >>= 1;
  }

  unsigned long long res = 0, ind;
  for(ind=0; ind<(1<<len); ind++){
    unsigned long long varind = ind;
    int subseq[256], srcpos=0, pos=0;
    while(varind>0){
      if(varind%2){
        subseq[pos++] = bits[srcpos];
      }
      srcpos++;
      varind >>= 1;
    }

    unsigned long long val = 0;
    int seqpos = 0;
    while(seqpos<pos){
      val += (1<<seqpos)*subseq[seqpos];
      seqpos++;
    }

    unsigned long long cs = isqrt(val);
    if(val>0 && cs*cs == val){
      res++;
    }
  }

  return res;
}

unsigned long long gavg(unsigned long long max)
{
  unsigned long long res = 0, n;

  for(n=1; n<=max; n++){
    res += g(n);
  }

  return res;
}

int main(int argc, char *argv)
{
  unsigned long long max;
  char *line;

  while((line = readline("> ")) != NULL){
    if(sscanf(line, "%llu", &max)==1){
      unsigned long gsum = gavg(max);

      long double asympt = (long double)gsum;
      asympt = asympt/(long double)max;

      printf("%llu %lle\n", gsum, asympt);
    }
    free(line);
  }

  exit(0);
}

This is the Maple code, which is noticeably slower.Maple Code for the square indicator sum


g :=
proc(n)
        option remember;
        local dlist, ind, flind, sseq, len, sel, val, r, res;

        dlist := convert(n, base, 2);
        len := nops(dlist);

        res := 0;
        for ind from 0 to 2^len-1 do
            sel := convert(ind, base, 2);

            sseq := [];
            for flind to nops(sel) do
                if sel[flind] = 1 then
                   sseq := [op(sseq), dlist[flind]];
                fi;
            od;

            val := add(sseq[k]*2^(k-1), k=1..nops(sseq));

            r := floor(sqrt(val));
            if val>0 and r*r = val then
               res := res+1;
            fi;
        od;

        res;
end;

avg := n -> add(g(k), k=1..n);
1

There are 1 best solutions below

7
On BEST ANSWER

This is an answer to the (corrected) bounty question, but I would still like to know if there is an asymptotic formula for the average value of $g(k)$ or whether it fluctuates between multiples of $n^b$.

Let $n = 2^m$ be a power of two, and let $F(n)$ be the number of squares found as subsequences of all binary strings of length $m$, and let $G(n)$ be the exact value of $\sum_{k=0}^{n-1} g(k)$. (It is convenient to start counting at $0$ rather than $1$ but it does not affect the asymptotics.)

It is certainly not true that $F(n) = G(n)$, but it is a useful approximation in two ways. On one hand, $F(n) \ge G(n)$, because every binary string of length $m$ corresponds to some integer in $[0,2^m-1]$ but it has strictly more subsequences because of zero-padding. On the other hand, $F(n) \le \sum_{k=n}^{2n-1} g(k) = G(2n) - G(n)$, because adding a leading $1$ to each binary string of length $m$ yields a legitimate binary integer in $[2^m,2^{m+1}-1]$.

The benefit of this simplified model is that it is very easy to estimate $F(n)$. For each length $1 \le \ell \le m$, there are $m\choose\ell$ subsequences of length $\ell$ within any $m$-bit string. For each choice there will be about $\sqrt{2^\ell}$ (more precisely, $\lfloor \sqrt{2^\ell-1}\rfloor+1$) ways to choose a perfect square to fill in that subsequence. Then there are $2^{m-\ell}$ ways to fill in the remaining bits. Therefore

$$F(n) \approx \sum_{\ell=1}^m {m\choose\ell} (\sqrt{2})^{\ell} 2^{m-\ell} = (\sqrt{2}+2)^m - 2^m,$$ by the binomial formula. Thus $F(n) \sim n^c$ where $c = \log_2(2+\sqrt{2})$, and $\tfrac1n F(n) \sim n^b$ where $b = c-1 = 0.77155\ldots$.

Recall that $G(n) \le F(n) \le G(2n)-G(n)$. Since $F(n) \sim n^c$, the left inequality easily gives $G(n) = O(n^c)$, while the right inequality yields $G(n) \ge G(n)-G(n/2) \ge F(n/2) = \Omega(n^c)$. Therefore $G(n) = \Theta(n^c)$ when $n$ is a power of $2$, and since it is monotonically increasing, this growth rate can be seen to extend to all $n$. The desired function $q(n)$ is thus $n^b$.