asymptotics of the square indicator sum for subsequences of digits

Question

asymptotics of the square indicator sum for subsequences of digits

400 Views Asked by Bumbble Comm At 04 May 2026 - 8:25

Introduce the function $g(n)$ defined for positive integer arguments $n$ as the count of squares of positive integers among the numbers that can be formed by taking some subsequence of the digits of the binary representation of $n$ and interpreting this subsequence as a binary number. All subsequences contribute.

For example, $5 = (101)_2$ and the subsequences of digits are $$ (1)_2 = 1, (0)_2 = 0, (1)_2 = 1, (10)_2 = 2, (11)_2 = 3, (01)_1 = 1, (101)_2 = 5$$ and this includes the square "one" three times, so $g(5) = 3.$

When $n = 9 = (1001)_2$, we obtain the following subsequences: $$ (1)_2 = 1, (0)_2 = 0, (0)_2 = 0, (1)_2 = 1, (10)_2 = 2, (10)_2 = 2, (11)_2 = 3, (00)_2 = 0, (01)_2 = 1, (01)_2 = 1$$ and $$ (100)_2 = 4, (101)_2 = 5, (101)_2 = 5, (001)_2 = 1, (1001)_2 = 9$$ and this includes seven squares, so $g(9) = 7.$

The behavior of $g(n)$ is highly erratic, ranging from linear for $n=2^k$ to logarithmic for $n=2^k-1,$ which motivates us to ask the following question. What is the first term of the asymptotic expansion of the average order of $g(n)$? I.e. find the asymptotics of $$ \frac{1}{n} \sum_{k=1}^n g(k).$$

It might be useful to establish and prove closed form expressions for special values of $g(n)$, e.g. we have $g(2^k) = 2^{k-1}$ and $g(2^k-1) = k$ as pointed out earlier (prove these). Here are the first few terms of $g(n):$ $$1, 1, 2, 2, 3, 2, 3, 4, 7, 4, 5, 4, 4, 3, 4, 8, 15, 9, 12, 8, 9, 6, 7, 8, 11$$

This problem is not suited to numeric experimentation because $g(n)$ fluctuates so rapidly. For those who want to try anyway here is some C code. (I was going to memoize this the same way I memoized the Maple routine, but never got around to it. This is why the code looks as it does.) C Code for the square indicator sum over binary digit subsequences


#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <readline/readline.h>

unsigned long long isqrt(unsigned long long n)
{
  if(!n) return 0;

  unsigned long long a = 1, b = n, mid;

  do {
    mid = (a+b)/2;
    if(mid*mid>n){
      b = mid;
    }
    else{
      a = mid;
    }
  } while(b-a>1);

  return a;
}

unsigned long long g(unsigned long long n)
{
  int bits[256], len = 0;
  while(n>0){
    bits[len++] = n%2;
    n >>= 1;
  }

  unsigned long long res = 0, ind;
  for(ind=0; ind<(1<<len); ind++){
    unsigned long long varind = ind;
    int subseq[256], srcpos=0, pos=0;
    while(varind>0){
      if(varind%2){
        subseq[pos++] = bits[srcpos];
      }
      srcpos++;
      varind >>= 1;
    }

    unsigned long long val = 0;
    int seqpos = 0;
    while(seqpos<pos){
      val += (1<<seqpos)*subseq[seqpos];
      seqpos++;
    }

    unsigned long long cs = isqrt(val);
    if(val>0 && cs*cs == val){
      res++;
    }
  }

  return res;
}

unsigned long long gavg(unsigned long long max)
{
  unsigned long long res = 0, n;

  for(n=1; n<=max; n++){
    res += g(n);
  }

  return res;
}

int main(int argc, char *argv)
{
  unsigned long long max;
  char *line;

  while((line = readline("> ")) != NULL){
    if(sscanf(line, "%llu", &max)==1){
      unsigned long gsum = gavg(max);

      long double asympt = (long double)gsum;
      asympt = asympt/(long double)max;

      printf("%llu %lle\n", gsum, asympt);
    }
    free(line);
  }

  exit(0);
}

This is the Maple code, which is noticeably slower. Maple Code for the square indicator sum


g :=
proc(n)
        option remember;
        local dlist, ind, flind, sseq, len, sel, val, r, res;

        dlist := convert(n, base, 2);
        len := nops(dlist);

        res := 0;
        for ind from 0 to 2^len-1 do
            sel := convert(ind, base, 2);

            sseq := [];
            for flind to nops(sel) do
                if sel[flind] = 1 then
                   sseq := [op(sseq), dlist[flind]];
                fi;
            od;

            val := add(sseq[k]*2^(k-1), k=1..nops(sseq));

            r := floor(sqrt(val));
            if val>0 and r*r = val then
               res := res+1;
            fi;
        od;

        res;
end;

avg := n -> add(g(k), k=1..n);

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

This is an answer to the (corrected) bounty question, but I would still like to know if there is an asymptotic formula for the average value of $g(k)$ or whether it fluctuates between multiples of $n^b$.

Let $n = 2^m$ be a power of two, and let $F(n)$ be the number of squares found as subsequences of all binary strings of length $m$, and let $G(n)$ be the exact value of $\sum_{k=0}^{n-1} g(k)$. (It is convenient to start counting at $0$ rather than $1$ but it does not affect the asymptotics.)

It is certainly not true that $F(n) = G(n)$, but it is a useful approximation in two ways. On one hand, $F(n) \ge G(n)$, because every binary string of length $m$ corresponds to some integer in $[0,2^m-1]$ but it has strictly more subsequences because of zero-padding. On the other hand, $F(n) \le \sum_{k=n}^{2n-1} g(k) = G(2n) - G(n)$, because adding a leading $1$ to each binary string of length $m$ yields a legitimate binary integer in $[2^m,2^{m+1}-1]$.

The benefit of this simplified model is that it is very easy to estimate $F(n)$. For each length $1 \le \ell \le m$, there are $m\choose\ell$ subsequences of length $\ell$ within any $m$-bit string. For each choice there will be about $\sqrt{2^\ell}$ (more precisely, $\lfloor \sqrt{2^\ell-1}\rfloor+1$) ways to choose a perfect square to fill in that subsequence. Then there are $2^{m-\ell}$ ways to fill in the remaining bits. Therefore

$$F(n) \approx \sum_{\ell=1}^m {m\choose\ell} (\sqrt{2})^{\ell} 2^{m-\ell} = (\sqrt{2}+2)^m - 2^m,$$ by the binomial formula. Thus $F(n) \sim n^c$ where $c = \log_2(2+\sqrt{2})$, and $\tfrac1n F(n) \sim n^b$ where $b = c-1 = 0.77155\ldots$.

Recall that $G(n) \le F(n) \le G(2n)-G(n)$. Since $F(n) \sim n^c$, the left inequality easily gives $G(n) = O(n^c)$, while the right inequality yields $G(n) \ge G(n)-G(n/2) \ge F(n/2) = \Omega(n^c)$. Therefore $G(n) = \Theta(n^c)$ when $n$ is a power of $2$, and since it is monotonically increasing, this growth rate can be seen to extend to all $n$. The desired function $q(n)$ is thus $n^b$.

asymptotics of the square indicator sum for subsequences of digits

There are 1 best solutions below

Related Questions in NUMBER-THEORY

Related Questions in RECREATIONAL-MATHEMATICS

Trending Questions

Popular # Hahtags

Popular Questions