Vector specifying the number of $1$’s in the binary representations of $1,...,k$

66 Views Asked by At

For a fixed integer $k$, I am interested in a closed-form expression for the vector $\textbf{a}$, where $\textbf{a}[i]$ is the number of integers in $\{1,...,k\}$ that have $i$ $1$’s in their binary representations, which is to say Hamming weight $i$.

Consider the example for $k=5$, so the binary representations are

$1=001$
$2=010$
$3=011$
$4=100$
$5=101$.

I now want a histogram of the frequency of $1$’s in those binary representations. My vector’s first component, which represents Hamming weight 1, is $|\{001,010,100\}|=3$, and its second component, representing Hamming weight 2, is $|\{011,101\}|=2$. So we get the vector ${\textbf a}=(3,2,0,0,0,\ldots)$.

1

There are 1 best solutions below

0
On

A closed form is not easy to find, but it is possible to write a recursive form (it's basically Dinamic Programming). Let's denote with $a_k[i]$ your quantity, since it also depends on $k$. We proceed by steps:

  1. Get the length of the binary representation. The number of bits needed to represent $k$ is $m:=\lfloor \log_2 k \rfloor +1 $.
  2. Compute ${a}_{2^{m-1}}[i]$, i.e. the number of integers you can write with $i$ ones with $m-1$ bits. In fact, the $m-1$ right-most bits in the representation can assume every possible value if the left-most bit is $0$. This number is $a_{2^{m-1}}[i]= {m-1\choose i}$, because you can choose $i$ spots over a total of $m-1$ where you put the $i$ ones.
  3. Count the remaining configurations: let $b_m\dots b_1$ be the binary representation of $k$. For sure $b_m=1$. Let $J:= \max \bigl\{ j \in \{1,\dots,m-1\} \ : \ b_j=1 \bigr\}$ and let $r$ be the decimal number which binary representation is $b_J\dots b_1$. Then, we have that: $$a_k[i] = a_{2^{m-1}}[i]+a_r[i-1] \ ,$$ where the index $i-1$ derives from the fact that we are already counting the first bit $b_m$.

We can add the base cases, which are two. When $r=2^n$ or $r=2^n-1$ we can avoid doing $m$ recursion steps because we can directly use the formula in step 2. (and add just $1$ to $a[i]$ in the first case).

If we want to do a rough worst-case analysis we can tell that at most we have $O(\log k)$ recursion steps. We can of course do a more precise analysis with average costs and improve the algorithm's details but I don't know if it is what the question asks.