Given integers $a_0, a_1, ... a_n$, count how many $a_i$, $a_j$ whose binary representations logically ANDed equal zero

78 Views Asked by At

Given integers $a_0, a_1, ... a_n$, for each $a_i$, find the number of $a_j$ which share no common $1$s with $a_i$ when both $a_i$ and $a_j$ are written in a binary representation.

Another way of expressing the $a_i$ $a_j$ condition is through a logical AND statement: $$a_i \land a_j = 0$$

For any one particular $a_i$, a simple algorithm is to iterate through $a_j$ and checking if $a_i \land a_j = 0$. However, for a very large $n$, this algorithm will not perform well, since its complexity is $O(n^2)$. In my particular use-case, $n > 10^9$, and $a_i < 2^{64}-1$.


Please note, I do realize that this question may be more suitable to stack-overflow, but in my experience I have a better chance of receiving deep answers on math.stack, hopefully that justifies my choice of site.

Any intuition or insight would be much appreciated.

2

There are 2 best solutions below

8
On

EDIT: I currently do not think this is a right solution.

I think you can make it a bit more general. Let say, you have $n$ numbers, and the representation of any number requires at most $k$ bits (kind of $k = \log n$).

Take all the numbers and store them in the binary tree, where each node represents a suffix. I.e. the root represents an empty suffix. It has two children nodes: one representing a suffix $0$ and the other representing a suffix $1$. Those will have $00$ and $10$, and $01$ and $11$ children respectively. Etc.

While storing the numbers accumulate at each suffix the number of numbers that stored in the subtree rooted at the suffix node. Root will have total number of number, $0$ - the count of numbers that ends in $0$, etc.

Now, take $a_i$ - flip its bits and look up this number - node. Add to this the numbers that were stored directly above.

Example: Suppose, all the numbers $0 - 7$ are given: $000_2, 001_2, 010_2, 011_1, 100_2, 101_2, 110_2, 111_2$. You store numbers (fully expanded to maximum number of bits) in a tree: and count leafs in each subtree (during the store operation - you add a leaf and update the chain back to root). Therefore: the updated counts stored at nodes are:

  • root - $8$ nodes in a subtree,
  • $0$ and $1$ - $4$ nodes in each subtree,
  • $00$, $10$, $01$ and $11$ - $2$ nodes in each subtree,
  • $000$,$100$, $010$, $110$, $001$, $101$, $011$, $111$ - exactly $1$ node

Note, the order of nodes, I've used to store $0$ bit to the left, and $1$ to the right.

Suppose, you want to lookup $2_{10} = 10_2$. You traverse the tree down to the node with $1_2$ (the non-extended flipped value of $10$) to find $4$.

Storing and extracting will give you $O(n \log n)$, or $O(n \cdot k)$ complexity.

4
On

EDIT 2: This approach isn't going to work as is unfortunately, not even considering EDIT 1. The reason being, for every $1$ node, we'll need to split and keep track of every subtrie rooted at $0$ nodes at the same depth/level.

EDIT 1: I'm not sure how feasible it would be to create a trie encoding each of $a$ for some large $n$ such as $n=2^{64}$. I believe it would require $\sim2^{65} \sim 10^{19}$ nodes. I'll have to investigate some trie variations, in which case the below solution would probably need to be tweaked as well.

Let us construct a binary trie, $T$, in which a left arm represents a $0$ bit and a right arm represents a $1$ bit. We can encode $a$ in this manner. Also, let's store at each node the size of the subtrie $t$ rooted by the node.

Let us set a counter to #$a$. For some $a_i$, we begin a traversal of the trie. At depth $d$ of $T$ ($d_T$), we consider bit $d$ of the mask. If the bit is $1$, then we want to subtract from our counter the size of every $t$ rooted by a $1$ node at depth $d_T$. By doing so for every $d$, our counter should equal #$\{a_j \land a_i = 0\}$, hence answering the question.

However, for a large $n$, and particular $a_i$, I believe the number of traversals would be infeasible for me. For instance, imagine $T$ encodes integers $0$ to $2^{64}-1$, and $a_i = 2^{20}$. At depth $d_T=20$, I believe there are $\frac{2^{20}}{2}$ $1$ nodes (and the same amount of $0$ nodes), each of which needs to be traversed.

It may be the case that the solution performs better for $a_i$ with lower $1$ bits, since such $1$s factor out larger $t$; the best case scenario being if the $0^{th}$ bit in the mask is $1$, then on average half of the trie would be factored (out for a random selection of $a$).

What I suggest is precomputing for every $t$, and for every depth $d_t$, the sum of sizes of subtries $s_{d_t}$, or in words; subtries of $t$ rooted by $1$ nodes at depth $d_t$ within $t$. By doing so, I believe computing #$\{a_j \land a_i = 0\}$ requires traversing at most $64$ nodes and performing at each a subtraction from our counter.

I'm not sure how expensive this precomputation would be however. In essence, we need to traverse the whole tree I believe. In my particular use-case, I'm not sure if I can do this precomputation while constructing the trie, which would be optimal. The number of nodes for an encoding of $0$ to $2^{64}-1$ should be $$\sum_{i=1}^{64}2^i \sim 2^{65}$$