Factoring semiprimes using an iterative binary-search

57 Views Asked by At

Let $n$ be a semi-prime.

Let $$S_k = \big\langle n+k, \left\lfloor \frac{n+k}{2} \right\rfloor, \cdots, \left\lfloor \frac{n+k}{2^i} \right\rfloor, \cdots, 0 \big\rangle$$

denote the finite sequence in the $k$-th iteration.

We start with $S_0$ and check each element $x \in S_k$ if $\gcd(x, n) \ne 1$ and stop when a non-trivial divisor is found.

The sequence $S_k$ contains around $\log_2(n+k)$ entries.

Question:

  1. How many entries in total would have to be checked before we find a non-trivial divisor?

  2. If $k$ is simply successive values $0, 1, 2, \cdots$, there is a possibility of $\left\lfloor \frac{n+k-1}{2} \right\rfloor = \left\lfloor \frac{n+k}{2} \right\rfloor$ and we would be checking the same numbers. Instead, we take $k$ from a deterministically generated sequence $k_j$. i.e.,

$$S_{k_j} = \big\langle n+k_j, \left\lfloor \frac{n+k_j}{2} \right\rfloor, \cdots, \left\lfloor \frac{n+k_j}{2^i} \right\rfloor, \cdots, 0 \big\rangle$$

$k_{j+1}$ is chosen such that $$\left\lfloor \frac{n+k_j}{2} \right\rfloor \ne \left\lfloor \frac{n+k_{j+1}}{2} \right\rfloor$$.

This still doesn't ensure that duplicate checks don't occur because the floor value calculated at some later step may be a previously checked value.

How should $k_{j+1}$ be chosen to minimize the duplicate checks? One way we could do this is by keeping track of the visited entries and stop the current iteration $k_j$ when we find an entry that has already been visited and advance the iteration to the next $k_{j+1}$.

The question now becomes, what is the worst case space requirement for storing the visited entries until a factor is found?


There is $p, 2p, 3p, \cdots, (q-1)p$ that are multiples of $p$ and similarly $q, 2q, 3q, \cdots, (p-1)q$ that are distinct multiples of $q$. So, a total of $p+q-2$ possibilities for $x$ where we will get a non-trivial $\gcd(x, n)$. So, the algorithm eventually terminates.

Scatter plot of N vs. n, visited

This scatter-plot shows $N$ vs. $n$, the number of iterations taken to find the factor and $visited$ which is the number of duplicate entries that we track for each $N$ before a factor is found. There is a close correlation between the two.

The strategy used for generating $k_{j+1}$: We use the recurrence

$$k_{j+1} = k_j + d^2 \text{ where $d = \lceil \log_2 N \rceil$}$$

This seems better than other options that I tried that include

  • successive values of $k$ i.e., $k_{j+1} = k_{j} + 1$ and
  • $k_{j+1} = k_j + d$
  • $k_{j+1} = k_j + d^2 \text{ where $d = \lfloor \sqrt N \rfloor$}$

The test data is $625$ semiprimes that are products of the $25$ primes greater than $1000$ (it includes squares too that are technically not semiprimes).

Algorithmically, this does no better than trial division i.e., $O(\sqrt N)$.

The key to optimizing this search seems to be the sequence generation for $k_{j+1}$. Any ideas for this?

Optimizations:

  • One optimization we could do is after completing an iteration $k_j$, instead of jumping to the next $k_{j+1}$ we can walk the unexplored sequence in reverse order. For eg: $0, 1, 2, \cdots, 2^k, \cdots, \delta, \delta+1, \cdots, n$ be the sequence and $\delta$ be the smallest unexplored value. We can then start with $\delta$ and do jumps to $2^i \delta$ until we reach past $n$ to $n+k_{j+1}$ and then we repeat the process. This way we can discard all values less than $\delta$ in each pass and never go below $\delta$.