Let $n$ be a semi-prime.
Let $$S_k = \big\langle n+k, \left\lfloor \frac{n+k}{2} \right\rfloor, \cdots, \left\lfloor \frac{n+k}{2^i} \right\rfloor, \cdots, 0 \big\rangle$$
denote the finite sequence in the $k$-th iteration.
We start with $S_0$ and check each element $x \in S_k$ if $\gcd(x, n) \ne 1$ and stop when a non-trivial divisor is found.
The sequence $S_k$ contains around $\log_2(n+k)$ entries.
Question:
How many entries in total would have to be checked before we find a non-trivial divisor?
If $k$ is simply successive values $0, 1, 2, \cdots$, there is a possibility of $\left\lfloor \frac{n+k-1}{2} \right\rfloor = \left\lfloor \frac{n+k}{2} \right\rfloor$ and we would be checking the same numbers. Instead, we take $k$ from a deterministically generated sequence $k_j$. i.e.,
$$S_{k_j} = \big\langle n+k_j, \left\lfloor \frac{n+k_j}{2} \right\rfloor, \cdots, \left\lfloor \frac{n+k_j}{2^i} \right\rfloor, \cdots, 0 \big\rangle$$
$k_{j+1}$ is chosen such that $$\left\lfloor \frac{n+k_j}{2} \right\rfloor \ne \left\lfloor \frac{n+k_{j+1}}{2} \right\rfloor$$.
This still doesn't ensure that duplicate checks don't occur because the floor value calculated at some later step may be a previously checked value.
How should $k_{j+1}$ be chosen to minimize the duplicate checks? One way we could do this is by keeping track of the visited entries and stop the current iteration $k_j$ when we find an entry that has already been visited and advance the iteration to the next $k_{j+1}$.
The question now becomes, what is the worst case space requirement for storing the visited entries until a factor is found?
There is $p, 2p, 3p, \cdots, (q-1)p$ that are multiples of $p$ and similarly $q, 2q, 3q, \cdots, (p-1)q$ that are distinct multiples of $q$. So, a total of $p+q-2$ possibilities for $x$ where we will get a non-trivial $\gcd(x, n)$. So, the algorithm eventually terminates.
This scatter-plot shows $N$ vs. $n$, the number of iterations taken to find the factor and $visited$ which is the number of duplicate entries that we track for each $N$ before a factor is found. There is a close correlation between the two.
The strategy used for generating $k_{j+1}$: We use the recurrence
$$k_{j+1} = k_j + d^2 \text{ where $d = \lceil \log_2 N \rceil$}$$
This seems better than other options that I tried that include
- successive values of $k$ i.e., $k_{j+1} = k_{j} + 1$ and
- $k_{j+1} = k_j + d$
- $k_{j+1} = k_j + d^2 \text{ where $d = \lfloor \sqrt N \rfloor$}$
The test data is $625$ semiprimes that are products of the $25$ primes greater than $1000$ (it includes squares too that are technically not semiprimes).
Algorithmically, this does no better than trial division i.e., $O(\sqrt N)$.
The key to optimizing this search seems to be the sequence generation for $k_{j+1}$. Any ideas for this?
Optimizations:
- One optimization we could do is after completing an iteration $k_j$, instead of jumping to the next $k_{j+1}$ we can walk the unexplored sequence in reverse order. For eg: $0, 1, 2, \cdots, 2^k, \cdots, \delta, \delta+1, \cdots, n$ be the sequence and $\delta$ be the smallest unexplored value. We can then start with $\delta$ and do jumps to $2^i \delta$ until we reach past $n$ to $n+k_{j+1}$ and then we repeat the process. This way we can discard all values less than $\delta$ in each pass and never go below $\delta$.
