Probability of an ECM factor

597 Views Asked by At

Suppose I have a composite number $N$ divisible by some prime $p\le x.$ What is the probability that one iteration of ECM finds $p$, given parameters B1 and B2?


Usually people look for factors in bands, with the probability of failure set to $1/e$ for primes of a given size. I'm interested in giving probabilistic strength that no sufficiently-small prime was missed. So maybe a number was tested enough to ensure that with probability $1-1/e$ it has no prime factors smaller than $10^{40}$; I'd like to be able to say that if there was a prime factor smaller than $10^{30}$, it would have been found with probability 99.9% (and so, having not found one, either there is no such prime or we have witnessed a rare event).

1

There are 1 best solutions below

0
On BEST ANSWER

Since there no answers, I will attempt to give one. There may be some small mistakes, but the general theory should be right.

Short answer: You will always find primes $p$ if $B_2>{p+1+2\sqrt{p}}$, due to Hasse's theorem.
Otherwise, you will miss some. The probability of find $p$ in stage one is $$\Phi(p,B_1)/x\approx \rho(\ln p / \ln B_1),$$ where $\rho$ is the Dickman's function. The probability of finding $p$ in stage two is $$\Phi_1(p,B_1,B_2)/p,$$ where $\Phi_1(p,B_1,B_2)$ is the 1-semismooth function (More details below).
Note that this probability only works when considering a large range of primes.
Also, all considerations are over 1 curve.

A simple way to avoid these calculations is by simulations.
Simply pick $B_1$ and $B_2$, then pick $p$ in a range and check the probability.
The point is, the probability should be similar in a small range, say $p\in[2^i,2^{i+1})$.

If you are using GMP-ECM, then there are a lot of complications in computation:
(1) The order $m_p$ (defined below) is no longer a random integer, you need to account for increased probability as the curves are special.
(2) The sequence of computation matters: if you are computing $[e]\tilde P$, then the fact that you need to compute intermediate steps means you can check for additional points. (say $e_1\leq e_2\leq\dots \leq e$) This increases the probability and is very difficult to compute. However, I think the effect is minimal.
(3) $B_1$ bound setting might miss some small primes $p$ in rare cases. For example, if $|\tilde P|$ divisible by a high prime power (notation below). This effect should also be minimal.
(4) $B_2$ in GMP-ECM can cater for more than 1 large prime, say $i$ large primes, if I am not mistaken. This means you need to compute this function instead: $$\Phi_i(p,B_1,B_2)/p,$$

Long answer, describing the ideas involved:
Consider an Elliptic curve $$E:y^2=x^3+Ax+B$$ with a given point $P\in E(\mathbb Q)$.
Let $N$ be a composite and suppose $p|N$ for some prime $p$.
If $E$ has good reduction at $p$, then we can consider the reduction mod $p$, with point $\tilde P\in E(\mathbb F_p)$.
Hence we have order $|\tilde P|=m_p\in \mathbb N\Longleftrightarrow [m]\tilde P=\mathcal O$.

We can find the factor $p$ if we do the following:
(1) Reduce the curve mod $N$ to consider points on $E(\mathbb Z/N\mathbb Z)$, with a known point $\tilde P$.
(2) Compute $[e]\tilde P$ for any $e$ such that $m_p|e$.
At some point, the addition will fail and we get a non-trivial factor.
Therefore our goal is to choose $e$ such that it divides as many possible $m_p$ as possible.

To compute probability, we consider the variation of $m_p$ from $p=5$ all the way to $p\leq \sqrt N$.
(For $p=2,3$ we cannot use this type of curve. Not that we are interested anyway.)
Note that this is fixed for a given curve and point, we simply consider order of $\tilde P$ in $E(\mathbb F_p)$.
Thus for a given $e$, we consider if $m_p|e$ for $p=5,7,\dots, p_r\leq \sqrt N$.

The main question is: how does $m_p$ behave?
By Hasse's theorem, we know that $p+1-2\sqrt p\leq m_p\leq p+1+2\sqrt p$.
The distribution problem is Sato-Tate's Conjecture, which over all primes suggests that $m_p$ behaves like a random integer in $[p+1-2\sqrt p,p+1+2\sqrt p]$.

The idea of $B_1$ is to choose $k=2^{e_1}\cdot 3^{e_2}\cdot\dots\cdot p_r^{e_r}\leq B_1$ and compute $[k]\tilde P$.
Let $l$ be the largest prime factor of $m_p$.
Then we can find $p$ if $l\leq B_1$.
The probability of this occuring is when $m_p$ is $B_1$-smooth, i.e. having prime factors at most $B_1$.
Assuming that $m_p$ is a random integer, the probability that it is $B_1$-smooth is $$\Phi(p,B_1)/p\approx \rho(\ln p / \ln B_1),$$ where $\rho$ is the Dickman's function and $\Phi(X,B_1)$ is the expected number of $B_1$-smooth integers in $[1,x]$. Notably, if $B_1\to p$ then $\ln p/\ln B_1\to 1$ and $\rho(1)=1$.
This brings us to the first conclusion:

Let $p$ be a prime and $k$ an integer. Then the probability of finding $p$ by computing $[k]\tilde P$ in stage one is given by $\rho(\ln p/\ln B_1)$.

Something similar happens for stage two.
The $B_2$ bound (usually) indicates that you will compute $[e]\tilde P$ with $e=k\cdot q$ for primes $B_1<q\leq B_2$. This means you will find all primes $p$ where $m_p$ is 1-semismooth.
That is, it has at most 1 prime factor in $(B_1,B_2]$ and the rest $\leq B_1$.

Assuming $m_p$ is a random integer again, the probability of this occuring is given by: $$\Phi_1(p,B_1,B_2)/p,$$ where $\Phi_1(p,B_1,B_2)$ is the 1-semismooth function (See this paper on how to compute these functions. It has a Maple code if you have the software.)

Now for the final touch: we know how to compute probability for one prime.
Then the probability for a range of primes can be done by some integration or interpolation.
For example, say we want to consider primes in $[2^i,2^{i+1})$.
We can compute the middle probability: $$\lambda = \Phi_1((2^{i}+2^{i+1})/2, B_1,B_2)$$ and we know that there are $t\approx 2^{i+1}/\ln (2^{i+1})- 2^i/\ln(2^i)$ primes by the Prime Number Theorem. Hence we expect to find $t\lambda$ primes in that range. This can be useful for checking heuristics.