Questions from Dixon Factorization Paper

369 Views Asked by At

I read the first page of Asymptotically Fast Factorization of Integers, and have a few questions.

Quotes from the paper are formatted as blockquotes (>).

Following Legendre, we know that there exist integers x and y which are relatively prime to n and such that x^2 == y^2 (mod n), but x != y or -y.

From what I read about coprime integers, two numbers, x and y, are relatively prime if they only share 1 as an even divisor.

Example: 4 and 9 are co-prime since:

  • 4's factors: {1, 2, 4}
  • 9's factors: {1, 3, 9}

For such integers GCD(n, x+y) is a proper factor of n.

Our search for the integers x and y is carried out in two stages. First we look for squares z^2 which are congruent (mod n) to integers whose prime factors all lie in some set P.

Is this set, P, equivalent to a bound? For example, if the bound is 7, then the set, P, consists of all prime numbers up to and including 7?

So, a bound of 7 yields: P = {2,3,5,7}.

Lastly, if N = 5995, what is the work to carry out the first stage?

Given the set of [1..5995], we'd only include numbers, i.e. we filter on numbers whose prime factors consist solely of set P, {2,3,5,7}?

Once we've filtered those numbers, we then check if, for each number, its squared value (mod 5995) == 0?

2

There are 2 best solutions below

1
On

First, one of the more useful tools for dealing with elementary arithmetic is the Chinese remainder theorem. It states that, if $p, q$ are coprime, then any set of two equations $x \equiv a \pmod{p}, x \equiv b \pmod{q}$ has a (unique) solution modulo $pq$.

In your case, this means that, if $n$ is not a prime power, it has a factorization $n = pq$ with $p$, $q$ coprime and both $\neq 1$; then pick any number $y$, the set of two equations (in $x$) $$ x \equiv y \pmod{p}, \quad x \equiv -y \pmod{q}$$ has a unique solution (by the existence property of the Chinese remainder theorem), which I shall write $x \pmod{pq}$. Then we may check that $x^2, y^2$ satisfy the relation $$ x^2 \equiv y^2 \pmod{p}, \quad x^2 \equiv y^2 \pmod{q},$$ which, by the unicity property of the Chinese remainder theorem, implies that $x^2 \equiv y^2 \pmod{pq}$. Since however $x \not\equiv y \pmod{q}$, we have $x \not \equiv y \pmod{pq}$; likewise, since $x \not \equiv -y \pmod{p}$, we have $x \not \equiv y \pmod{pq}$. This proves your first quote.

Now assume that $n = pq$ with $p, q$ unknown, and let $x, y$ be such a pair of integers (known). The crucial fact is that the relations $x^2 \equiv y^2, x \not\equiv y, x \not \equiv -y$ do not depend on the unknown values $p, q$. You then have:
- $n= pq$ divides $x^2 - y^2$;
- $n= pq$ does not divide $x - y$;
- $n= pq$ does not divide $x + y$; which means that in the (unknown) pair $p, q$, one of them (say $p$) divides (only) $x-y$, and the other divides (only) $x + y$, and both of them divide $n$ of course.

We may therefore recover $p$ from the values $n$ and $x-y$: it is the GCD of $n$ and $x-y$. (This is computable in polynomial time using Euclid's algorithm).

For the equivalence between the set of prime factors and the bound: this is not a full equivalence. Since you need a finite set of auxiliary primes, such a set is always bounded. But you could want, for some reason, to use only some of those primes. In practice this is not the case, when factoring using the quadratic sieve there is no reason to “throw away” elements of the smoothness basis. On the contrary, you need as many primes as possible (to improve the probability that $z$ is smooth relative to your basis), as small as possible (to improve the speed of your computations).

For your 5995 example: it is a bit of a strange example since this number is obviously already dividble by 5, which is an element of any reasonable smoothness basis. Instead I will work out the case of $18419$. By random generation I find the following relations: $$\begin{split} 148^2 &\equiv 5 \times 17 \times 41\\ 139^2 &\equiv 2 \times 11 \times 41\\ 685^2 &\equiv 2 \times 5^4 \times 7\\ 887^2 &\equiv -1 \times 2 \times 41\\ 235^2 &\equiv -1 \times 2^5\\ 622^2 &\equiv 5 \times 17\\ 955^2 &\equiv -1 \times 3 \times 5^2 \times 7 \times 17\\ 56^2 & \equiv 2^6 \times 7^2\\ 525^2 & \equiv -1 \times 2^2 \times 3 \times 5 \times 11\\ 869^2 & \equiv -1 \times 2 \times 3^2 \end{split}$$ from which we get the relation: $$ (235 \times 869)^2 \equiv (-1)^2 \times 2^6 \times 3^2, \quad\text{or}\quad 1606^2 \equiv 24^2 \pmod{18419}.$$ This means that $18419$ divides $(1606-24) \times (1606+24)$. We check by hand that it divides neither factor, and we may compute $$ \gcd (18419, 1606-24) = 113,$$ which is a proper divisor of $18419$.

0
On

As you know, Fermat's insight was to represent $N$ as a difference of two squares, $$N = A^2 - B^2$$We can factor this expression to $N = (A + B)(A - B)$, which gives us a factorization of $N$, namely $A+B$ and $A-B$.

Fermat's method näively searches for this difference of squares by letting $f(x) = x^2 - N$ and trying values of $x$ until we find that $f(x) = B^2$ is a perfect square. Then we let $A = x$ and compute the factors $A \pm B$. This is terribly inefficient: slower than trial division in the worst case!

Dixon's insight was that instead of searching for a pair of squares, we can construct one directly from things that are easier to find. In particular, we let $f(x) = x^2 - N$ as before, but instead of looking for $f(x) =$ "perfect square," we just look for $f(x) =$ "a number with small factors." This begs a couple of questions:

  1. What is even remotely helpful about finding such a thing?
  2. How small is "small"?

To answer these questions, suppose we have a list of integers: $$\begin{align}J & = 3 \cdot 3 \cdot 5 \cdot 7 \\ K & = 2 \cdot 3 \cdot 11 \\ L & = 5 \cdot 7 \cdot 7 \cdot 7 \\ M & = 2 \cdot 2 \cdot 5 \cdot 5 \cdot 5 \cdot 5\end{align}$$

Q: Is any of these a perfect square?
A: Well, $J,K$ and $L$ aren't.
Q: How can you tell?
A: That's easy! Each of their prime factors appears an odd number of times.
Q: What about $M$?
A: Yes, $M$ is a perfect square, because each of its factors appears an even number of times. This means we can remove half of each of them to get an integer $\sqrt{M}$.

A more interesting question: can any combination of the first three numbers be multiplied together to make a perfect square? Indeed! See: $$J \cdot L = (3 \cdot 3 \cdot 5 \cdot 7)(5 \cdot 7 \cdot 7 \cdot 7) = (3 \cdot 3)(5 \cdot 5)(7 \cdot 7 \cdot 7 \cdot 7)$$

A more practical question: In a given set of integers, what's the probability that ANY combination of them can be multiplied together to get a square? A precise answer isn't needed at this point. It's enough to say, "Well, if the numbers tend to share a lot of the same factors, then there's a very, very good chance!"

Now if we choose a bunch of numbers that have only "small factors" (i.e. less than some bound $B$), these numbers will tend to share a lot of the same factors, which makes it easier to find a combination that multiplies to a perfect square. In fact, if we represent each number as an "exponent vector" modulo $2$, we can stuff them all in a matrix and use row-reduction to find a combination with all the factors having even exponents (as per the Q & A session above). But I'm jumping too far ahead.

Hopefully it's clear now why Dixon's goal is to find values of $f(x)$ that have "small factors." We replace the term "small factors" with "$B$-smooth" (i.e. having only factors less than $B$). Then each time we find a pair $x,f(x)$ such that $f(x)$ is $B$-smooth, we call this pair a "relation."

If $B$ is small, then there's only a small set of possible factors for each $B$-smooth number, and thus such numbers will tend to share a lot of the same factors, which is great! However, when $B$ is small, it's also harder to find $B$-smooth numbers, because they're more rare. If you're thinking "isn't there a happy middle?" you're asking exactly the right question, and the answer is yes. There's a heuristic formula for it introduced in Carl Pomerance's paper Smooth numbers and the quadratic sieve. I won't try to explain it here because it's long and a little over my head.

  • Key: It's a lot easier to find $B$-smooth numbers than to find perfect squares, because they're a lot more common! This is the key insight of Dixon's method building on Fermat's foundation.

In your comments you asked for an example of factoring using Dixon's method, so I'll try to give one here. I don't have space to explain exponent vectors or the linear algebra way to find a square combination. For these things and other bells & whistles pertaining to Dixon and the Quadratic Sieve, I'd refer you to Carl Pomerance's gentle article A Tale of Two Sieves.

For our example, let: $$\begin{align} N & = 319 \\ B & = 13 \\ f(x) & = x^2 - N \end{align}$$

I chose a hefty $B$ for convenience, to make it easier to find relations. In practice, $B$ is very small compared to $N$. Now we'll just do a linear search for outputs of $f(x)$ that are $B$-smooth. Note that if $x ≤ 17$, then $f(x)$ is negative. In practice it's common to consider $-1$ as a valid "factor" to include when building a square combination. But for simplicity we'll just start at $x = 18$:

$$\begin{array}{c|l} x & f(x) \\ \hline 18 & 5 \\ 19 & 2 \cdot 2 \cdot 3 \cdot 7 \\ 21 & 2 \cdot 61 \\ 22 & 3 \cdot 5 \cdot 11 \\ 23 & 2 \cdot 3 \cdot 5 \cdot 7 \\ 24 & 257 \\ 25 & 2 \cdot 3 \cdot 3 \cdot 17 \\ 26 & 3 \cdot 7 \cdot 17 \\ \dots & \dots \\ 33 & 2 \cdot 5 \cdot 7 \cdot 11 \\ \dots & \dots \end{array}$$

Note that I actually skipped the entry $f(20) = 9^2$, because that gives us a difference of squares straightaway, and we want to practice finding combinations! At $x=33$ we can stop, because we have a combination of relations that multiply to get a congruence of squares. In case you're feeling lazy, it's these: $$\begin{align} f(18) & = 5 \\ f(22) & = 3 \cdot 5 \cdot 11 \\ f(23) & = 2 \cdot 3 \cdot 5 \cdot 7 \\ f(33) & = 2 \cdot 5 \cdot 7 \cdot 11 \end{align}$$

Multiply corresponding sides of these relations together to get: $$\begin{align} (18 \cdot 22 \cdot 23 \cdot 33)^2 - N & \equiv (5)(3 \cdot 5 \cdot 11)(2 \cdot 3 \cdot 5 \cdot 7)(2 \cdot 5 \cdot 7 \cdot 11) \\ 300564^2 - N & \equiv (2 \cdot 2)(3 \cdot 3)(5 \cdot 5)(7 \cdot 7)(11 \cdot 11) \\ 300564^2 - N & \equiv (2 \cdot 3 \cdot 5 \cdot 7 \cdot 11)^2 \\ 300564^2 - N & \equiv 2310^2 \end{align}$$

We have to use the congruence symbol "$\equiv$" $\pmod{N}$ instead of an equals sign, because. [sic]

Now with good probability we can discover a factor by computing: $$\gcd(300564+2310, 319) = 11\\ \gcd(300564-2310, 319) = 319$$

And we found a factor, $11$.