Pollard's Rho Algorithm and Floyd's Cycle-Finding

130 Views Asked by At

I'm doing a project on Pollard's Rho algorithm, and after lots of readings, I still am very confused about this one part.

I understand why there's a pseudorandom sequence, and I understand why it must cycle. I also understand why we are trying to find pairs of numbers to check their difference and whether they have a GCD with the N = pq in question.

Why are we actively looking for cycles though? Why does looking for cycles help here? I know they are inevitable in a sequence through polynomial modulo N, but why is it better to find the cycle quicker through Floyd's algorithm?

Also, if you're running Floyd's and $x = g(x)$ and $y = g(g(y))$, is the point where $y_i$ and $y_j$ equal each other (a.k.a. the hare finishing its cycle) where $| x_i - x_j |$ contains $p$? If so, why?

If someone could explain and also provide an example to any of these questions, that would be helpful. I know there is a similar question on this topic with an answer, but it's not exactly what I'm looking for, and it's not helping my understanding. It's possible I am misunderstanding what the articles are saying about Floyd's algorithm though.

1

There are 1 best solutions below

1
On

Let's say we're trying to factor $n = p \cdot q$

Why are we looking for cycles though?

Consider a deterministic pseudorandom sequence of integers $x_{k}$. Finding a cycle modulo $p$ within this sequence means that we'll have $i$ and $j$ such that $x_{i} \equiv x_{j} \pmod p$, which implies that $d = \vert x_{i} - x_{j} \vert$ is a multiple of $p$. Then we can take $\gcd(d, n)$ and find $p$, because both $n$ and $d$ are multiples of $p$. We don't precisely know the values $x_{k} \pmod p$ at any point, but we know that if we do find a cycle over $p$, the $\gcd$ operation will return $p$. This is how the algorithm finds cycles - it simply computes the $\gcd$ of $n$ and the difference of two values generated by the sequence - if a cycle is found, a non-trivial factor will be returned by $gcd$.

Let's walk through a simple example by generating a pseudorandom sequence, without applying Floyd's algorithm. Consider $n = 32881 = 131 \cdot 251$. Let's consider the pseudorandom sequence generated by:

  • $x_{1} = 5$
  • $g(x) = x^2 + 1$
i $x_{i} \mod 32881$ $x_{i} \mod 131$
$1$ $5$ $5$
$2$ $26$ $26$
$3$ $677$ $22$
$4$ $30877$ $92$
$5$ $4535$ $81$
$6$ $15601$ $12$
$7$ $6040$ $14$
$8$ $16572$ $66$
$9$ $9073$ $34$
$10$ $18187$ $109$
$11$ $16991$ $92$

So we have $x_{11} \equiv x_{4} \equiv 92 \pmod p$. But since we don't know the value of $p$, we couldn't have calculated $92$ explicitly, nor matched the values $x_{11}$ and $x_{4}$. However, we do know that if their values modulo $p$ match, the $\gcd$ operation will return a non-trivial factor of $n$. Indeed, calculating $\gcd(30877 - 16991, 32881) = 131$ yields $p$.

why is it better to find the cycle quicker through Floyd's algorithm?

When used in Pollard's Rho, Floyd's algorithm is better than linear cycle finding and some other algorithms for two reasons:

  • It saves space, as it only needs to keep two pointer values at any time. For example, linear cycle finding requires you to store every value that you've already encountered, which raises space requirements of the algorithm. This is not specific to Pollard's Rho and is the main advantage of the algorithm.
  • Each step only requires the computation of a single $\gcd$, in order to determine whether the current two values of the sequence are equal modulo $p$. If you wanted to use, e.g. linear cycle finding, you'd most likely have to compute $\gcd(x_{i} - x_{j}, n)$ for each $j < i$ in $i$-th step of the algorithm, which would raise your complexity to quadratic in terms of the length of the cycle. There may be special data structures that would speed up the computations, but I'm not aware of them.

Also, if you're running Floyd's and x=g(x) and y=g(g(y)), is the point where yi and yj equal each other (a.k.a. the hare finishing its cycle) where |xi−xj| contains p? If so, why?

Unless I'm misunderstanding what you asked, this is not necessarily true. You can experiment and find a counterexample relatively easily.