I would like to learn how to use wheel factorization but am having trouble understanding it. I tried reading the wikipedia article but found it confusing (even the talk page says it's a mess). What exactly is it and how is it used? To my understanding it eliminates some (but not all) composite numbers in a list up to a certain number. So in this sense it's a technique that can be used to speed up existing factorization algorithms? It seems to almost be the same as the sieve of Eratosthenes except it starts with a small list of known prime numbers?
If someone could please give the general procedure and a simple example that would be much appreciated.
Since trial division is mostly useless for factoring large numbers, and using a prime number sieve for factoring is just a minor refinement of trial division, you shouldn't think of it as a factorization algorithm. Instead, this is a prime-generating algorithm: the goal is to generate the list of primes in the set $[n] := \{1, 2, 3, \dots, n\}$ as quickly as possible.
We are trying to improve on the sieve of Eratosthenes in efficiency, which does $\Theta(n \cdot \log \log n)$ arithmetical operations on elements of $[n]$.
Wheel factorization does this by using the fact that for the first few primes, the sieve we're constructing is periodic, and there's no point in extending the periodic pattern all the way to $n$. Instead, we only generate the list of numbers not divisible by the first $k$ primes $p_1, p_2, \dots, p_k$ only up to their product $p_1 p_2 \dotsm p_k$. That is, we:
For each extension step, if the set we've generated is $S$ and the next prime we're adding is $p$, then the next set consists of $p$ translated copies of $S$, with $p \cdot S$ removed. For example, if $S = \{1,5\}$ and $p=5$, then we repeat $S$ $5$ times (to get $\{1,5\} \cup \{7,11\} \cup \{13,17\} \cup \{19, 23\} \cup \{25, 29\}$) and remove $5\cdot S = \{5,25\}$. By the way, $p$ is also easy to find: it is the element of $S$ after $1$.
Once $p_1 p_2 \dotsm p_k > n$, we no longer take repeated copies of $S$, and just remove $p \cdot S$ from $S$ to extend. We stop, as with the sieve of Eratosthenes, when $p_k > \sqrt n$. At this point, $S$ contains all the primes larger than $p_k$; the primes smaller than $p_k$ are the ones we used along the way, which we keep track of separately.
According to this paper, this only requires $\Theta(\frac{n}{\log \log n})$ arithmetical operations on elements of $[n]$, if implemented carefully.