I was playing around with ways to visualize prime numbers in terms of products of smaller primes. Since every prime p greater than 2 is odd, p - 1 is an even number greater than 1, so p can be represented as a product of smaller primes plus one.
Below is a quick pyplot visualization of the first 100,000 primes. The x-axis is the index of a prime (the x-th prime) and the y-axis gives the indices of the primes that make up its product-plus-one. It seems to me that there are clear linear features, which indicate that primes of a certain index are likely to have "products" of another index.
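For reference, here is a minimal sketch of how the plotted points could be generated. It is not the exact script behind the plot above; the function names `smallest_prime_factors` and `factor_index_points` are made up for illustration, and it assumes each distinct prime factor of p - 1 is plotted once.

```python
def smallest_prime_factors(limit):
    """Sieve: spf[n] is the smallest prime factor of n, for 2 <= n <= limit."""
    spf = list(range(limit + 1))
    for i in range(2, int(limit ** 0.5) + 1):
        if spf[i] == i:  # i is prime
            for m in range(i * i, limit + 1, i):
                if spf[m] == m:
                    spf[m] = i
    return spf


def factor_index_points(limit):
    """Return (i, j) pairs: the j-th prime divides p - 1, where p is the i-th prime."""
    spf = smallest_prime_factors(limit)
    primes = [n for n in range(2, limit + 1) if spf[n] == n]
    index = {p: i for i, p in enumerate(primes, start=1)}  # 1-based prime index
    points = []
    for p in primes[1:]:  # skip p = 2, since p - 1 = 1 has no prime factors
        n = p - 1
        while n > 1:
            q = spf[n]
            points.append((index[p], index[q]))
            while n % q == 0:  # record each distinct prime factor only once
                n //= q
    return points
```

The resulting pairs can be unpacked with `zip(*points)` and passed to `matplotlib.pyplot.scatter`. Covering the first 100,000 primes needs `limit` to be at least 1,299,709, the 100,000th prime.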
Since I lack the computing power to analyze a larger set of primes, my questions are:
- What are these linear patterns and why do they occur?
- Why do they form with increasing frequency?
- Do the lines continue indefinitely?
- Do more clear lines continue to materialize from the noise indefinitely?
- Can knowing the more likely "factors" of larger primes help predict them?
I apologize if this has been asked before. The closest thing I was able to find was Goldbach's Conjecture, but that deals with the sums of primes.

For any positive integer $k$, it will happen often enough (note: I don't think this is a theorem, but it's a conjecture everyone believes, in the same spirit as the twin prime conjecture) that $p-1$ is $2k$ times a prime. In that case, if you were plotting the primes rather than their indices you would get points like $(p,\frac{p-1}{2k})$, or approximately $(p,\frac p{2k})$, lying approximately on a line of gradient $\frac1{2k}$ through the origin.
Now, instead you are plotting the indices: i.e., a value of $j$ on either axis corresponds to the $j$th prime $p_j$. Well, the primes are evenly enough distributed that $j\approx\frac{p_j}{\log p_j}$ is a good approximation.
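To get a feel for how good that approximation is, here is a small sketch comparing $j$ with $p_j/\log p_j$ for a few indices. The helper `primes_up_to` and the sample indices are my own choices for illustration.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # cross out multiples of i, starting at i*i
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(limit + 1) if sieve[n]]


primes = primes_up_to(200_000)
for j in (100, 1_000, 10_000):
    p = primes[j - 1]  # the j-th prime (1-based index)
    print(j, p, round(p / math.log(p), 1))
```

The estimate $p_j/\log p_j$ runs somewhat below $j$ in this range, which is one reason the predicted gradients are only approximate.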
So now those points on your graph become $(\frac p{\log p},\frac{p/2k}{\log(p/2k)})=(\frac p{\log p},\frac{p/2k}{\log p - \log 2k})$. A super-crude approximation would say that those points are roughly $(x,\frac x{2k})$ where $x=\frac p{\log p}$, but $\log p$ isn't all that large and $\log 2k$ isn't all that small. So, instead, note that $\log p\approx \log x$ and write our points as $(x,\frac x{2k}\frac1{1-\frac{\log 2k}{\log x}})$.
Now, that isn't a straight line -- but note that for most of the range we're looking at here $\log x$ isn't very different from $\log x_{\max}$ where $x_{\max}$ is the upper limit of the plot's $x$-coordinates. In fact, on average it's about $(\log x_\max)-1$. So, finally, write $g(k)=\frac1{2k}\frac1{1-\frac{\log 2k}{\log x_{\max}-1}}$; our plot consists approximately of lines through the origin with gradients $g(1),g(2),g(3),\dots$
In your graph shown above, $x_{\max}=10^5$. Then the first few values of $g$ turn out to be approximately $g(1)=0.535$, $g(2)=0.288$, $g(3)=0.201$, $g(4)=0.156$. We should therefore expect the top four $y$-values at $x=20000$ to be about 10700, 5760, 4020, 3120. Since $x=20000$ is a little further left than $x_{\max}/e$ (the point where $\log x=\log x_{\max}-1$), these will be underestimates. ... And, indeed, they're pretty close but a little too low.
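The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the formula for $g(k)$ given earlier; the function name `gradient` is chosen here for illustration.

```python
import math


def gradient(k, x_max):
    """Approximate gradient g(k) of the k-th line, using log x ~ log(x_max) - 1."""
    log_2k = math.log(2 * k)
    return 1 / (2 * k) / (1 - log_2k / (math.log(x_max) - 1))


x_max = 10**5
gradients = [gradient(k, x_max) for k in range(1, 5)]
predicted_y = [g * 20000 for g in gradients]  # predicted heights at x = 20000

for k, (g, y) in enumerate(zip(gradients, predicted_y), start=1):
    print(f"g({k}) = {g:.3f}, predicted y at x = 20000: {y:.0f}")
```

Changing `x_max` shows how the gradients drift slowly towards $\frac1{2k}$ as the plotted range grows.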
So, to answer your questions: