Abundance of prime k-tuples

335 Views Asked by At

I have been trying to find out whether the following is a known result and would be grateful for an answer:

For any two natural numbers x and y there must be a prime k-tuple (a, b, ...) corresponding to x consecutive primes (n+a, n+b, ...) for at least y different n.

(The definition of prime k-tuple I'm using is the one given here: https://en.wikipedia.org/wiki/Prime_k-tuple)

I am asking because I believe I have a proof of this and would like to know whether it's worth sharing here.

PS: an example as requested: let x = 3 and y = 2. In this case I am looking for a k-tuple with 3 elements that at least 2 times corresponds to three consecutive primes. The k-tuple (0, 2, 6) satisfies this condition, since it results in three consecutive primes for at least 2 values of n, e.g. 5 (5, 7, 11) and 11 (11, 13, 17). My above statement says that this works for any x and y.

2

There are 2 best solutions below

2
On

This is my attempt to prove the above statement. I believe it contains no significant gaps/errors. It is written in terms of prime patterns rather than k-tuples because I didn't realise that k-tuple is the more common term at the time of writing. The glossary relates it to k-tuples. For anyone worried by my choice of notation, neither $S_A$ nor $S_S$ has anything to do with my political views. I should also mention that I am not a professional mathematician.

A theorem on the abundance of prime patterns

Glossary of frequently used terms:

(Prime) gap: difference between two successive primes; a kind of (prime) pattern

(Prime) pattern: sequence of gaps of a defined span in a defined order; corresponds to a single prime k-tuple each whose instances consists of successive primes

(Prime) k-tuple: a finite set of naturals (a, b, ...) such that there is a non-empty set of integers n for which (n+a, n+b, ...) are prime.

Instance (of a pattern/k-tuple): The set of primes (n+a, n+b ...) for a single n.

Span (S) (of a gap/pattern): difference between the highest and lowest prime in the gap or pattern

Size (X) (of a gap/pattern): number of primes in said pattern.

Abundance (A) (of a pattern/span): number of instances of that pattern in a given range of the naturals

(Un)populated (pattern/span): one with (no) instances in a given range of the naturals

Abundance distribution: a function from patterns of a given size to their abundance, where the patterns are ordered by their span

Theorem: There are prime patterns of any finite size and abundance. More formally: for any two natural numbers x and y there must be a prime k-tuple (a, b, ...) corresponding to x consecutive primes (n+a, n+b, ...) for at least y different n.

Approach:

The proof attempted here is split into two parts:

1: find a lower bound for the abundance of the most abundant pattern of a given size up to some natural N. This is the main part of the proof and consists of the following steps:

1.1: using the prime number theorem, derive an estimate of the average span $S_A$ of all instances of all patterns of a given size X up to a given natural N.

1.2: consider a “rectangular” abundance distribution for these patterns, i.e.: i.) only patterns up to a certain largest populated span $S_L$ occur up to N. ii.) All patterns up to $S_L$ occur with almost equal abundance $A_R$. ("Almost equal" is defined precisely in the corresponding section of the proof.) Show that any abundance distribution up to N with the same average span $S_A$ must contain patterns of at least abundance $A_R$. It follows that the abundance distribution of the patterns of size X in the primes up to N must contain patterns of at least abundance $A_R$.

2: show that this lower bound ($A_R$) tends to infinity as N does.

2.1 Using $S_A$, derive an estimate for $S_L$ and thereby for $A_R$.

2.2 Show that $A_R$ tends to infinity as N does.

Proof:

1.1 To estimate the average span $S_A$, restate it as follows and find estimates for its numerator and denominator.

$$S_A = \frac{\sum S_{X,N}}{\sum A_{X,N}}$$

The denominator here is the sum of the abundances of all patterns of size X up to N, where “up to” means that all primes in the instances considered are ≤ N. The numerator is the sum of the spans of all these instances up to N.

Since all but the last X-1 primes up to N each are the smallest prime in precisely one instance of one such pattern and together account for all of them, this sum is π(N)-X+1. For large N, the ratio of this value to π(N) and thus to $\frac{N}{logN}$ approaches 1. So:

$$\sum A_{X,N} \sim \pi(N) \sim \frac {N}{logN}$$

To estimate the numerator of $S_A$, first persuade yourself that every gap up to N except for the first X-2 and last X-2 gaps enters into that sum X-1 times and that within those outer X-2 gaps at either end the nth gap from the end occurs n times. Thus, for large N (that is π(N)>>X), most gaps occur X-1 times in the sum.

The sum of all gaps up to N is precisely $p_{π(N)}-2$ (where $p_{π(N)}$ is the largest prime ≤ N) and the ratio of this to N approaches 1 for large N, so the sum of all prime gaps up to N is roughly N. With the above estimate that most gaps occur X-1 times in the numerator, this suggests that $S_{X,N}$ is roughly N(X-1). For this to be true, the error resulting by the outer gaps occurring less than X-1 times in the numerator must be negligible.

To see that that is indeed the case, recall the prime number theorem: if π(N) is the number of primes up to N, it states that:

$$\lim_{n\to \infty} \frac{\pi(N)}{(\frac{N}{logN})} = 1$$

It has been proven from this that, as the primes approach infinity, the ratio of two successive primes tends to 1, which implies that the ratio of the gap between them to either of them tends to 0. It follows that the ratio of the sum of the last X-2 gaps to the sum of all gaps up to N tends to 0, so that the outer gaps' under-representation in the numerator is negligible for large N. The same is true for the first X-2 gaps as their spans are fixed for a given X, being the gaps between the first X-1 primes, while N tends to infinity. So:

$$\sum S_{X,N} \sim (X-1)N$$

The average span for large N can therefore be estimated as:

$$S_A \sim \frac{(X-1)N}{(\frac {N}{logN})} = (X-1)logN$$

More formally:

$$ \lim_{N\to \infty} \frac {S_A}{(X-1)logN} = 1$$

1.2 Here, for the sake of argument, consider the abundance distribution to be a function with changeable properties, while knowing that it is in fact fixed when applied to all instances of all patterns of size X up to N.

At first, it pays to tidy up the definition of the “rectangular” abundance distribution introduced in the approach. The “almost” in condition ii.) stems from the fact that it is not generally possible for all patterns to have equal abundance, since the number of patterns up to $S_L$ can not be expected to divide the sum of all their instances. It is, however, possible for the abundances of all patterns to be within 1 of each other, meaning that some have an abundance $A_R^*$ and others may have abundance $A_R^*+1$, where $A_R^*+1$ > $A_R$ and $A_R^*$$A_R$.

To transform this rectangular distribution into any other distribution with the same N, X and average span $S_A$, it is sufficient to move abundance from the patterns that have more than their target number of instances to those that have less, while ensuring that the overall process doesn't increase $S_A$. “Moving abundance” from one pattern to another here simply means replacing that many instances of one pattern by that many of the other.

The process can be thought of as consisting of at most two steps:

1.) Moving abundance to spans beyond the rectangle (>$S_L$) from those in it (≤$S_L$). This may or may not take place, depending on the target distribution. If it does, it increases the total sum of pattern spans and therefore their average, $S_A$, as it consists entirely of replacing patterns by others with larger spans.

2.) Rearranging abundance within the spans already populated by the rectangle. This certainly takes place for any non-trivial rearrangment and, to keep $S_A$ constant, must compensate for any increase from step 1.

The key to part 1 of the proof is to show that this can not be done while lowering the abundance of the most common pattern to less than $A_R^*$.

To see this, consider the following limiting case:

The desired reduction of abundance for all patterns in the rectangle to some value below $A_R^*$ has been performed in such a way as to achieve the smallest possible increase of $S_A$. To guarantee this, all the excess abundance has been moved to patterns of the smallest unpopulated span $S_S$. On the other hand, the rearrangement of abundance within the spans populated by the rectangle has been performed so as to achieve the greatest possible decrease in the sum of all spans. To guarantee this, abundance is taken only from the largest populated span $S_L$ (and, if it's population reaches zero, from the second-largest span, etc.) and shifted to all other spans until these reach an abundance of $A_R^*-1$. Any further increase of any of these other patterns would raise one pattern's abundance back to $A_R$, defeating the purpose of lowering the abundance of the most common pattern to below that value.

Now one can pair off each transfer of one instance from a pattern of at most span $S_L$ to pattern of lower span with the transfer of one instance from that lower span to $S_S$. And each such pairing has a net positive effect on the sum, as it amounts to replacing a pattern of at most span $S_L$ by one of span $S_S$, which exceeds $S_L$, being $S_L+2$.

At the same time, all spans in the rectangle had an initial abundance of $A_R^*$ or $A_R^*+1$ but have been repopulated only to $A_R^*-1$ (as explained above). Therefore, the increase of $S_A$ through all the tranfers of abundance to $S_S$ that made the difference between $A_R^*(+1)$ and $A_R-1$ is entirely uncompensated for.

So the full process can be composed entirely of steps that increase the sum of all spans and therefore their average, $S_A$. So it must do so itself.

(If the capacity of $S_S$ and/or $S_L$ is insufficient for the limiting case shown above, that strengthens the argument: the increase is then larger (if larger spans than $S_S$ are populated) and/or the decrease smaller (as smaller spans than $S_L$ are depopulated).)

2.1 To derive an estimate for $A_R$, formulate it as a quotient. The numerator is the sum of the abundances of all populated patterns of size X up to N. The denominator is the number of these patterns:

$$A_R = \frac{\sum A_{X,N}}{n_{X,N}}$$

The sum of the abundances of all patterns has already been estimated as the denominator of $S_A$, and approaches π(N) and $\frac {N}{logN}$ for large N (see equation 2).

The number of populated patterns in the rectangular distribution, $n_{X,N}$, is a little tricky to estimate but constrained by the average span $S_A$. Some of following estimates are marginally inaccurate, because some patterns usually have abundance $A_R^*$ while others have $A_R^*+1$. However, for large N, the abundance of all patterns in the rectangular distribution tends to infinity, so $A_R^* \sim A_R^*+1$ and this error becomes negligible. Also, there are infinitely many N for which the number of patterns of size X divides the number of their instances, so that $A_R^*=A_R$. For these, the following statements are entirely accurate, which is sufficient to prove the theorem.

Firstly, there are at most $\frac{S}{2}$ possible spans up to any span S. The spans are evenly spaced, being the even numbers, except for those including the gap 1 between 2 and 3. To sidestep that, take this argument to be only about primes and prime patterns above 2.

Therefore, were there equally many patterns of each span (as is the case for pattern size X=2 with just 1 each), a rectangular distribution with $S_L>2S_A$ would have an average span above $S_A$. Since this is precluded, there can only be spans up to $2S_A$ for X=2 in the rectangular distribution. The average span then lies at half the maximum span.

For pattern sizes X>2, the number of possible (including inadmissible) patterns of a size X and span S is the combination number C($\frac{S}{2}$-1,X-2), because X-2 primes occupy some of the ($\frac{S}{2}$)-1 possible positions between the largest and smallest prime in the instances of any such pattern. For a given X, this number grows as S does, so larger spans have more patterns. So in terms of pattern spans, rectangular distributions for X>2 are “top-heavy”, i.e. larger spans are more populated, meaning that $S_L≤2S_A$. So $2S_A$ can be used as an estimate (X=2) or upper bound (X>2) for $S_L$.

The number of possible (including inadmissible) patterns of size X with spans up to some value 2S is C(S,X-1), because X-1 primes occupy some of the S possible positions above the smallest prime in the instances of any such pattern. C(S,X-1) is certainly less than $S^X$. So the number of possible patterns of size X with spans up to $2S_A$ (and therefore up to $S_L$) is less than $(S_A)^X$. So:

$$n_{X,N} < (S_A)^X$$

It follows that:

$$A_R > \frac{(\frac{N}{logN})}{(S_A)^X}$$

which for large N (using equation 5) tends to:

$$ \frac{N}{(X-1)^X(logN)^{X+1}}$$

2.2 To show that this tends to infinity as N does is the easiest part. Given that $(X-1)^X$ is a constant for a given size X, we need only consider whether $\frac {N}{(logN)^X}$ tends to infinity for a given X. This is the case. Therefore $A_R$ (and with it $A_R^*$, which is within 1 of $A_R$) tends to infinity with N.

0
On

Update: the short answer to my question seems to be yes. Thank you, Ville Salo, for pointing me (in a comment and answer in Math Overflow) to theorem 1.2 of James Maynard's work, which seems to imply my result.