Successive intervals bounded by numbers of the form $(6k+1)^2$ contain (approximately) equal numbers of twin primes

56 Views Asked by At

It has frequently been conjectured that intervals bounded by squares of one type or another will always contain one or more twin primes; see for example (1), (2), (3), (4), and (5). I present a heuristic demonstrating that intervals bounded by squares of the form $(6k+1)^2$ should generally contain similar numbers of twin primes in successive intervals. Inductively, it is tempting to conjecture that all such intervals contain twin primes. A conjecture that the described intervals do contain twin primes follows from the fact that the heuristic characterizes the numbers of twin primes in those intervals.

The explication of the heuristic is lengthy. I regard the claim in the title as unusual, so I feel obligated to show adequate work to support that claim as best I can. With regard to the theorems presented, the useful facts are contained in the corollaries. Note especially that the heuristic explicitly identifies intervals containing numbers $m$ such that $6m-1,6m+1$ are twin primes; how this relates to intervals that contain twin primes per se is addressed at the end. I compare the results of the heuristic with actual counts of $m$ in the relevant intervals, for roughly the first $187$ such intervals.

I have a broad question and a narrower question: Broadly, is the overall schema laid out below sound? More narrowly, and more importantly, what mathematical concepts might be brought to bear to devise a proof or a stronger argument that the key assumption that underpins the heuristic is valid?

Preliminaries: $S$ is the set of natural numbers $s>1$ that have no factors of $2$ or $3$. Note in particular $1 \not \in S$. $s \in S \Rightarrow s \equiv \pm 1 \bmod 6$. When the members of $S$ are ordered by size, $s_n=6 \left \lceil \frac{n}{2} \right \rceil + (-1)^n$. $S$ is a semigroup closed under multiplication: $s_i,s_j \in S \land s_i\cdot s_j=k \Rightarrow k \in S$ because two numbers, each having no factors of $2$ or $3$, have a product with no factor of $2$ or $3$, rendering the product in $S$. Conversely, $s_i \in S \land s_i = j\cdot k \land j,k < s_i \Rightarrow j,k \in S$ because a number which has no factors of $2$ or $3$ cannot have proper divisors which themselves have factors of $2$ or $3$. Hereafter, the variable name $s$ connotes $s \in S$, and the variable name $m$ always refers to a number giving rise to twin primes $6m \pm 1$.

$p_n$ is the $n$th prime number. $p_{n \ge 3}\in S$.

$\lfloor x \rceil$ denotes the integer closest to $x$.

Consider composite numbers of the form $n=36m^2-1$. Several facts about $n$ are apparent: $\begin {align} {n \text{ has at least two proper divisors in } S:\ 6m-1,6m+1 \Rightarrow n \in S \land n \not \in \mathbb P \\ s_a \le 6m-1 \mid n \Rightarrow \exists s_b = \frac{n}{s_a} \ge 6m+1 \\ s_i^2 \equiv 1 \bmod 6 \land n \equiv -1 \bmod 6 \Rightarrow n \ne s_i^2 \\ n \text{ is a semiprime } \iff (6m-1,6m+1) \in \mathbb P \iff 6m-1,6m+1 \text { are twin primes} \\ n \text{ is not a semiprime } \Rightarrow n \text{ has at least one factor } s_i<6m-1 } \end {align}$

Theorem 1: If $s_i=6a \pm 1 \le 6m-1 \land s_i \mid 36m^2-1$, then $m \equiv \pm a \bmod s_i$.

Case 1: $s_i=6a-1 \Rightarrow s_j=\frac{36m^2-1}{s_i}=6(a+k)+1$. $$36m^2-1=36a^2+36ak+6a-6a-6k-1 \\ 36m^2=36a^2+36ak-6k$$ Since $36$ divides the LHS, $36$ must divide the RHS, so $k=6n,\ n \ge 0$. $\ n=0$ is formally allowed because $6a+1 > 6a-1$, i.e. $s_i < s_j$ as assumed. Substituting $$m^2=a^2+6an-n \\ m^2-a^2=(m-a)(m+a)=n(6a-1)=n\cdot s_i \\ m \equiv \pm a \bmod s_i$$

Case 2: $s_i=6a+1 \Rightarrow s_j=\frac{36m^2-1}{s_i}=6(a+k)-1$. $$36m^2-1=36a^2+36ak-6a+6a+6k-1 \\ 36m^2=36a^2+36ak+6k$$ Since $36$ divides the LHS, $36$ must divide the RHS, so $k=6n,\ n > 0$. $n=0$ is not formally allowed, because in that case $6a-1 < 6a+1$, i.e $s_j < s_i$, contrary to assumption. Substituting $$m^2=a^2+6an+n \\ m^2-a^2=(m-a)(m+a)=n(6a+1)=n\cdot s_i \\ m \equiv \pm a \bmod s_i$$

Corollary 1.1: $6m-1,6m+1 \in \mathbb P \iff \forall a<m,\ m\not \equiv \pm a \bmod s_i$. Note that this corollary excludes the case $a=m$, for which $m\equiv a$ with respect to any modulus. If $m=a$ (whence $n=0$ and $b=a$) is the only instance where $m \equiv \pm a \bmod s_i$, then $6m-1,6m+1$ are twin primes.

Corollary 1.1 forms the basis for a sieve of positive integers $m$ such that $6m-1,6m+1$ are twin primes. Let $a_i=\left \lfloor \frac{s_i}{6} \right \rceil$. We remove from the natural numbers those having the form $n \cdot s_i \pm a_i$, in sequential steps beginning with $i=1,2,\dots \ $. Numbers which remain are $m$ such that $6m-1,6m+1$ are twin primes. Sieving with $s_i$ in isolation (i.e. not in the context of other sieving steps) leaves a fraction $\frac{s_i-2}{s_i}$ of all numbers unremoved.

Theorem 2: When serially sieving with $s_1,s_2,\dots s_i$, the smallest number first removed by sieving with $s_i$ (i.e. not removed in any previous sieving steps) is $\ge \frac{s_i^2}{6}$.

Sieving with $s_i$ removes numbers of the form $n \cdot (6a_i \pm 1) \pm a_i$. $$n \cdot (6a_i \pm 1) \pm a_i=6a_in \pm n \pm a_i=a_i(6n \pm 1) \pm n$$ This rearrangement makes plain that for values of $6n \pm 1 < 6a_i \pm 1$, earlier rounds will already have sieved out such numbers. A careful analysis requires appropriately keeping track of signs.

Case 1: $s_i=6a_i-1$; $n \cdot (6a_i-1) \pm a_i=6a_in-n \pm a_i=a_i(6n \pm 1)-n$. New numbers $k$ will be identified for removal only when $6n \pm 1 \ge s_i=6a_i-1 \Rightarrow 6n-1 \ge 6a_i-1$, which further entails $n \ge a_i$. Substituting $6a_i-1$ for $6n \pm 1$ and $a_i$ for $n$, $k \ge a_i(6n \pm 1)-n \ge a_i(6a_i-1)-a_i=6a_i^2-2a_i=\frac{s_i^2 -1}{6}$.

Case 2: $s_i=6a_i+1$; $n \cdot (6a_i+1) \pm a_i=6a_in+n \pm a_i=a_i(6n \pm 1)+n$. New numbers $k$ will be identified for removal only when $6n \pm 1 \ge s_i=6a_i+1 \Rightarrow 6n+1 \ge 6a_i+1$, which further entails $n \ge a_i$. Substituting similarly as in Case 1, $k \ge a_i(6n \pm 1)-n \ge a_i(6a_i+1)+a_i=6a_i^2+2a_i=\frac{s_i^2 -1}{6}$.

In both cases, the $i$th sequential step removes numbers not removed in previous steps only if those numbers are $\ge \frac{s_i^2 -1}{6}$. This is not to say that there are no numbers less than $\frac{s_{i+1}^2 -1}{6}$ that are $\equiv \pm a_{i+1} \bmod s_{i+1}$; however, such smaller numbers will necessarily have been identified and sieved out in previous steps.

Corollary 2.1: The $i$th sieving step effectively screens all numbers less than $\frac{s_{i+1}^2 -1}{6}$; in other words, no numbers $<\frac{s_{i+1}^2 -1}{6}$ that remain after the first $i$ sieving steps can be removed in subsequent steps.

Theorem 3: Let $s_i=6a \pm 1;\ s_j=6c \pm 1;\ s_k=6d \pm 1$. If $s_i=s_j \cdot s_k$, then $n\equiv \pm a \bmod s_i \Rightarrow n \equiv \pm c \bmod s_j$.

Case 1: $6a+1=(6c+1)(6d+1) \Rightarrow a=6cd+c+d$ $$n(6a+1) \pm a=n(6c+1)(6d+1) \pm (6cd+c+d)\\ =n(6c+1)(6d+1) \pm (d(6c+1)+c)\\ =(6c+1)(n(6d+1)\pm d)\pm c\\ =n' (6c+1)\pm c$$

Case 2: $6a+1=(6c-1)(6d-1) \Rightarrow a=6cd-c-d$ $$n(6a+1) \pm a=n(6c-1)(6d-1) \pm (6cd-c-d)\\ =n(6c-1)(6d-1) \pm (d(6c-1)-c)\\ =(6c-1)(n(6d-1)\pm d)\pm c\\ =n' (6c-1)\pm c$$

Case 3: $6a-1=(6c-1)(6d+1) \Rightarrow a=6cd+c-d$ $$n(6a-1) \pm a=n(6c-1)(6d+1) \pm (6cd+c-d)\\ =n(6c-1)(6d+1) \pm (d(6c-1)+c)\\ =(6c-1)(n(6d+1)\pm d)\pm c\\ =n' (6c-1)\pm c$$

Case 4: $6a-1=(6c+1)(6d-1) \Rightarrow a=6cd-c+d$ $$n(6a-1) \pm a=n(6c+1)(6d-1) \pm (6cd-c+d)\\ =n(6c+1)(6d-1) \pm (d(6c+1)-c)\\ =(6c+1)(n(6d-1)\pm d)\pm c\\ =n' (6c+1)\pm c$$

In every case, a number $\equiv \pm a \bmod s_i$ is also $\equiv \pm c \bmod s_j$ when $s_j \mid s_i$.

Corollary 3.1: Sieving with composite $s_i$ removes no numbers not already removed by factors of $s_i$.

Corollary 3.2: Sieving with every $p_k \le s_{\text{final}} \in S$ is equivalent to sieving with every $s_i \le s_{\text{final}} \in S$; that is, sieving with any composite $s_i$ is superfluous.

The heuristic: The sieve based on Corollary 1.1 can be performed by sieving with both $s_{2i-1}=6i-1$ and $s_{2i}=6i+1$ as a single unit. In step $n-1$, we will have fully screened every number up to $\frac{(s_{2n-1})^2-1}{6}=6n^2-2n$. After step $n$, we will have fully screened every number up to $\frac{(s_{2(n+1)-1})^2-1}{6}=6(n+1)^2-2(n+1)$. Accordingly, performing the $n$th step screens $6(n+1)^2-2(n+1)-(6n^2-2n)=12n+4$ numbers that may not have been fully screened in previous steps, and from which no numbers will be removed for the first time in subsequent steps. A utility of choosing the interval endpoints in this way is that it affords an algebraically determinate interval size related to the sieving step.

Here is the key assumption: The fraction of numbers remaining after $n-1$ sieving steps is approximately the same in the $n$th interval as it is in the $(n-1)$th interval. That is, if there are $M_{n-1}$ numbers remaining in interval $n-1$ after $n-1$ sieving steps, all of which are suitable $m$, the expected number of candidate values for $m$ in interval $n$ can be approximated as $M_{n-1}\frac{12n+4}{12(n-1)+4}=M_{n-1}\frac{12n+4}{12n-8}$. To be confirmed as actual values of $m$, the candidate values in interval $n$ must be subjected to the $n$th sieving step, which involves screening with $s_{2n-1}$ and $s_{2n}$.

Pursuant to Corollary 3.2, in order to appropriately perform that screening, we must consider whether $s_{2n-1},s_{2n} \in \mathbb P$. Let: $\begin {align} { M_i = \text{ the actual count of $m$ in interval i} \\ N_j = \text{ the expected count of $m$ in interval j} \\ \alpha_n =\text { the multiplier such that }\alpha_n M_{n-1}=N_n \\ r_i= \begin{cases} \frac{s_i-2}{s_i} & \text{ if } s_i \in \mathbb P\\ 1 & \text{ if } s_i \not \in \mathbb P \end{cases} } \end {align}$

Then, $\frac{12n+4}{12n-8} \ge \alpha_n = r_{(2n-1)}r_{(2n)}\frac{12n+4}{12n-8} \ge \frac{12n+4}{12n-8}\frac{6n-3}{6n-1}\frac{6n-1}{6n+1}=\frac{18n^2-3n-3}{18n^2-9n-2}$. Since $\alpha_n$ is strictly between two expressions that tend toward $1$ as $n$ grows without bound, $\alpha_n$ also tends to $1$, but is always slightly $>1$. Accordingly, it is expected that the count of numbers $m$ in the described intervals should grow slowly, but that the ratio of the counts in successive intervals should approach $1$.

Comparison of heuristic results with actual data:

I analyzed data on the first $10000$ values of $m$ listed in this table. The data fell into $187$ intervals. The growth of the count of $m$ is shown in the first chart:

Chart 1

The number of examples per interval grows roughly $20$ to $30$ fold over $187$ intervals, depending on what is assumed to be a representative starting interval. This might not seem like slow growth, but the constraining functions that bound $\alpha$ make reasonable the characterization of that growth as being comparatively slow. $$\prod_{i=1}^{187} \frac{18i^2-3i-3}{18i^2-9i-2} \approx 5.2 \\ \prod_{i=1}^{187} \frac{12i+4}{12i-8} \approx 560$$ In this light, the growth would be expected to be between $5$ times (which represents growth when sieving with every $s_i$) and $560$ times (which represents simply the growth in the interval size, with no sieving at all). In that range of growth, $30$ or less seems modest.

Another measure of the performance of the heuristic is the ratio of examples in successive intervals, shown in the second chart:

Chart 2

Here, although the fluctuation around $1$ is apparent, the extremes of fluctuation become smaller and less frequent as the interval number grows. It seems that there might be some (slow) tendency to converge on $1$; however, much more data would have to be analyzed to confirm that impression. As far as the available data permits a conclusion, the heuristic behaves as if it is valid.

Relationship of the count of $m$ to the count of twin primes: The boundaries of the $n$th interval in which we find $m$ are defined as $\frac{s_{2(n-1)}^2-1}{6}$ and $\frac{s_{2n}^2-1}{6}$. Each $m$ in that interval gives rise to twin primes roughly $6$ times larger, so the boundaries demarking the $n$th interval containing twin primes would be simply $s_{2(n-1)}^2-1$ and $s_{2n}^2-1$. We can be certain that such boundaries themselves do not fall within any twin primes, as the members of $S$ which surround those boundaries are of the form $s_{2i}^2-2,s_{2i}^2$, one of whose members is square and hence composite. Accordingly, it makes no mathematical difference whether the boundary values bracketing an interval is described as $s_{2k}^2-1$ or $s_{2k}^2$. I adopt the latter, which is in fact $(6k+1)^2$ as stated in the title, for simplicity.

1

There are 1 best solutions below

1
On

The number of integers between $(6k+1)^2$ and $(6(k+1)+1)^2=(6k+7)^2$ is asymptotically $72k$. The number of twin primes between $(6k+1)^2$ and $(6k+7)^2$ is conjectured to be asymptotic to $Ck/(\log(6k)^2)^2$, for some positive constant $C$. This is not constant. It goes to infinity with $k$.