What is the density of consecutive primes whose gaps are too small or too large compared with the average?

74 Views Asked by Bumbble Comm At 26 Mar 2026 - 4:56

Let $p_i$ denote the $i$-th prime number and $g_i$ the $i$-th gap $g_i=p_{i+1}-p_{i}$.

By the PNT we know that the average gap near $p_i$ is about $\ln p_i$, i.e. $g_i \approx \ln p_i$ on average.

A gap $g_i$ is considered "too small" iff $\frac{g_i}{\ln p_i} \leq \frac{1}{2}$ and "too large" iff $\frac{g_i}{\ln p_i} \geq 2$.

Now I want to know the proportion $\#$ of gaps that are "too small or too large", defined as follows:

Let $T(i) = 1$ iff $\frac{g_i}{\ln p_i} \leq \frac{1}{2}$ or $\frac{g_i}{\ln p_i} \geq 2$, and $T(i) = 0$ otherwise.

Define $\#(n) = \frac{1}{n}\sum \limits_{i=1}^{n} T(i)$

For instance:

$\#(10)=0.1$, $\#(100) = 0.27$, $\#(1000) = 0.354$, $\#(10000) = 0.3474$.
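
The definition of $\#(n)$ can be evaluated directly. Below is a minimal Python sketch, not the asker's own code: the function names `first_primes` and `outlier_fraction` are hypothetical, and the sieve with a doubling limit is just one convenient way to get the first $n+1$ primes.

```python
import math


def first_primes(count):
    """Return the first `count` primes, using a sieve whose limit doubles until it suffices."""
    limit = 32
    while True:
        sieve = bytearray([1]) * (limit + 1)
        sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
        for p in range(2, math.isqrt(limit) + 1):
            if sieve[p]:
                # Cross out the multiples of p starting at p*p.
                sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
        primes = [i for i, flag in enumerate(sieve) if flag]
        if len(primes) >= count:
            return primes[:count]
        limit *= 2


def outlier_fraction(n):
    """Compute #(n): the fraction of the first n gaps with g_i/ln(p_i) <= 1/2 or >= 2."""
    primes = first_primes(n + 1)  # n gaps need n + 1 primes
    hits = 0
    for i in range(n):
        ratio = (primes[i + 1] - primes[i]) / math.log(primes[i])
        if ratio <= 0.5 or ratio >= 2.0:
            hits += 1
    return hits / n


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, outlier_fraction(n))
```

For $n = 10$ only the gap $11 - 7 = 4$ qualifies (since $4/\ln 7 \approx 2.06$), which gives $0.1$ as quoted above.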

Now it's trivial to bound $0 \leq \#(n) \leq 1$, but are there better lower and upper bounds?

Edit: one may assume RH or any other conjecture in number theory that helps tighten the bounds, even if the conjecture is unproven.

Thanks in advance.


Bumbble Comm On 22 Sep 2017 - 12:01 BEST ANSWER

For clarity, I am putting my (corrected) comments here:

I doubt there are many useful proven results.
I would look at the random model (*) underlying Cramér's conjecture: $X_n=1_{n \text{ is prime}}$ is a sequence of independent random variables with $P[X_n=1]=\frac{1}{\log n}$, so that $G(k)$, the $k$-th gap, approximately follows a geometric distribution of parameter $1-\frac{1}{\log k}$. (*) Of course, this is just a model, not the truth.

We obtain that the mean and standard deviation of $G(k)$ are both about $\log k$, thus about $\text{erf}(1/\sqrt{2}) \approx 0.68$ of gaps are within $[\frac{\log k}{2},\frac32 \log k]$ (under this model).
The PNT proves almost nothing about individual gaps (they could all fall in $[0,200]$ or $[o(\frac{k}{\log^m k}),2\, o(\frac{k}{\log^m k})]$), and RH only improves $o(\frac{k}{\log^m k})$ to $o(k^{1/2+\epsilon})$.
About the series we are interested in, $\sum_{k=1}^\infty (-1)^k \frac{k}{p_k}=\sum_k \frac{k g_{k}-p_k}{p_kp_{k+1}}$, I'd say its convergence depends on how fast the observed values of $\frac{g_k-\log k}{ \log k}$ converge in distribution to a centered Gaussian of variance $\sigma$ (the model predicts $\sigma=1$).
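
As a rough cross-check of the random model, and not as part of the argument above, one can replace the geometric gap law by its continuous limit, an exponential distribution with mean $\log k$. That replacement is an assumption of this sketch. The normalized gap $G(k)/\log k$ then has mass $e^{-a}-e^{-b}$ on an interval $[a,b]$, so the question's window $[\frac12, 2]$ can be evaluated in closed form. The helper name `exp_mass` is hypothetical.

```python
import math


def exp_mass(a, b):
    """P(a <= X <= b) for X exponentially distributed with mean 1 (assumed gap law)."""
    return math.exp(-a) - math.exp(-b)


# Mass of normalized gaps inside the question's window [1/2, 2].
inside = exp_mass(0.5, 2.0)
# Complement: gaps that are "too small or too large" under this assumption.
outside = 1.0 - inside

if __name__ == "__main__":
    print(f"inside  [1/2, 2]: {inside:.4f}")
    print(f"outside [1/2, 2]: {outside:.4f}")
```

This is only a heuristic figure under the exponential approximation. It is a different approximation from the Gaussian-style estimate above, and the finite-$n$ values quoted in the question need not be close to it.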