Formula for Mertens Function $M\left(x\right)$ in terms of the Prime counting function $\pi\left(x\right)$ using inclusion-exclusion

120 Views Asked by At

In this post I was interested in studying the behaviour of Mertens function $M\left(n\right)$, which is defined for all positive integers as

$$M\left(n\right)=\sum_{k=1}^{n}\mu\left(k\right)$$

Less formally, $M\left(x\right)$ is the count of square-free integers up to $x$ that have an even number of prime factors, minus the count of those that have an odd number. Therefore, it made sense to me trying to reformulate $M\left(x\right)$ in terms of the Prime Counting Function $\pi\left(x\right)$ using the inclusion-exclusion principle.

In that post, I realized that I did not have a correct formulation. Now, I think that I have derived correctly such formulation, so I post it here to check if it is correctly derived.

Moving forward, we denote as $\pi_{k}\left(n\right)$ to the number of positive integers equal or less than $n$ which are divisible by the product of $k$ distinct prime factors; for convention, $\pi_{1}\left(n\right)$ is denoted as $\pi\left(n\right)$.

Firstly, they can be counted those integers less than $n$ which are divisible only by one prime factor; that is, prime numbers equal or less than $n$, which can be counted directly with the prime counting function as being $\pi\left(n\right)$.

Secondly, to count the square-free integers less than $n$ that are divisible by two different prime factors, note that at least one of them must be less than $\sqrt{n}$; subsequently, for each $p_{i}<\sqrt{n}$, we can count the prime numbers less than $\lfloor\frac{n}{p_{i}}\rfloor$. We can denote this counting as $\sum_{p_{i}<\sqrt{n}}\pi\left(\frac{n}{p_{i}}\right)$ , and each $\pi\left(\frac{n}{p_{i}}\right)$ as as the “ith counting”. Note that in each $i$ counting we are including the non-square-free integers divisible by the ith prime factor, as well as some square-free integers already counted.

For instance, let $n=50$. As $7<\sqrt{50}$, then $\sum_{p_{i}<\sqrt{n}}\pi\left(\lfloor\frac{n}{p_{i}}\rfloor\right)=\pi\left(\lfloor\frac{50}{2}\rfloor\right)+\pi\left(\lfloor\frac{50}{3}\rfloor\right)+\pi\left(\lfloor\frac{50}{5}\rfloor\right)+\pi\left(\lfloor\frac{50}{7}\rfloor\right)$. With the term $\pi\left(\lfloor\frac{50}{2}\rfloor\right)=9$, we are counting the integers $\left\{ 2*2,2*3,2*5,2*7,2*11,2*13,2*17,2*19,2*23\right\}$ . As we are counting the non-square-free integer $2*2$, we would need to “discount” it. With the term $\pi\left(\lfloor\frac{50}{3}\rfloor\right)=6$, we are counting the integers $\left\{ 3*2,3*3,3*5,3*7,3*11,3*13\right\}$ . As we are counting the non-square-free integer $3*3$, and the square free integer $3*2$, which was already counted, we would need to “discount” both.

Note that for each “ith counting” we need to discount all the already discounted square-free integers, plus the non-square-free integer divisible by the ith factor, plus an additional square free integer which was counted in the “(i-1)th” counting. Therefore, we need to discount in total $\sum_{i=1}^{\pi\left(\sqrt{n}\right)}i$ integers that were counted more than once. Or which is the same, we need to discount $\left(\frac{\pi\left(\sqrt{n}\right)^{2}+\pi\left(\sqrt{n}\right)}{2}\right)$ integers.

Thus, at the end we have that we can count the square-free integers less than $n$ which are divisible by two different prime factors as

$$\left(\sum_{p_{i}<\sqrt{n}}\pi\left(\lfloor\frac{n}{p_{i}}\rfloor\right)\right)-\left(\frac{\pi\left(\sqrt{n}\right)^{2}+\pi\left(\sqrt{n}\right)}{2}\right)$$

Thirdly, to count the square-free integers less than n which are divisible by three different prime factors, note that the product of two of them must be less than $\sqrt{n}$; subsequently, for each product $p_{i}p_{j}<\sqrt{n}$, we can count the prime numbers less than $\frac{n}{p_{i}p_{j}}$. We can denote this counting as $\sum_{p_{i}p_{j}<\sqrt{n}}\pi\left(\lfloor\frac{n}{p_{i}p_{j}}\rfloor\right)$ , and each $\pi\left(\lfloor\frac{n}{p_{i}p_{j}}\rfloor\right)$ as the “ith counting”. Note that in each counting we are including the non-square free integers divisible by the ith prime factor, as well as some square free integers already counted.

For instance, let $n=100$. We have that $\sum_{p_{i}p_{j}<\sqrt{n}}\pi\left(\lfloor\frac{n}{p_{i}p_{j}}\rfloor\right)=\pi\left(\lfloor\frac{100}{2*3}\rfloor\right)+\pi\left(\lfloor\frac{100}{2*5}\rfloor\right)$. With the term $\pi\left(\lfloor\frac{100}{2*3}\rfloor\right)=6$, we are counting the integers $\left\{ 2*3*2,2*3*3,2*3*5,2*3*7,2*13*11,2*3*13\right\}$ . As we are counting the non-square-free integers $2*2*3$ and $2*3*3$, we would need to “discount” the two of them. With the term $\pi\left(\lfloor\frac{100}{2*5}\rfloor\right)=4$, we are counting the integers $\left\{ 2*5*2,2*5*3,2*5*5,2*5*7\right\}$ . As we are counting the non-square free integers $2*2*5$ and $2*5*5$, and the square free integer $2*5*3$, which was already counted, we would need to “discount” the three of them.

Note that for each “ith counting” we need to discount all the already discounted square free integers, plus the non-square-free integer divisible by any of the ith two factors, plus an additional square free integer which was counted in the “(i-1)th” counting. Therefore, we need to discount in total $\sum_{i=1}^{\pi_{2}\left(\sqrt{n}\right)}i+1$ integers that were counted more than once. Or which is the same, we need to discount $\left(\frac{\pi_{2}\left(\sqrt{n}\right)^{2}+3\pi_{2}\left(\sqrt{n}\right)}{2}\right)$ integers.

Thus, at the end we have that we can count the square-free integers less than $n$ which are divisible by three different prime factors as

$$\left(\sum_{p_{i}p_{j}<\sqrt{n}}\pi\left(\lfloor\frac{n}{p_{i}p_{j}}\rfloor\right)\right)-\left(\frac{\pi_{2}\left(\sqrt{n}\right)^{2}+3\pi_{2}\left(\sqrt{n}\right)}{2}\right)$$

In general, it can be seen that, to count the square-free integers less than $n$ which are divisible by $k$ different prime factors, the product of $k-1$ of them must be less than $\sqrt{n}$; subsequently, for each product $\prod_{k-1}p_{i}<\sqrt{n}$, we can count the prime numbers less than $\lfloor\frac{n}{\prod_{k-1}p_{i}}\rfloor$; and we need to discount in total $\sum_{i=1}^{\pi_{k-1}\left(\sqrt{n}\right)}i+k-2$ integers that were counted more than once. Thus, at the end we have that we can count the square-free integers less than $n$ which are divisible by $k$ different prime factors as

$$\left(\sum_{\prod_{k-1}p_{i}<\sqrt{n}}\pi\left(\lfloor\frac{n}{\prod_{k-1}p_{i}}\rfloor\right)\right)-\left(\frac{\pi_{k-1}\left(\sqrt{n}\right)^{2}+\left(2(k-1)-1\right)\pi_{k-1}\left(\sqrt{n}\right)}{2}\right)$$

Therefore, applying the inclusion-exclusion principle, we can reformulate $M\left(x\right)$ in terms of the Prime Counting Function $\pi\left(x\right)$ as:

$$M\left(n\right)=-\pi\left(n\right)+\left(\left(\sum_{p_{i}<\sqrt{n}}\pi\left(\lfloor\frac{n}{p_{i}}\rfloor\right)\right)-\left(\frac{\pi\left(\sqrt{n}\right)^{2}+\pi\left(\sqrt{n}\right)}{2}\right)\right)-\left(\left(\sum_{p_{i}p_{j}<\sqrt{n}}\pi\left(\lfloor\frac{n}{p_{i}p_{j}}\rfloor\right)\right)-\left(\frac{\pi_{2}\left(\sqrt{n}\right)^{2}+3\pi_{2}\left(\sqrt{n}\right)}{2}\right)\right)+...$$

Or, in a more compressed form,

$$M\left(n\right)=-\pi\left(n\right)+\sum_{i=1}^{k-1}\left(-1\right)^{^{i+1}}\left(\left(\sum_{\prod_{i}p_{j}<\sqrt{n}}\pi\left(\lfloor\frac{n}{\prod_{i}p_{j}}\rfloor\right)\right)-\left(\frac{\pi_{i}\left(\sqrt{n}\right)^{2}+\left(2i-1\right)\pi_{i}\left(\sqrt{n}\right)}{2}\right)\right)$$

I think that this formulation is interesting, because the terms $-\pi\left(n\right)+\sum_{i=1}^{k-1}\left(-1\right)^{^{i+1}}\left(\sum_{\prod_{i}p_{j}<\sqrt{n}}\pi\left(\lfloor\frac{n}{\prod_{i}p_{j}}\rfloor\right)\right)$ ressemble a lot Legendre's Formula, and maybe the terms $\sum_{i=1}^{k-1}\left(-1\right)^{^{i+1}}\left(-\frac{\pi_{i}\left(\sqrt{n}\right)^{2}+\left(2i-1\right)\pi_{i}\left(\sqrt{n}\right)}{2}\right)$ could be not so difficult to bound. Maybe enough to show that for any $\epsilon>0$ there is $N_0$ such that any $n>N_0$ verifies $M(n)≤n^{1/2+\epsilon}$... but that exceeds my current knowledge.

Thanks in advance!

EDIT

After reviewing the process leading to the formula proposed in the OP, I realized that there exists another formulation that might be even more tractable for the purpose of bounding $M\left(x\right)$. This formulation is as follows: $$-\pi\left(n\right)+\left(\sum_{p_{i}<\sqrt{n}}\pi\left(\lfloor\frac{n}{p_{i}}\rfloor\right)\right)-\left(\sum_{p_{i}<p_{j}<\sqrt[3]{n}}\pi\left(\lfloor\frac{n}{p_{i}p_{j}}\rfloor\right)\right)+\left(\sum_{p_{i}<p_{j}<p_{k}<\sqrt[4]{n}}\pi\left(\lfloor\frac{n}{p_{i}p_{j}p_{k}}\rfloor\right)\right)-\dots-\left(\frac{\pi\left(\sqrt{n}\right)^{2}+\pi\left(\sqrt{n}\right)}{2}\right)+\sum_{i=3}^{j}\left(\left(-1\right)^{i+1}\left(\frac{\left(\pi\left(\sqrt[i]{n}\right)-\left(i-2\right)\right)\left(\pi\left(\sqrt[i]{n}\right)-\left(i-3\right)\right)\left(2\left(\pi\left(\sqrt[i]{n}\right)-\left(i-2\right)\right)+3\left(i-2\right)+1\right)}{6}\right)\right)$$

Where $j=\max\{i\mid \pi\left(\sqrt[i]{n}\right)>i-2\}$

The advantages of this formulation are that (i) the use of $\pi_k\left(n\right)$ is avoided, (ii) the terms $-\pi\left(n\right)+\left(\sum_{p_{i}<\sqrt{n}}\pi\left(\lfloor\frac{n}{p_{i}}\rfloor\right)\right)-\left(\sum_{p_{i}<p_{j}<\sqrt[3]{n}}\pi\left(\lfloor\frac{n}{p_{i}p_{j}}\rfloor\right)\right)+\left(\sum_{p_{i}<p_{j}<p_{k}<\sqrt[4]{n}}\pi\left(\lfloor\frac{n}{p_{i}p_{j}p_{k}}\rfloor\right)\right)-\dots$ are even more similar to Legendre's Formula, and (iii) the terms $-\left(\frac{\pi\left(\sqrt{n}\right)^{2}+\pi\left(\sqrt{n}\right)}{2}\right)+\sum_{i=3}^{j}\left(\left(-1\right)^{i+1}\left(\frac{\left(\pi\left(\sqrt[i]{n}\right)-\left(i-2\right)\right)\left(\pi\left(\sqrt[i]{n}\right)-\left(i-3\right)\right)\left(2\left(\pi\left(\sqrt[i]{n}\right)-\left(i-2\right)\right)+3\left(i-2\right)+1\right)}{6}\right)\right)$ are more tractable having avoided the use of $\pi_k\left(n\right)$.

EDIT 2

After analyzing the flaws detected by @StevenClark in this MO post, I have found a way to fix them, but the formula changes are substantial. Here it goes:

$$M(n)=1-\pi\left(n\right)+\sum_{p_{i}\leq\frac{n}{p_i}}\left(\pi\left(\lfloor\frac{n}{p_{i}}\rfloor\right)-i\right)-\sum_{p_{i}<p_{j}\leq\frac{n}{p_ip_j}}\left(\pi\left(\lfloor\frac{n}{p_{i}p_{j}}\rfloor\right)-j\right)+\sum_{p_{i}<p_{j}<p_{k}\leq\frac{n}{p_ip_jp_k}}\left(\pi\left(\lfloor\frac{n}{p_{i}p_{j}p_{k}}\rfloor\right)-k\right)-\dots$$

Indeed, the formula looks cleaner and even more beautiful, but does not seem to be tractable in terms of binding $M(n)$.