I'm reading Analytic Number theory from Tom. M. Apostol's Introduction to Analytic Number Theory. In the fourth Chapter of the book he proves the equivalence of Prime number theorem with the asymptotic average of the Mobius function being zero. i.e $$\lim_\limits{x \to \infty} \frac1x \sum_\limits{n\leq x} \mu(n)=0 \iff \lim_\limits{x \to \infty} \frac{\pi(x)\log(x)}{x}=1 $$ I understand the formal working of the proof, but this beautiful statement seems to be unmotivated and out of the blue in the presentation of the text.
It would be great if someone can explain me what led mathematicians to believe that the above statement is equivalent to the Prime number theorem.
There is an interesting history regarding the development of this equivalence. Tom M. Apostol addresses in the beginning of ch. 4 so-called elementary proofs of the prime number theorem which use methods from real analysis and number theory only in contrast to analytic proofs which are mainly based upon methods from complex analysis.
Up to the twenties of last century mathematicians were not sure, if an elementary proof can be found. The following quote is from
It was 1949 when A. Selberg and P. Erdős discovered an elementary proof. As indicated above elementary is far from being simple. It just addresses the type of used techniques.
Some words how the Möbius function comes into play. Recall an arithmetic function $a: \mathbb{N}\to\mathbb{C}$ is called completely multiplicative if
\begin{align*} a(n)a(m)=a(nm)\qquad\qquad\text{ for all }m,n\in\mathbb{N} \end{align*} Let $\mathbb{P}$ denote the set of primes. The following theorem holds:
We consider the reciprocal of (1): \begin{align*} \frac{1}{\sum_{n=1}^\infty a(n)}=\prod_{p\in\mathbb{P}}(1-a(p))\tag{2} \end{align*} and are looking for a series representation of (2). Denoting with $\mathbb{P}[N]$ the set of primes less or equal a real number $N$ we look at first at the finite product and try to derive a series representation for \begin{align*} \prod_{p\in\mathbb{P}[N]}(1-a(p))\tag{3} \end{align*} We see from (3) that $a(1)=1$ is a term of the series. All other non-zero terms of the series come from products $(-1)^ka(p_1)a(p_2)\cdots a(p_k)$ with $k$ pairwise different primes $p_j, 1\leq j\leq k$. Since $a$ is completely multplicative, the non-zero terms besides $a(1)$ have a representation \begin{align*} (-1)^ka(p_1)a(p_2)\cdots a(p_k)=(-1)^ka(p_1p_2\cdots p_k) \end{align*}
Note:
The answer given here is mainly taken from the thesis The prime number theorem: Analytic and elementary proofs by Ciarán O’Rourke which is worth reading.
The history of the elementary proof by A. Selberg and P. Erdős is presented in the elementary proof of the prime number theorem: An historical perspective by D. Goldfeld from which G. H. Hardy's quote was taken.