I'm reading Prime Obsession by John Derbyshire, and quite central to the thesis of the book is the PNT. I understand the concept of the logarithmic distribution of primes (i.e. the probability that a number in the neighborhood of $x$ is prime is approximately $\frac{1}{\log x}$). If you think of this as a "prime density function," then we can say that $\pi(x) \sim \frac{x}{\log x}$.
Now comes $\text{Li}(x) = \int_0^x \frac{1}{\log t} dt$. Derbyshire says on page 115 that "[$\text{Li}(x)$] is a much better estimate" of $\pi(x)$ than $\frac{x}{\log x}$.
On a conceptual level, why is this true? I have a feeling that it's kind of like how work is better "estimated" by $\int_0^d F(t) dt$ than by $F \cdot d$, but I can't put my finger on exactly what's going on (what would the analog of $F$ be in this context? How does this explain the fact that $\frac{x}{\log x}$ is always an underestimate? etc.).
I have an understanding of calculus that goes up to introductory multivariable calculus, but both a layman-ish explanation and a rigorous/mathematical explanation would be appreciated.
The key idea is the abscissa of convergence of Dirichlet series and Mellin transforms.
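For $\Re(s) > 1$, $\zeta(s) = s\int_1^\infty \lfloor x \rfloor x^{-s-1}dx =\frac{s}{s-1}+s\int_1^\infty (\lfloor x \rfloor-x) x^{-s-1}dx$, and the last integral converges for $\Re(s) > 0$. Thus $\zeta(s)(s-1)$ is analytic and equal to $1$ at $s=1$, so $\log (\zeta(s)(s-1))$ is analytic at $s=1$.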
Let $\Pi(x) = \sum_{p^k \le x} \frac{1}{k}= \pi(x)+\mathcal{O}(x^{1/2})$. Then the Euler product and the Abel summation formula give, for $\Re(s) > 1$, $\log \zeta(s) = s \int_2^\infty \Pi(x) x^{-s-1}dx$.
Integrating by parts shows $s\int_2^\infty \text{Li}(x)x^{-s-1}dx+\log(s-1)$ is analytic at $s=1$, where $\text{Li}(x) = \int_2^x \frac{dt}{\log t}$.
$s\int_2^\infty \frac{x}{\log x}x^{-s-1}dx+\log(s-1)$ is bounded but not analytic at $s=1$.
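Here is a sketch of where the last two statements come from. The name $F$ is introduced only for this illustration, and I assume $\text{Li}(x) = \int_2^x \frac{dt}{\log t}$, so that $\text{Li}(2) = 0$. For $\Re(s) > 1$, integrating by parts gives

$$s\int_2^\infty \text{Li}(x)x^{-s-1}dx = \Big[-\text{Li}(x)x^{-s}\Big]_2^\infty + \int_2^\infty \frac{x^{-s}}{\log x}dx = F(s), \qquad F(s) := \int_2^\infty \frac{x^{-s}}{\log x}dx.$$

Differentiating under the integral sign,

$$F'(s) = -\int_2^\infty x^{-s}dx = -\frac{2^{1-s}}{s-1}, \qquad \frac{d}{ds}\Big(F(s)+\log(s-1)\Big) = \frac{1-2^{1-s}}{s-1}.$$

The right-hand side has a removable singularity at $s=1$, so $F(s)+\log(s-1)$ extends analytically to $s=1$. For $\frac{x}{\log x}$ the same computation gives

$$s\int_2^\infty \frac{x}{\log x}x^{-s-1}dx+\log(s-1) = sF(s)+\log(s-1) = s\Big(F(s)+\log(s-1)\Big)-(s-1)\log(s-1).$$

The term $(s-1)\log(s-1)$ tends to $0$ as $s \to 1$ but has a branch point there, which is why this expression is bounded but not analytic at $s=1$.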
Subtracting, $\int_2^\infty (\Pi(x)-\text{Li}(x))x^{-s-1}dx$ is analytic at $s=1$. Together with the zero-free region $\sigma > 1-\frac{A}{1+|\log t|}$, the estimate $\log (\zeta(s)(s-1)) = \mathcal{O}(\log^k |t|)$, and Mellin inversion, it gives the prime number theorem: for every $m$, $\Pi(x)-\text{Li}(x) = o(\frac{x}{\log^m x})$.
Under the Riemann hypothesis $\Pi(x)-\text{Li}(x) = \mathcal{O}(x^{1/2}\log x)$.
With $\frac{x}{\log x}$ you'll only get $\Pi(x)-\frac{x}{\log x} = \mathcal{O}(\frac{x}{\log^2 x})$, and this cannot be improved because $\text{Li}(x)-\frac{x}{\log x} \sim \frac{x}{\log^2 x}$.
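For a layman-ish check of these error terms, here is a minimal numerical sketch in plain Python. The helper names `prime_count` and `li_from_2` are my own, not from any library. It compares against $\pi(x)$ rather than $\Pi(x)$ (they differ by $\mathcal{O}(x^{1/2})$), and it approximates $\text{Li}(x) = \int_2^x \frac{dt}{\log t}$ with Simpson's rule after the substitution $t = e^u$, so the values are approximations, not exact.

```python
import math


def prime_count(n):
    """Return pi(n), the number of primes <= n, via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)


def li_from_2(x, steps=20000):
    """Approximate the integral of dt/log(t) from 2 to x.

    Uses composite Simpson's rule on the integral of e^u / u over
    [log 2, log x], obtained from the substitution t = e^u.
    `steps` must be even.
    """
    a, b = math.log(2.0), math.log(x)
    h = (b - a) / steps

    def f(u):
        return math.exp(u) / u

    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


for x in (10**3, 10**4, 10**5, 10**6):
    pi_x = prime_count(x)
    err_simple = x / math.log(x) - pi_x  # error of x / log x
    err_li = li_from_2(x) - pi_x         # error of Li(x)
    print(f"x={x:>8}  pi(x)={pi_x:>6}  "
          f"x/log x - pi(x) = {err_simple:10.1f}  Li(x) - pi(x) = {err_li:8.1f}")
```

On this range the error of $\frac{x}{\log x}$ grows roughly like $\frac{x}{\log^2 x}$, while the error of $\text{Li}(x)$ stays far smaller, which is the numerical shadow of the two bounds above.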