I am trying to show that $$\pi_2(x)\sim\frac{x\ln(\ln (x))}{\ln (x)}$$ where $\pi_2(x)$ is defined as the number of integers not exceeding $x$ which are the product of two distinct primes.
Qiaochu Yuan posted this answer: https://math.stackexchange.com/a/1619822/987127, but I don't see where he got the first approximation from.
This was my attempt, which is off by a factor of 2. I am hoping someone could help me fix it.
We know,
- (P.N.T) $\pi(x)\sim\dfrac{x}{\log x}$
- $2\pi_2(x)=\left(\displaystyle\sum_{p\leq x}\pi\left(\dfrac{x}{p}\right)\right)-\pi(\sqrt{x})$
- (Mertens) $\displaystyle\sum_{p\leq x} \frac{1}{p}=\ln(\ln( x))+B+\mathcal{O}\left(\frac{1}{\ln(x)}\right)$
Therefore, $$2\pi_2(x)=\left(\displaystyle\sum_{p\leq x}\pi\left(\dfrac{x}{p}\right)\right)-\pi(\sqrt{x})\sim \frac{x}{\ln(x)}\sum_{p\leq x} \frac{1}{p}-\frac{2\sqrt{x}}{\ln(x)}\sim \frac{x\ln(\ln(x))}{\ln(x)}.$$
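As a sanity check that the exact identity for $2\pi_2(x)$ is not where the factor of 2 is lost, here is a minimal Python sketch that compares both sides for small $x$. The helper names (`primes_up_to`, `count_pi2`, `identity_sides`) are made up for this illustration, and a numerical check at small $x$ says nothing about the asymptotics.

```python
from bisect import bisect_right
from math import isqrt


def primes_up_to(n):
    """Return the sorted list of primes <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            # Cross out i*i, i*i + i, ... up to n.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def count_pi2(x, primes):
    """Count pairs of primes p < q with p * q <= x directly."""
    count = 0
    for i, p in enumerate(primes):
        for j in range(i + 1, len(primes)):
            if p * primes[j] > x:
                break  # primes are sorted, so larger q also fail
            count += 1
    return count


def identity_sides(x):
    """Return (2 * pi_2(x), sum_{p <= x} pi(x/p) - pi(sqrt(x)))."""
    primes = primes_up_to(x)

    def pi(y):
        # Prime-counting function for an integer y <= x.
        return bisect_right(primes, y)

    lhs = 2 * count_pi2(x, primes)
    # pi(x/p) equals pi(floor(x/p)) because primes are integers.
    rhs = sum(pi(x // p) for p in primes) - pi(isqrt(x))
    return lhs, rhs


if __name__ == "__main__":
    for x in (10, 100, 1000, 10000):
        print(x, identity_sides(x))
```

Both entries of each printed pair should agree, which suggests the discrepancy comes from the asymptotic step rather than from the counting identity.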
The problem comes from the approximation $$ \pi(n/p) \sim \frac{n/p}{\log(n/p)} \approx \frac{n}{p \log n}. \tag{1}$$ For large $p$ relative to $n$, this approximation is very poor, and the right-hand side is often too small. For example, when $p = \sqrt{n}$, the LHS is $2 \sqrt{n} / \log n$, while the RHS is $\sqrt{n} / \log n$.
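To see how the gap grows with $p$, note that for $p = n^a$ with $0 < a < 1$ the ratio of the two sides of $(1)$ is $\log n / \log(n/p) = 1/(1-a)$. The following small Python sketch evaluates that ratio numerically; the function names are invented for this illustration, and it compares only the two smooth approximations, not $\pi$ itself.

```python
import math


def pnt_form(n, p):
    """(n/p) / log(n/p), the prime number theorem applied to pi(n/p)."""
    return (n / p) / math.log(n / p)


def simplified_form(n, p):
    """n / (p * log n), the simplification used in approximation (1)."""
    return n / (p * math.log(n))


def underestimate_ratio(n, a):
    """Ratio of the two forms at p = n**a; algebraically 1 / (1 - a)."""
    p = n ** a
    return pnt_form(n, p) / simplified_form(n, p)


if __name__ == "__main__":
    n = 10**12
    for a in (0.1, 0.5, 0.9):
        print(a, underestimate_ratio(n, a), 1 / (1 - a))
```

The ratio depends only on $a$, not on $n$, so the underestimate does not go away as $n$ grows; it is a factor of 2 at $p = \sqrt{n}$ and gets worse as $p$ approaches $n$.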
Informally, this is likely one reason why Qiaochu considers the approximation $$ \pi_2(n) \sim \sum_{p \leq \sqrt{n}} \pi(n/p) \tag{2}$$ instead of the arguably more natural double counting done in the OP. This avoids the primes closest to $n$, where the approximation $(1)$ is worst. But I think it is nontrivial to show that using the approximation $(1)$ doesn't affect the leading constant.
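For what it is worth, the sum in $(2)$ can be compared with $\pi_2(n)$ exactly: counting pairs $p < q$ with $pq \le n$ gives $\pi_2(n) = \sum_{p \le \sqrt{n}} \bigl(\pi(n/p) - \pi(p)\bigr)$, so the sum in $(2)$ exceeds $\pi_2(n)$ by $k(k+1)/2$ with $k = \pi(\sqrt{n})$, which is of order $n/\log^2 n$. The Python sketch below checks this bookkeeping for small $n$; the helper names are assumptions made for the illustration, and it does not address the harder step of replacing $\pi(n/p)$ by $(1)$.

```python
from bisect import bisect_right
from math import isqrt


def primes_up_to(n):
    """Return the sorted list of primes <= n (sieve of Eratosthenes)."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def compare_sum_to_pi2(n):
    """Return (sum_{p <= sqrt(n)} pi(n/p), pi_2(n), pi(sqrt(n)))."""
    primes = primes_up_to(n)
    root = isqrt(n)

    def pi(y):
        # Prime-counting function for an integer y <= n.
        return bisect_right(primes, y)

    # The sum appearing in approximation (2).
    small_sum = sum(pi(n // p) for p in primes if p <= root)

    # Direct count of pairs p < q with p * q <= n.
    pi2 = 0
    for i, p in enumerate(primes):
        for j in range(i + 1, len(primes)):
            if p * primes[j] > n:
                break
            pi2 += 1

    return small_sum, pi2, pi(root)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        small_sum, pi2, k = compare_sum_to_pi2(n)
        print(n, small_sum, pi2, small_sum - pi2, k * (k + 1) // 2)
```

The last two printed columns should coincide for every $n$.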