Let $p$ be any prime. Let $S$ be the set of natural numbers in $[1, p^2]$.
Suppose that there are no primes in $(p,p^2)$, which means that all prime factors of every number in $S$ must be $p$ or smaller. Equivalently, this means that $\mathrm{gpf}(n)\leq p$ for all $n \in S$, where $\mathrm{gpf}(x)$ is a function which returns the greatest prime factor of $x$.
This means that every number $n \in S$ with $n \geq 2$ may be uniquely assigned to a subset $S_k$ such that $\mathrm{gpf}(n)=k$, for some prime $k \leq p$. (The number $1$ has no prime factor and is set aside.)
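Since everything below leans on $\mathrm{gpf}$, here is a minimal trial-division sketch of it (my own illustration; the function name just mirrors the notation above):

```python
def gpf(n):
    """Greatest prime factor of n (n >= 2), by trial division."""
    factor, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            factor, n = d, n // d   # strip out the prime d completely
        d += 1
    return n if n > 1 else factor   # whatever survives is the largest prime factor

print(gpf(35), gpf(49), gpf(48))  # -> 7 7 3
```

Assigning each $n \geq 2$ to its $S_k$ is then just grouping by `gpf(n)`.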
We start with the straightforward
$$S_p = \{p,2p,3p,\ldots,p(p-2),p(p-1),p^2\}.$$
Note that the mean of $S_p$ is simply its middle term, $\frac{1}{2}p(p+1)$.
The mean value of the elements in $S$ is $\frac{1}{2}(p^2+1)$. Comparing that to the mean value of $S_p$, we see that
$$\bar{S}=\frac{1}{2}(p^2+1)=\frac{1}{2}(p^2+p)-\frac{p-1}{2}=\bar{S_p}-\frac{p-1}{2}.$$
This means that our $S_p$ has landed only slightly above the overall average: its mean exceeds $\bar{S}$ by just $\frac{p-1}{2}$, which is small next to $\bar{S}\approx\frac{1}{2}p^2$.
I assert that every smaller $S_k$ will have a mean substantially smaller than that of $S_p$, and indeed smaller than that of $S$ itself.
Why? I see two reasons:
- We've already locked down the best possible number, $p^2$, for $S_p$ (and $p$ on the low end). This means that for any prime $q<p$, the largest element of $S_q$ will still be $<p^2$, and its smallest element will be $q<p$, so, all else being equal, its mean will be accordingly lower.
- Much more importantly, $p$ had no interference; every smaller prime will. If e.g. $p=7$, then when we move down to $S_5$, we won't be able to include $5\times 7$, which destroys $S_5$'s mean value. Every smaller prime will find its larger values effectively blocked by the larger primes that came first, preferentially removing larger numbers from their means.
(Maybe worth mentioning that there is also no chance of this effect working in your favor: having a larger prime block a small prime's multiples early on doesn't help it. If you get blocked $c$ times before the midway point, the same prime will block you either $c$ or $c+1$ times in the second half.)
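That parenthetical claim is easy to check numerically. A small sketch, under my reading that $q$'s multiples "blocked by" a larger prime $r$ are exactly the multiples of $qr$, and that the midway point is $p^2/2$:

```python
def half_counts(step, limit):
    """Multiples of `step` in [1, limit/2] versus (limit/2, limit]."""
    total = limit // step
    first = (limit // 2) // step
    return first, total - first

for p in (7, 11, 13):
    for q in (2, 3, 5):
        for r in (3, 5, 7, 11, 13):
            if q < r <= p:
                first, second = half_counts(q * r, p * p)
                # the second half is hit the same number of times, or once more
                assert second in (first, first + 1)
```

For $p=7$, $q=5$, $r=7$: `half_counts(35, 49)` gives `(0, 1)`, matching the single blocked value $35$ in the toy example below.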
I think that concludes the nitty gritty part. So...
Conclusion
Since we know that each integer in $S$ (apart from $1$) is selected exactly once, it follows that the weighted average of all the $S_k$ means must equal the mean of $S$ itself. But if every $S_k$ provides a smaller-than-average average, that is, $\bar{S_k}<\bar{S}$ for all $k$, then, barring a mistake elsewhere, this averaging identity grants us a contradiction, implying there must be a countervailing, larger-than-average mass that hasn't been counted. That missing mass, of course, is the contribution of the primes in $(p,p^2)$ that we've ignored.
(And yes, $\bar{S_p}>\bar{S}$, but that excess is always negligible; for $p \geq 5$ it's easy to check that it's more than offset by the deficit of $S_2$ alone.)
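That offset claim can be sanity-checked numerically for small $p$ (a verification, not a proof; the helper name is mine, and the comparison does fail for the degenerate case $p=3$):

```python
def s2_deficit_vs_sp_excess(p):
    """Total deficit of S_2 below the mean of S, vs total excess of S_p above it."""
    mean_S = (p * p + 1) / 2
    # S_p = {p, 2p, ..., p^2} regardless of any hypothesis, so this is exact:
    excess = sum(k * p - mean_S for k in range(1, p + 1))
    # S_2 is just the powers of 2 up to p^2:
    powers, x = [], 2
    while x <= p * p:
        powers.append(x)
        x *= 2
    deficit = sum(mean_S - v for v in powers)
    return deficit, excess

for p in (5, 7, 11, 13):
    deficit, excess = s2_deficit_vs_sp_excess(p)
    assert deficit > excess   # S_2's shortfall swamps S_p's surplus
```

For $p=7$ this gives a deficit of $63$ against an excess of only $21$.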
Toy example
Because I always find plugging in numbers helpful, and perhaps others do too. The numbers are much too small to be compelling, and this is only meant to be illustrative.
- Let $p=7$, and so $S=[1,49]$, and $S$ has a mean value of $(1+7^2)/2=25$.
- $S_7=\{7,14,21,28,35,42,49\}$. Note the mean value of $S_7$ is $28$.
- $S_5 = \{5,10,15,20,25,30,40,45\}$. (Note the missing $35$.) $S_5$ has a mean of $23.75$, which is lower than that of $S$.
- $S_3 = \{3,6,9,12,18,24,27,36,48\}$, with a mean of $20.\bar{3}$.
- $S_2 = \{2, 4, 8, 16, 32\}$, with a mean of $12.4$.
For the overall average to return to $S$'s $25$, it's clear that some very large terms are required. Therefore, primes.
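The toy example can be recomputed directly, which also locates the missing mass. A sketch (the helper names `gpf`, `buckets`, `covered`, `missing` are mine):

```python
def gpf(n):
    """Greatest prime factor of n (n >= 2), by trial division."""
    factor, d = 1, 2
    while d * d <= n:
        while n % d == 0:
            factor, n = d, n // d
        d += 1
    return n if n > 1 else factor

def mean(xs):
    return sum(xs) / len(xs)

p = 7
buckets = {}
for n in range(2, p * p + 1):
    buckets.setdefault(gpf(n), []).append(n)

# The subset means from the bullets above:
print({k: mean(v) for k, v in sorted(buckets.items()) if k <= p})
# -> {2: 12.4, 3: 20.33..., 5: 23.75, 7: 28.0}

# The 7-smooth numbers the S_k cover average well below 25; the rest
# (primes in (7, 49) and their multiples, e.g. 22 = 2 * 11) carry the excess.
covered = [n for k, v in buckets.items() if k <= p for n in v]
missing = [n for k, v in buckets.items() if k > p for n in v]
print(round(mean(covered), 2), round(mean(missing), 2))  # -> 21.76 31.21
```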
I would say look at Simpson's paradox, which allows me to accept "$\bar{S_k}<\bar{S}$ for all but one $k$" without accepting "therefore I need a term with $\bar{S_k}\gg\bar{S}$".
A much bigger hole is that you need weighted averages to combine the $\{S_k\}$ into $\bar{S}$, and you'll notice that the larger primes tend to capture more numbers; their union is just the $p$-smooth numbers, after all.
So your lowest averages come from the smallest, least representative sets in the partition.