Repeatedly taking mean values of non-empty subsets of a set: $2,\,3,\,5,\,15,\,875,\,...$

727 Views Asked by At

Consider the following iterative process. We start with a 2-element set $S_0=\{0,1\}$. At $n^{\text{th}}$ step $(n\ge1)$ we take all non-empty subsets of $S_{n-1}$, then for each subset compute the arithmetic mean of its elements, and collect the results to a new set $S_n$. Let $a_n$ be the size of $S_n$. Note that, because some subsets of $S_{n-1}$ may have identical mean values, $a_n$ may be less than the number of non-empty subsets of $S_{n-1}$ (that is, $2^{a_{n-1}}-1$).

For example, at the $1^{\text{st}}$ step we get the subsets $\{\{0\},\,\{1\},\,\{0,1\}\}$, and their means are $\{0,\,1,\,1/2\}.$ So $S_1=\{0,\,1,\,1/2\}$ and $a_1=|S_1|=3.$

At the $2^{\text{nd}}$ step we get the subsets $\{\{0\},\,\{1\},\,\{1/2\},\,\{0,\,1\},\,\{0,\,1/2\},\,\{1,\,1/2\},\,\{0,\,1,\,1/2\}\},$ and their means are $\{0,\,1,\,1/2,\,1/2,\,1/4,\,3/4,\,1/2\}.$ So, after removing duplicate values, we get $S_2=\{0,\,1,\,1/2,\,1/4,\,3/4\}$ and $a_2=|S_2|=5.$ And so on.

The sequence $\{a_n\}_{n=0}^\infty$ begins: $2,\,3,\,5,\,15,\,875,\,...$

I submitted it as A273525 in OEIS.

A brute-force algorithmic approach allows to easily find its elements up to $a_4=875$, but becomes computationally unfeasible after that. My question is:

What is the value of $a_5$?

It's easy to see that $5\times10^5<a_5<2^{875}<10^{264}$ (the lower bound $5\times10^5$ is found by direct enumeration of some subsets of $S_4$ on computer). Greg Martin in his answer below proves stricter bounds $2\times10^6<a_5<7\times10^{12}$. Can we find the exact value of $a_5$?

1

There are 1 best solutions below

2
On

Not a complete answer, but some ideas:

The number of distinct means of $k$-element subsets of an $n$-element set is at least $k(n-k)+1$. For example, when $k=3$ and $n=9$, the following subsets all have different means: $\{x_1,x_2,x_3\}$, $\{x_1,x_2,x_4\}$, ..., $\{x_1,x_2,x_9\}$, $\{x_1,x_3,x_9\}$, ..., $\{x_1,x_8,x_9\}$, $\{x_2,x_8,x_9\}$, ..., $\{x_3,x_8,x_9\}$. Applying this with $n=875$ and $k=438$ already gives 191,407 distinct means.

We can build on this though. Of the 875 means counted by $a_4$, 52 of them have a factor of 13 in their denominator, while the other 823 do not. Taking $k=412$ and $n=823$, we get 169,333 subsets with distinct means. But furthermore, of those 52, there are numerators corresponding to each nonzero residue class modulo 13. Therefore we can take each of the 169,333 subsets and get 13 variants of it with different means (the subset itself, together with the subset with a single element appended, that element having a denominator divisible by 13 and a numerator from each of the nonzero residue classes modulo 13). That gives 2,201,329 means that a little thought verifies are distinct.

One could experiment with denominator factors other than 13 (perhaps composite ones) to squeeze more out of this argument.

Finally, note that the mean of a $p$-element subset and the mean of a $q$-element subset, if $p$ and $q$ are relatively prime, are quite likely to be distinct from each other. (Both primes would have to be cancelled from their denominators by the sums of the elements in the subsets.) So one should be able to combine various collections of means in this way and improve the lower bound. (Of course, taking $p$ and $q$ near $875/2$ seems the best place to explore.)

(added later) As for the upper bound, let's bound the number of $k$-element means separately and mostly forget about whether they could coincide. Obviously there are 875 $1$-element means. For $2\le k\le 875$, there are obviously at most $\binom{875}k$ $k$-element means. However, we can get a different upper bound as follows: The largest of the 875 elements is $1$ of course, and the least common denominator of the 875 elements is 17,297,280. Therefore every single $k$-element mean is a rational number between $0$ and $1$ whose denominator divides $17\text{,}297\text{,}280k$, and there are at most $17\text{,}297\text{,}280k-1$ of them (not counting $0$ and $1$ themselves, which are already counted by the $1$-element means). Therefore an upper bound for $a_5$ is $$ 875 + \sum_{k=2}^{875} \min\bigg\{ \binom{875}k, 17\text{,}297\text{,}280k-1 \bigg\} = 6\text{,}568\text{,}806\text{,}008\text{,}597. $$ So at least we know that $a_5$ is between $2\times10^6$ and $7\times10^{12}$.