The size of sets of positive integers not having distinct subsets with equal size and sum

598 Views Asked by At

Let us call a set $S$ of positive integers "good" if there does not exist a pair of distinct subsets $A,B\subseteq S$ who have equal size and an equal sum. Equivalently, a set $S$ is good if the function $f:P(S)\rightarrow \mathbb N^2$ defined as $$f(A)=\left(|A|,\sum_{a\in A}a\right)$$ is injective.

Given any $d$, what is the smallest $n$ such that there exists a subset $S$ of $\{1,\ldots,n\}$ which is good and has $d$ elements?


I thought of this problem as a variation on a related one where we drop the requirement that $A$ and $B$ have the same size (i.e. ask that $g(A)=\sum_{a\in A}a$ be injective on the power set of $S$). (This post previously claimed this variant was easy; it has been pointed out in the comments by @Shagnik that it is open).

A potentially useful reformulation of the new problem is that one may also define this as looking for subsets of $\{(1,1),(1,2),(1,3),\ldots,(1,n)\}$ such that no two subsets have equal sum.

I can get lower bounds on $n$. In particular, note that a $k$ element subset of $\{1,\ldots,n\}$ must have its sum lie between $1+2+\ldots+k$ and $(n-k+1)+\ldots+n$. In particular, it can take on only $k(n-k)+1$ possible sums. However, there are ${d \choose k}$ subsets of $S$ needing distinct sums. Thus, we get the bound: $${d\choose k}\leq k(n-k)+1.$$ which must hold for all $0\leq k \leq d$. If one sets $k=d/2$ and works through using Stirling's approximation, dropping some lower order terms that crop up, one gets that $n$ should be at least asymptotic to $\frac{2\sqrt{2}\cdot 2^d}{\sqrt{\pi}\cdot d^{3/2}}$. That is, the ratio of $n$ to that expression tends to at least $1$ as $d$ goes to $\infty$. For upper bounds, we can obviously bound that $n\leq 2^{d-1}$ since $\{1,2,\ldots,2^{d-1}\}$ is a good set. However, this still leaves a sizeable gap between upper and lower bounds.

I do not know whether it is optimal, but a greedy algorithm where we define a sequence $a_n$ by setting $$a_n=\min\{x:\{a_1,a_2,\ldots,a_{n-1},x\}\text{ is good}\}$$ yields a sequence starting as $1,\,2,\,3,\,5,\,8,\,14,\,25,\,45,\,85,\,162,\,310,\,595,\,1165,\,2285,\,4485,\,8808$.

I suspect this is a difficult problem to answer in full, but I would be interested in any refinements of either upper or lower bounds or any literature discussing the problem.

1

There are 1 best solutions below

0
On BEST ANSWER

You can improve the lower bound a little bit. Suppose we have $\{ a_1, a_2, ..., a_d\} \subseteq [n]$ such that no two distinct subsets of equal size have the same sum. As you noted, this is equivalent to no two distinct subsets (of any size) of $\left\{ \begin{pmatrix} 1 \\ a_1 \end{pmatrix} , \begin{pmatrix} 1 \\ a_2 \end{pmatrix} , ..., \begin{pmatrix} 1 \\ a_d \end{pmatrix} \right\}$ having the same sum.

For $1 \le i \le d$, let $\varepsilon_i$ be independent uniform $\{0,1\}$ random variables, and consider the random variables $X$ and $Y$ given by $$ \begin{pmatrix} X \\ Y \end{pmatrix} = \sum_{i=1}^d \varepsilon_i \begin{pmatrix} 1 \\ a_i \end{pmatrix}. $$

We have $X \sim \text{Bin} \left( d,\frac12 \right)$, and so it has mean $\mu_Y = \frac{d}{2}$ and variance $\sigma_X^2 = \frac{d}{4}$. By Chebyshev's inequality, $\mathbb{P}\left( |X - \mu_X| > \sqrt{\frac{3d}{4}} \right) \le \frac{\sigma_X^2}{(3d /4)} = \frac13$.

Since the $\varepsilon_i$ are independent, the variance of $Y$ is given by $\sigma_Y^2 = \sum_{i=1}^d \text{Var}(\varepsilon_i a_i) = \frac14 \sum_{i=1}^d a_i^2 \le \frac{dn^2}{4}$. Hence, if $\mu_Y$ is the expectation of $Y$, Chebyshev gives $\mathbb{P} \left( | Y - \mu_Y | > \sqrt{\frac{3dn^2}{4}} \right) \le \frac13$. (We have $\mu_Y = \frac12 \sum_{i=1}^d a_i$, but we won't be using this fact.)

Hence with probability at least $\frac13$, we have $| X - \mu_X | \le \frac{\sqrt{3d}}{2}$ and $| Y - \mu_Y | \le \frac{n \sqrt{3d}}{2}$. That is to say, for at least $\frac13 2^d$ choices of the coefficients $\varepsilon_i$, $\begin{pmatrix} X \\ Y \end{pmatrix}$ lies within some $\left( \sqrt{3d} + 1 \right)$-by-$\left(n \sqrt{3d} + 1\right)$ grid.

Since these sums must all be distinct, it follows that $$ \frac13 2^d \le \left( \sqrt{3d} + 1 \right) \left( n \sqrt{3d} + 1 \right), $$ from which we can deduce $n \ge \left( 1 - o(1) \right) \frac{2^d}{9 d}$.

Note that for the version where you don't require that the subsets have the same size, we don't need the variable $X$ and its $\sqrt{d}$ factor, so we get a lower bound of the form $\frac{c2^d}{\sqrt{d}}$ for some constant $c > 0$.

I am not sure if restricting the size of the subsets really makes the problem that much easier, and there is no known upper bound better than $c' 2^d$ for some other constant $c' > 0$, so I suspect it might be difficult to close the gap further.