Required length of $\{0,1\}$-sequences to have unique hamming distances.

149 Views Asked by At

Let $f(n)$ be minimal such that there exist $n$ $\{0,1\}$-sequences of length $f(n)$ such that any pair has a unique hamming distance. What is the asymptotic behaviour of $f(n)$?

I have calculated $f(1)=0$, $f(2)=0$, $f(3)=3$, $f(4)=6$, $f(5)=11$.

For $n\geq3$ all sequences need to be distinct and we need $\binom{n}2$ different hamming distances, so we get $f(n)\geq\binom{n}2$.

I found the following construction giving $f(n)<\frac13n^3$ for $n\geq3$.

Consider finding as many $\{0,1\}$-sequences of length $k$ as possible such that each pair has a unique symmetric difference. Say that we already have $\ell$ such sequences. We can then add some sequence $x$ if and only if we do not already have sequences $x_1$, $x_2$, $x_3$ such that $x_1\oplus x_2=x_3\oplus x$. This is equivalent to $x_1\oplus x_2\oplus x_3=x$. There are at most $\ell+\binom\ell3$ different sequences of the form $x_1\oplus x_2\oplus x_3$, so as long as $\ell+\binom\ell3<2^k$ there exists some sequence that we can add. Therefore, if $n-1+\binom{n-1}3<2^k$ then there exists a set of $n$ $\{0,1\}$-sequences of length $k$ such that each pair has a unique symmetric difference. Note that $n-1+\binom{n-1}3<\frac16n^3$ for all integers $n$, so $\frac16n^3\leq2^k$ suffices.

Let $k:=\lceil\log_2(n^3/6)\rceil$. Then $\frac16n^3\leq2^k<\frac13n^3$, so there exist $n$ $\{0,1\}$-sequences $x_1$, ..., $x_n$ of length $k$ such that each pair has a unique symmetric difference. For each sequence $x_i=(b_0,...,b_{k-1})$ we let $y_i$ be the sequence starting with $1$ times $b_0$, then $2$ times $b_1$, then $4$ times $b_2$, up until $2^{k-1}$ times $b_{k-1}$. Then it follows that each pair of such sequences has a unique hamming distance. Since we have $n$ sequences of length $2^k-1$, we get $f(n)\leq2^k-1<\frac13n^3$.

This is a very open ended question, so let me be precise about when I will accept an answer. Whenever a new upper or lower bound is proven such that the ratio with the old bound tends to infinity, I consider this an improvement of our understanding. The first answer to improve our understanding will be accepted. Any further improvements to our understanding I plan to give bounties.

EDIT: So we now know that $\limsup f(n)/n^2\leq1$ and $\liminf f(n)/n^2\geq\frac12$. If either of these values can be determined, I will give a bounty.

4

There are 4 best solutions below

2
On BEST ANSWER

Here's an upper bound of $n^2-n+1$ for certain $n$. One can get an upper bound of $(1+o(1))n^2$ from this bound, since it holds for $n$ one more than a prime.

Let $n$ be such that $n-1$ is a prime power, and let $k = n^2-n+1$. Then there is a perfect difference set $C \subseteq \mathbb{Z}_k$ of size $|C| = n$. Let $E = \{1,\dots,\frac{k-1}{2}\}$. Let $A_1,\dots,A_n = \{c+E : c \in C\}$ be subsets of $\mathbb{Z}_k$. If we let $x_1,\dots,x_n \in \{0,1\}^k$ be $A_1,\dots,A_n$ viewed as subsets of $[k]$, then since the $A_i$'s have the same size, to show that pairs of $x_i$'s have different Hamming distances, it suffices to show that $A_i\cap A_j$ for $i < j$ have different sizes. But just note $|(c+E)\cap (d+E)| = |(c-d+E)\cap E|$ is $\frac{k-1}{2}-(c-d)$ for $c-d \in \{1,\dots,\frac{k-1}{2}\}$.

0
On

Not an answer.

Here's a proof that $n^3$ is tight (asymptotically) if the support of the sequences are disjoint.

The question is essentially how small can $a_1+\dots+a_n$ be if $a_1,\dots,a_n$ are positive integers with each pair having a distinct sum.

For the upper bound, let $\{a_1 < \dots < a_n\}$ be a Sidon subset of $[(1+o(1))n^2]$, and note $a_1+\dots+a_n \le (1+o(1))n^3$.

For the lower bound, if we have such $a_1,\dots,a_n$, then $S := \#\{a_j : a_j \le \frac{n^2}{10}\}$ is a subset of $[\frac{n^2}{10}]$ with pair sums distinct, so must have size at most $(2+o(1))\frac{n}{\sqrt{10}}$, so $a_1+\dots+a_n \ge (n-|S|)\frac{n^2}{10} \ge \left(n-(2+o(1))\frac{n}{\sqrt{10}}\right)\frac{n^2}{10} = \Omega(n^3)$.

0
On

There are infinitely many $n$ for which $f(n) \ge {n \choose 2}+\frac{\sqrt{n}}{2}-1$.

For almost every $n$ (in the sense of asymptotic density), it holds that $f(n) \ge {n \choose 2}+o(\sqrt{n})$.

If $f(n) = {n \choose 2}$, then $\sqrt{n} \in \mathbb{Z}$ or $\sqrt{n-2} \in \mathbb{Z}$.

Suppose pairs of $x_1,\dots,x_n \in \{0,1\}^{{n \choose 2}+d}$ have distinct Hamming distances. WLOG, say $x_1,\dots,x_r$ have even sum of digits and $x_{r+1},\dots,x_n$ have odd sum of digits. Then the pairs $(x_i,x_j)$ for $i \le r$ and $j \ge r+1$ contribute exactly the odd Hamming distances, so we have $r(n-r) = \frac{1}{2}{n \choose 2}+\Delta$ for some $\Delta \in \frac{1}{2}\mathbb{Z}, |\Delta| \le \frac{d+1}{2}$. This gives $r = \frac{n\pm \sqrt{n-4\Delta}}{2}$, which implies $n$ is within $2d+2$ of a perfect square. The three claims follow (for the third, need to be more careful and note $\Delta \in \{0,\frac{1}{2}\}$).

1
On

We prove $f(n) \ge (1-o(1))n^2$. Motivation is page $\approx 2.4$ to $\approx 3.3$ of this.

Easy Fact 1: The number of things in $\{0,1\}^{f(n)}$ within Hamming distance $m$ of two given things with Hamming distance $r \le m$ is $\sum_{a=0}^r\sum_{b=0}^{\min(m-a,m+a-r)}{r \choose a}{f(n)-r \choose b}$.

Let $A = \{x_1,\dots,x_n\} \in \{0,1\}^{f(n)}$ have pairs with distinct Hamming distances. For each $v \in \{0,1\}^{f(n)}$, let $A_v = A \cap B_m(v)$, where $B_m(v) := \{u \in \{0,1\}^{f(n)} : d_{\text{Ham}}(u,v) \le m\}$, where $m$ will be chosen later. Then $\sum_v |A_v| = n\sum_{j=0}^m {f(n) \choose j}$, and so $\sum_v {|A_v| \choose 2} = \frac{1}{2}\sum_v |A_v|^2-\frac{1}{2}n\sum_{j=0}^m {f(n) \choose j} \ge \frac{1}{2}2^{f(n)}\left(\frac{n\sum_{j=0}^m {f(n) \choose j}}{2^{f(n)}}\right)^2-\frac{1}{2}n\sum_{j=0}^m {f(n) \choose j}$. On the other hand, $\sum_v {|A_v| \choose 2} = \sum_v \sum_{i < j} 1_{x_i,x_j \in A_v} = \sum_{i < j} \sum_v 1_{A_v \ni x_i,x_j} = \sum_{r=0}^m \sum_{i < j} 1_{d_{\text{Ham}}(x_i,x_j) = r} \sum_v 1_{A_v \ni x_i,x_j} = \sum_{r=0}^m \sum_{i < j} 1_{d_{\text{Ham}}(x_i,x_j)=r} \sum_{a=0}^r \sum_{b=0}^{\min(m-a,m+a-r)} {r \choose a}{f(n)-r \choose b} \le \sum_{r=0}^m \sum_{a=0}^r\sum_{b=0}^{\min(m-a,m+a-r)}{r \choose a}{f(n)-r \choose b}$, where the last equality used distinct Hamming distances. For any $\epsilon > 0$, if we choose $m = \frac{f(n)}{2}+C\sqrt{f(n)}$ for a large $C$ (depending only on $\epsilon$), then by comparing the lower bound and upper bound we have obtained for $\sum_v {|A_v| \choose 2}$, we end up with $f(n) \ge (1-\epsilon-o(1))n^2$. Modulo those computations, which are below, this is the end of the proof. $\square$

$\text{}$

$\text{}$

$\text{}$

$\text{}$

$\text{}$

Easy Fact 2: For $\Delta = O(\sqrt{n})$, ${n \choose \frac{n}{2}-\Delta} \sim e^{-\frac{2\Delta^2}{n}}{n \choose \frac{n}{2}}$. Also, ${n \choose n/2} \sim \sqrt{\frac{2}{\pi}}\frac{2^n}{\sqrt{n}}$.

Note $\sum_{r=0}^m \sum_{a=0}^r\sum_{b=0}^{\min(m-a,m+a-r)}{r \choose a}{f(n)-r \choose b} = 2\sum_{r=0}^m\sum_{a=0}^{r/2}\sum_{b=0}^{m+a-r} {r \choose a}{f(n)-r \choose b}$ (up to one term maybe, but we'll see that it'll be negligble).

Okay, now let's take $m = \frac{f(n)}{2}-C\sqrt{f(n)}$ for some fixed $C \in \mathbb{R}$. Note $\frac{1}{2^{f(n)}}\sum_{j=0}^{\frac{f(n)}{2}-C\sqrt{f(n)}} {f(n) \choose j} = \frac{1}{2^{f(n)}}\sum_{\Delta = C\sqrt{f(N)}}^{f(n)/2} {f(n) \choose \frac{f(n)}{2}-\Delta} \approx \frac{1}{2^{f(n)}}\sum_{\Delta = C\sqrt{f(N)}}^{f(n)/2} e^{-2\Delta^2/f(n)}{f(n) \choose \frac{f(n)}{2}} \approx \frac{{f(n) \choose f(n)/2}}{2^{f(n)}}\int_{C\sqrt{f(N)}}^{f(n)/2} e^{-2\Delta^2/f(n)}d\Delta = \frac{{f(n) \choose f(n)/2}}{2^{f(n)}}\int_C^\infty e^{-2x^2}\sqrt{f(n)}dx \approx \sqrt{\frac{2}{\pi}}\int_C^\infty e^{-2x^2}dx$. Therefore, the main term of the lower bound for $\sum_v {|A_v| \choose 2}$ is $\frac{1}{\pi}(\int_C^\infty e^{-2x^2}dx)^2n^22^{f(n)}$. The upper bound is $2\sum_{r=0}^{\frac{1}{2}f(n)-C\sqrt{f(n)}}\sum_{a=0}^{r/2}\sum_{b=0}^{\frac{f(n)}{2}-C\sqrt{f(n)}+a-r}{r \choose a}{f(n)-r \choose b}$. Writing $a = \frac{r}{2}-\Delta$ and $b = \frac{f(n)-r}{2}-x$, we get $2\sum_{r=0}^{\frac{1}{2}f(n)-C\sqrt{f(n)}}\sum_{\Delta=0}^{r/2}\sum_{x=C\sqrt{f(n)}+\Delta}^{\frac{f(n)-r}{2}} {r \choose \frac{r}{2}-\Delta}{f(n)-r \choose \frac{f(n)-r}{2}-x} \approx 2\sum_{r=0}^{\frac{1}{2}f(n)-C\sqrt{f(n)}}\sum_{\Delta=0}^{r/2}\sum_{x=C\sqrt{f(n)}+\Delta}^{\frac{f(n)-r}{2}}e^{-\frac{2\Delta^2}{r}}{r \choose \frac{r}{2}}e^{-\frac{2x^2}{f(n)-r}}{f(n)-r \choose \frac{f(n)-r}{2}} \approx 2\sum_{r=0}^{\frac{1}{2}f(n)-C\sqrt{f(n)}}{r \choose \frac{r}{2}}{f(n)-r \choose \frac{f(n)-r}{2}}\int_0^{r/2}\int_{C\sqrt{f(n)}+\Delta}^{\frac{f(n)-r}{2}} e^{-\frac{2\Delta^2}{r}}e^{-\frac{2x^2}{f(n)-r}}dxd\Delta = 2\sum_{r=0}^{\frac{1}{2}f(n)-C\sqrt{f(n)}}\sqrt{r}{r \choose \frac{r}{2}}\sqrt{f(n)-r}{f(n)-r \choose \frac{f(n)-r}{2}}\int_0^{\sqrt{r}}\int_{\frac{C\sqrt{f(n)}+\sqrt{r}t}{\sqrt{f(n)-r}}}^{\frac{1}{2}\sqrt{f(n)-r}} e^{-2t^2}e^{-2y^2}dydt \approx \frac{4}{\pi}2^{f(n)}\sum_{r=0}^{\frac{1}{2}f(n)-C\sqrt{f(n)}} \int_0^{\sqrt{r}}\int_{\frac{C\sqrt{f(n)}+\sqrt{r}t}{\sqrt{f(n)-r}}}^{\frac{1}{2}\sqrt{f(n)-r}} e^{-2t^2}e^{-2y^2}dydt \approx \frac{4}{\pi}2^{f(n)}\int_0^{\frac{1}{2}f(n)-C\sqrt{f(n)}}\int_0^{\sqrt{r}}\int_{\frac{C\sqrt{f(n)}+\sqrt{r}t}{\sqrt{f(n)-r}}}^{\frac{1}{2}\sqrt{f(n)-r}} e^{-2t^2}e^{-2y^2}dydtdr = \frac{4}{\pi}f(n)2^{f(n)}\int_0^{\frac{1}{2}-\frac{C}{\sqrt{f(n)}}}\int_0^{\sqrt{\lambda f(n)}}\int_{\frac{C+t\sqrt{\lambda}}{\sqrt{1-\lambda}}}^{\frac{1}{2}\sqrt{(1-\lambda)f(n)}} e^{-2t^2}e^{-2y^2}dydtd\lambda \approx \frac{4}{\pi}f(n)2^{f(n)}\int_0^{\frac{1}{2}}\int_0^\infty\int_{\frac{C+t\sqrt{\lambda}}{\sqrt{1-\lambda}}}^\infty e^{-2t^2}e^{-2y^2}dydtd\lambda.$

Therefore, up to lower order terms, we have $$f(n) \ge \frac{(\int_C^\infty e^{-2x^2}dx)^2}{4\int_0^{\frac{1}{2}}\int_0^\infty\int_{\frac{C+t\sqrt{\lambda}}{\sqrt{1-\lambda}}}^\infty e^{-2t^2}e^{-2y^2}dydtd\lambda}n^2.$$

By the dominated convergence theorem, we have $$\lim_{C \to -\infty} \frac{(\int_C^\infty e^{-2x^2}dx)^2}{4\int_0^{\frac{1}{2}}\int_0^\infty\int_{\frac{C+t\sqrt{\lambda}}{\sqrt{1-\lambda}}}^\infty e^{-2t^2}e^{-2y^2}dydtd\lambda} = \frac{(\int_{-\infty}^\infty e^{-2x^2}dx)^2}{4\int_0^{\frac{1}{2}}\int_0^\infty\int_{-\infty}^\infty e^{-2t^2}e^{-2y^2}dydtd\lambda} = 1.$$ Therefore, for any $\epsilon > 0$, if we take $C$ negatively large enough, we have $f(n) \ge (1-\epsilon+o(1))n^2$. $\square$

$\text{}$

Remark: Taking $m = f(n)$ will yield the trivial bound $f(n) \ge \frac{1}{2}n^2$. What we did was take $m = \frac{f(n)}{2}+C\sqrt{f(n)}$ for a very large fixed $C$. I really would like to better understand why this all worked. I'm honestly shocked it did.