binary vector generation (by hamming distance) limit?

242 Views Asked by At

I have the following problem :

I generate n-bits long binaries (0|1), where every bit is 0 OR 1 with probability 50%. So, say I generate 'm' such binaries.

The question is what is the probability when generating the next one that it will be still ~50% away (by hamming distance) from all the generated vectors i.e. almost orthogonal.

OR

said differently, how many binaries I can generate before I generate one that is closer than ~50% away.

Normally if 'n' is big it is almost guaranteed the binary vector is orthogonal to all previously generated. I want to find the threshold 'm' of number of vectors I can safely generate for defined 'n'.

I have hard time coming up with formulation on solution to the problem. So even if you don't have a solution, but just idea on how to mathematically formulate the problem, please comment out.


I was thinking along the lines of single bit logical operations (match i.e. XOR) extended somehow over multiple n-bits. Then probabilistically treating m-count of vectors in pairs... via at-least one match ..

1

There are 1 best solutions below

2
On BEST ANSWER

If you have $N$ uniform random $k$-bit numbers, between $0$ and $2^k-1$, inclusive, the probability of at least two of them being the same is $$P(n, k) = 1 - \frac{n! {2^k \choose n}}{2^{k n}} = 1 - \frac{(2^k)!}{(2^k - n)! 2^{k n}} \tag{1}\label{NA1}$$ because this is essentially the well-known birthday problem but with $2^k$ "days". You are looking for the smallest $n$ where $P(n, k) \ge 0.5$.

Some tabulated values of smallest $n$ that fulfills $P(n, k) \ge 0.5$: $$\begin{array}{l|r} k & n \\ \hline 4 & 5 \\ 8 & 20 \\ 12 & 76 \\ 16 & 302 \\ 20 & 1206 \\ 24 & 4823 \\ 28 & 19291 \\ 32 & 77164 \\ \end{array}$$ To calculate $P(n, k)$ for larger values of $k$, you'll need to approximate $\eqref{NA1}$ somehow, perhaps starting with $P(n, k) = 0.5$ or $2 P(n, k) - 1 = 0$.


It turns out there is a very simple approximation: $$P(n, k) = 1 - e^\frac{-n(n-1)}{2^{k+1}}$$ I encountered it here.

At the limit $P(n, k) = 0.5$, with large enough $k$, say $k \ge 16$, we can approximate $$n \approx 0.833 \times 2^\frac{k+1}{2}$$