Efficiently testing for more zeros than ones in a binary code block


Setting. I have a set $\mathcal{C}$ of binary sequences where each sequence has length $L\in \mathbb{N}$ and the total number of sequences in $\mathcal{C}$ is $N\in \mathbb{N}$. Assume that $N=\exp(O(L))$.

Assume that the only way elements of $\mathcal{C}$ can be viewed/accessed is via calls to a function $\Theta : \{0,1\}^{L}\rightarrow \{0,1\}^L$ that takes as input any binary sequence $s$ of length $L$ and returns the element in $\mathcal{C}$ closest in Hamming distance to $s$ (ties are broken arbitrarily).
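For concreteness, the oracle $\Theta$ can be modeled by a brute-force nearest-codeword search. A minimal sketch, assuming $\mathcal{C}$ is stored as a list of 0/1 tuples (the toy code `C` below is hypothetical, and ties go to whichever codeword is encountered first):

```python
def make_theta(C):
    """Build an oracle Theta for a code C (list of 0/1 tuples).

    Theta(s) returns the codeword in C closest to s in Hamming
    distance; ties are broken by the order of C.
    """
    def theta(s):
        return min(C, key=lambda c: sum(a != b for a, b in zip(c, s)))
    return theta

# Hypothetical toy code for illustration
C = [(0, 0, 1), (1, 1, 1), (0, 1, 0)]
theta = make_theta(C)
print(theta((1, 1, 0)))  # → (1, 1, 1), at Hamming distance 1
```

In the actual setting $\Theta$ is a black box, so this sketch only serves to fix the semantics of a call.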

Further, suppose that the total counts of zeros and ones across all sequences in $\mathcal{C}$ satisfy either #zeros $= 2\,\times$ #ones or #ones $= 2\,\times$ #zeros (i.e. there are either twice as many zeros as ones, or twice as many ones as zeros).

Goal. I want a procedure which chooses a sequence of inputs $s_1,\dots,s_K\in \{0,1\}^L$ to the function $\Theta$ in order to test whether {#zeros $\geq$ #ones in $\mathcal{C}$} at a given confidence level $\delta \in(0,1)$.

Question. Does there exist a procedure for which the required ''sample size'' $K$ to achieve the above goal is polynomial in $L$ (rather than exponential)?

Answer.

Each call $y = \Theta (x)$ gives you a value $y \in \mathcal{C}$.

The standard formula for the sample size $n$ needed to estimate the proportion of successes of a dichotomous (yes/no) outcome variable in a single population is:

$$n = p\bigg(1 - p\bigg)\bigg({Z \over E}\bigg)^2$$

where,

$Z$ is the value from the standard normal distribution reflecting the confidence level that will be used (e.g., $Z = 1.96$ for $95\%$),

$E$ is the desired margin of error,

$p$ is the proportion of successes in the population.

So, $n$ depends only on the experiment parameters $Z$ and $E$ and the estimated proportion of successes $p$. If you have no estimate, use the conservative choice $p = 0.5$, which maximizes $p(1-p)$ and hence $n$.
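The formula can be computed directly; rounding up gives a whole number of samples. A minimal sketch:

```python
import math

def sample_size(z, e, p=0.5):
    """Sample size n = p(1-p)(Z/E)^2, rounded up to an integer."""
    return math.ceil(p * (1 - p) * (z / e) ** 2)

# Z = 1.96 (95% confidence), margin of error E = 5%
print(sample_size(1.96, 0.05))  # → 385, since 0.25 * (1.96/0.05)^2 = 384.16
```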

Then your procedure is to draw a random sample $\{ x_1, x_2, \dots, x_n \}$ of size $n$ and count the numbers of ones and zeros in the outputs $y_i = \Theta (x_i)$.

If you use $Z = 1.96$ for a $95\%$ confidence interval and a margin of error $E$ of $5\%$, you need to sample

$$n = 0.5\bigg(1 - 0.5\bigg)\bigg({1.96 \over 0.05}\bigg)^2 = 384.16$$

So, rounding up, you should sample at least $385$ binary sequences from $\mathcal{C}$.

Note that $\log_2 (384.16) \approx 8.59$, so for binary sequences of length $8$ bits or fewer there are at most $2^8 = 256$ possible inputs, which is fewer than the required sample size; in that regime you would have to query exhaustively ($100\%$ sampling).

Note that this sample size depends only on the experiment parameters and will change with them.

Also, this does not change whether you are checking for zeroes or ones because the factor $p(1-p)$ is symmetric.