Specifying sets of binary strings by inclusion and exclusion filters

61 Views Asked by At

Consider the set $S^n$ of binary strings of length $n$, i.e. $S^n = \{0,1\}^n$, and the set $F^n$ of filters of length $n$ which have some unspecified entries $x$, i.e $F^n = \{0,1,x\}^n$. For $f \in F^n$ the number $b(f) = |\{ f_i\ |\ f_i \neq x \}|$ is $f$'s number of bits, i.e. of specified entries different from $x$. The binary strings $S^n \subset F^n$ as well as the empty filter $x^n$ are trivial filters with $b(f) = n$ and $b(f) = 0$, respectively.

Each filter $f \in F^n$ defines a set $S(f)\subseteq S^n$ of matching strings by

$$s \in S(f) \equiv (\forall i)\ s_i \neq f_i \rightarrow f_i = x.$$

We abbreviate $s \in f$ for $s \in S(f)$.

Two sets of filters $F_{in}, F_{ex}$ – the inclusion and the exclusion filters – define a set $S(F_{in}, F_{ex})$ by

$$s \in S(F_{in}, F_{ex}) \equiv (\exists f \in F_{in})\ s \in f \wedge (\forall f\in F_{ex})\ s \not\in f.$$

Each set $X \subset S^n$ has trivial inclusion and exclusion filters $F_{in} = X$ and $F_{ex} = \{\}$, but these are not what I am interested in. I am interested in "minimal" filters with an overall minimal number of bits. In the trivial case, there are

$$\sum_{f \in F_{in}}\ b(f) + \sum_{f \in F_{ex}}\ b(f) = n\cdot |X|$$

bits involved, but for example $X = \{1111,1011,0111\}$ is defined by $F_{in}=\{xx11\}$, $F_{ex}=\{00xx\}$ with four instead of twelve bits.

My question is for an efficient algorithm to find all minimal pairs of inclusion and exclusion filters of any given set $X \subset S^n$. I am satisfied with the small case of $n=8$.

Given this algorithm for each set $X \subset S^n$ some sort of complexity can be calculated: the minimal number of bits that is needed to specify it by inclusion and exclusion filters.

What I already know is how to reduce a (somehow) given set of exclusion filters: When there are two filters in $F_{ex}$ like $11xx$ and $10xx$, one can drop and replace them by $1xxx$. When there is a filter in $F_{ex}$ that doesn't match with any of the inclusion filters in $F_{in}$ than it can be dropped altogether: $F_{in} = \{11xx\}$ and $F_{ex} = \{00xx\}$ is equivalent with $F_{in} = \{11xx\}$ and $F_{ex} = \{\}$.

Sometimes this works:

  1. Calculate the phi coefficients $\varphi(f,X)$ of $S(f)$ and $X$ for all filters.

  2. Find maximal thresholds $\varphi_{in}>0$ and $\varphi_{ex}<0$ such that the phi coefficient of $S(F_{in},F_{ex})$ and $X$ equals $1$ for $F_{in} = \{f\ |\ \varphi(f,X) > \varphi_{in}\}$ and $F_{ex} = \{f\ |\ \varphi(f,X) < \varphi_{ex}\}$.

  3. Reduce $F_{ex}$ as described above.

If the maximal phi coefficient of $S(F_{in},F_{ex})$ and $X$ is less than $1$, further action is required.