Transform a $k$-CNF formula into a conjunction of Boolean literals


The question comes from Mehryar Mohri's Foundations of Machine Learning. In Example 2.5 the book transforms a $k$-CNF formula into a conjunction of Boolean literals, but I can't understand the trick used in the book.

Here is the definition of $k$-CNF formula:

A conjunctive normal form (CNF) formula is a conjunction of disjunctions. A $k$-CNF formula is an expression of the form $T_1\land \ldots \land T_j$ with arbitrary length $j\in\mathbb{N}$ and with each term $T_i$ being a disjunction of at most $k$ Boolean attributes.

and here is the trick:

To do so, it suffices to associate to each $T_i$ a new variable. Then this can be done with the following bijection: $$ a_i(x_1)\lor\ldots\lor a_i(x_n)\to Y_{a_i(x_1),\ldots,a_i(x_n)} $$ where $a_i(x_j)$ denotes the assignment to $x_j$ in term $T_i$.

But how could this mapping be a bijection when $Y_{a_i(x_1),\ldots,a_i(x_n)}$ only takes the values 0 or 1? It would be great if someone could give a detailed example showing how this works. Thanks.


Accepted answer:

The notation is not entirely clear to me, but I think that, basically, each conjunct $T_i$ in the $k$-CNF is a disjunction of literals, i.e. Boolean variables or negated Boolean variables.

I assume that the variables $x_j$ are the Boolean variables; if so, each occurs in $T_i$ either as is or negated.

We introduce an "auxiliary" function $a_i$ associated with $T_i$ such that:

$a_i(x_j)$ has the value $x_j$ or $\lnot x_j$, according to the way in which the variable occurs in $T_i$.

If so, the new Boolean variable $Y_{a_i(x_1), \ldots, a_i(x_n)}$ behaves exactly like the original disjunction.
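To make the bijection concrete (a Python sketch of my own, not from the book): the map pairs each possible literal pattern of a clause with a fresh variable *name*, so it is a bijection between clauses and new variables, not between truth values. Here I treat each variable as absent, positive, or negated in the clause:

```python
from itertools import product

n = 2  # two original variables x1, x2

# Each clause assigns to each variable one of: absent ("-"),
# positive ("x"), or negated ("~x"). The bijection pairs the
# clause's literal pattern with a fresh variable NAME; distinct
# clauses get distinct new variables.
patterns = list(product(("-", "x", "~x"), repeat=n))

new_variable = {p: f"Y_{'_'.join(p)}" for p in patterns}

# The map is injective: distinct clauses receive distinct names.
assert len(set(new_variable.values())) == len(patterns)
```

The value of each new variable $Y$ is then defined to equal the truth value of its clause, which is why it "works exactly as" the original disjunction.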

Second answer:

I had a bit of trouble grasping the definition of "$k$-CNF formula", which threw me off: any conjunction of disjunctions is a $k$-CNF so long as each disjunctive clause has at most $k$ terms; nowhere in the definition do we limit the number of disjunctive clauses we can use. If our variables are $x$ and $y$ (so $n=2$) and $k=2$, then all of the following are $k$-CNF's, because each disjunctive clause in parentheses has at most $k=2$ terms:

$ (x \lor y) \\ (x) \\ (x \lor \bar{y} ) \land (\bar{x} \lor \bar{y}) \\ (x \lor y) \land (x \lor \bar{y}) \land (\bar{x} \lor y) \land (x) \land (y) \land (\bar{y}) $

It's useful to note that the "Conjunction of Boolean literals" problem is just the $k$-CNF problem in the case $k=1$.

We want to come up with a $k$-CNF that is consistent with some set of given samples. For each possible disjunctive clause (there are $O(n^k)$ of them) we associate a binary variable, where the variable is 1 if the clause is included in our consistent $k$-CNF and 0 if it is not. As in the "Conjunction of Boolean literals" case, we initialize each variable to $1$ so that in the $n=2, k=2$ case our initial consistent $k$-CNF is

$ (x \lor y) \land (x \lor \bar{y}) \land (\bar{x} \lor y) \land (\bar{x} \lor \bar{y}) \land (x) \land (\bar{x}) \land (y) \land (\bar{y}) $
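As a sanity check (my own sketch, not the book's code), we can enumerate every disjunctive clause of at most $k$ literals in which each variable appears at most once, and confirm that for $n=2, k=2$ there are exactly the 8 clauses listed above:

```python
from itertools import combinations, product

def all_clauses(n, k):
    """All disjunctions of at most k literals over n variables,
    each variable appearing at most once per clause.
    Literals are signed 1-based indices: +v is x_v, -v is its negation."""
    clauses = []
    for size in range(1, k + 1):
        for vars_ in combinations(range(n), size):
            for signs in product((1, -1), repeat=size):
                clauses.append(tuple(s * (v + 1) for v, s in zip(vars_, signs)))
    return clauses

print(len(all_clauses(2, 2)))  # 8
```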

This CNF evaluates to False, but it's still consistent if the sample set is empty. We use the same rule as in the "Conjunction of Boolean literals" case: we consider each positive sample in turn, and we rule out all clauses that evaluate to False when given that sample as input. For example, if we observe that the $k$-CNF we're learning evaluates to True for the input $x=1, y=0$, then we calculate

$ (x \lor y) \to (1 \lor 0) = 1\\ (x \lor \bar{y}) \to (1 \lor 1) = 1\\ (\bar{x} \lor y) \to (0 \lor 0) = 0\\ (\bar{x} \lor \bar{y}) \to (0 \lor 1) = 1\\ (x) \to (1) = 1\\ (\bar{x}) \to (0) = 0\\ (y) \to (0) = 0\\ (\bar{y}) \to (1) = 1 $

Since $(\bar{x} \lor y)$, $(\bar{x})$, and $(y)$ evaluate to False, we take them out of our old consistent $k$-CNF so that our new consistent $k$-CNF is

$ (x \lor y) \land (x \lor \bar{y}) \land (\bar{x} \lor\bar{y}) \land (x) \land (\bar{y}) $
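The elimination rule above can be sketched in Python. The signed-integer encoding of literals ($1$ for $x$, $-1$ for $\bar{x}$, $2$ for $y$, $-2$ for $\bar{y}$) is my own convention, not the book's notation:

```python
# A clause is a tuple of signed literals; a sample maps 1-based
# variable indices to 0/1.
def satisfied(clause, sample):
    # A disjunction holds if any literal evaluates to 1.
    return any(sample[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)

# Start from every clause of at most 2 literals over x (1) and y (2).
clauses = [(1, 2), (1, -2), (-1, 2), (-1, -2), (1,), (-1,), (2,), (-2,)]

# Positive sample x=1, y=0: drop every clause it falsifies.
sample = {1: 1, 2: 0}
clauses = [c for c in clauses if satisfied(c, sample)]
print(clauses)  # [(1, 2), (1, -2), (-1, -2), (1,), (-2,)]
```

The five surviving clauses are exactly the consistent $k$-CNF shown above; further positive samples would prune the list the same way.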

Each possible disjunctive clause can be included or excluded (2 choices) and there are $O(n^k)$ disjunctive clauses, so the total number of $k$-CNFs is $2^{O(n^k)}$. Using the notation of the book, the number of possible concepts is then $|C| = 2^{O(n^k)}$. The number of possible hypotheses is at least that many: $|H| \geq |C|$. The book says our sample complexity $m$ grows like $\log |H| = O(n^k) \log 2$, which is polynomial in $n$ but exponential in $k$. When we distribute a $k$-term DNF into a $k$-CNF, each clause takes one literal from each of the $k$ terms, so the result is a $k$-CNF over the same $n$ variables; learning a $k$-term DNF by this $k$-CNF algorithm therefore also has sample complexity $O(n^k)$, polynomial in $n$ for fixed $k$, with the number of DNF terms $k$ moving into the exponent.
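For small $n$ and $k$ the exact counts behind these asymptotics can be computed directly (a rough sketch; the exact formula counting clauses with each variable used at most once is my own bookkeeping):

```python
from math import comb

def num_clauses(n, k):
    # Clauses with 1..k literals: choose the variables, then a sign each.
    return sum(comb(n, i) * 2**i for i in range(1, k + 1))

def num_kcnf(n, k):
    # Each clause is either in or out of the conjunction.
    return 2 ** num_clauses(n, k)

print(num_clauses(2, 2))  # 8
print(num_kcnf(2, 2))     # 256
```

So for $n=2, k=2$ there are 8 possible clauses and $2^8 = 256$ possible $k$-CNFs, matching the $|C| = 2^{O(n^k)}$ count above.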