Clarification over what NP means

95 Views Asked by At

I'm reading an informal definition of the decision class NP with a specific example being the standard knapsack problem and a decision variant of this problem.

The example they are using is a decision variant of the standard knapsack problem. Where in addition to the variables in the standard problem we are given a variable $C$ and the problem is to output "yes" or "no" depending on if the optimal solution to the knapsack problem has value at least $C$.

Their explanation is

"Roughly speaking, the class $\mathrm{NP}$is the set of all decision problems such that for any "Yes" instance of the problem there is a short, easily verifiable proof that the answer is "yes". Additionally for each "No" instance of the problem, no such "proof" is convincing."

With regards to the specific problem they say

"What kind of "short proof" do we have in mind? Take the example of the decision variant of the knapsack problem given above. For any "Yes" instance in which there is a feasible subset of items of value at least $C$, a short proof of this fact is a list of the items in the subset. Given the knapsack instance and the list, an algorithm can quickly verify that the items in the list have total size at most $B$ (capacity of the knapsack) and total value at least $C$." Note that for any "no" instance, then no possible list of items will be convincing."

I'm slightly confused here, are they saying if an instance of the knap-sack problem had an optimal value larger than $C$ then in "theory" we should be able to find this quickly by outputting a subset which has total value less than $B$ but is larger than $C$? How would such a subset be found quickly?

1

There are 1 best solutions below

0
On

The explanation you quote is popular but rather simplified -- in particular the word "proof" is quite suspect here. It could be argued to be right from a certain point of view, but that point of view is not exactly what "proof" means in ordinary mathematics.

The trouble is that a "problem" in the sense relevant here is just some set of "instances" for which the answer is true (where an "instance" of the problem is encoded as a finite string of symbols).

However, in order to speak about proofs that such-and-such instance has the answer "yes", then you cannot keep the problem as just an arbitrary set of instances. A proof in the usual mathematical sense would need to be based on a concrete definition of what the problem is, and a proof that the answer is "yes" which works for one definition of a problem is usually not a valid proof that the answer is "yes" according to another definition of a problem -- even if the sets of instances defined by the two definitions happen to be the same!

Since being in NP is a property of the problem (as a set) and not a property of particular definitions of problems, speaking of "proofs" is already misleading there.

A better description of NP would be

A problem $K$ is in NP if and only if there exists a program $T$ with two inputs and a polynomial function $f$ such that --

  1. For every instance $x\in K$ of length $n$, there is at least one string $y$ of length at most $f(n)$ such that when $T$ is run on the input $(x,y)$ it will output "yes" within time $f(n)$.
  2. $T$ will never output "yes" when the input is $(x,y)$ and $x$ is not in $K$.

Note that this description is independent of how $K$ is defined: whether or not $K$ is in NP depends only on which strings are in it. If there is a program that satisfies the condition, even if it does so only by accident and we don't know how to prove that is does, then by definition $K$ is in NP.

If we really want to salvage the idea of "short proofs", we have to do something like declare that the program $T$ implicitly defines its own notion of "proof that $x$ is in $K$" -- where a "proof" is simply defined to mean "a $y$ that makes $T$ output yes". This concept does have the basic feature of "proofs" that we can check whether something is a proof or not; that's what $T$ is for. But it doesn't mean that this kind of "proof" would necessarily convince a human reader.

I think it is better not to think in terms of "proofs", but rather of "certificates" and call $T$ a "verifier" for certificates. The only thing one can expect to do with a certificate is to try to verify it and see if it's good or not. A problem $K$ is in NP if there is a possible certification scheme such that (1) everything in $K$ has a reasonably short certificate, (2) verifying a certificate can be done reasonably fast, and (3) certificates are trustworthy, that is, nothing outside $K$ has a certificate that verifies as good. Here "reasonably short" and "reasonably fast" means "polynomial in the length of the instance we're checking".


I'm slightly confused here, are they saying if an instance of the knap-sack problem had an optimal value larger than C then in "theory" we should be able to find this quickly by outputting a subset which has total value less than B but is larger than C? How would such a subset be found quickly?

Your confusion is because there is no requirement that such a subset can be found quickly. The requirement is only that it must be out there, somewhere, even though it may be extremely hard to find -- but once we have it, it can be described reasonably compactly (in this case, even shorter than the knapsack instance itself), and it can be checked quickly that it is indeed a subset with a total value in the right interval.