Suppose we have a list of basic/'atomic' tasks $\{p_1, ..., p_n\}$ and each task has an associated cost $c_i$ for all $i \leq n$.
Morover, we have a list of projects $P_1,..., P_m$ which are collections of tasks, i.e, $P_k \subseteq \{p_1,...,p_n\}$ for each $k\leq m$. Completing each project $P_j$ gives us a reward $R_j$.
Given an input cost of $C$ and reward of $R$, the problem is: Is there a set of which atomic tasks to do so that the total cost is $\leq C$ and the total reward is $\geq R$?
Given the problem above- I want to see that it is NP-complete.
(Remark: I don't think it really matters but tasks can "double count" for multiple projects: meaning if $p_i \in P_j,P_k$ then doing one instance of $p_i$ contributes towards the completion of both $P_j$ and $P_k$ rather than having to do the same task twice for both projects).
Clearly this is NP. Checking a solution works is in fact linear.
I'm not sure how to show NP-completeness.
My first instinct was to show Knapsack could be reduced to this of course. In knapsack, if I'm given a number of items with values $v_i$ and weights $w_i$, I want to choose values to 'pack' so as to not exceed total weight of $W$ while retaining at least value $V$.
Given a knapsack problem I was thinking about converting it to our problem by making each 'item' a project with reward $= v_i$. Then I was simply thinking of making each project consisting of only one atomic task of cost $w_i$. This eliminates overlap between projects of course (so that in the knapsack problem you don't accidentally pack 'half' of one item, etc). I think this is a valid problem, but it seems too simple with the projects and the atomic tasks being the same thing and each project is completely distinct from the other with no overlap.
Is this a valid reduction? Or should I be looking at something else (ex: partitions or subset sums)
It might help to write the corresponding optimization problem as an integer linear programming problem. Let binary decision variable $x_i$ indicate whether task $p_i$ is performed, and let binary decision variable $y_j$ indicate whether project $P_j$ is completed. The problem is to maximize $\sum_{j=1}^m R_j y_j$ subject to \begin{align} \sum_{i=1}^n C_i x_i &\le C \tag1 \\ y_j &\le x_i && \text{for $j\in\{1,\dots,m\}$ and $i\in P_j$} \tag2 \end{align} If you have a way to solve this problem, then you can indeed solve an arbitrary $0$-$1$ knapsack problem by taking $m=n$ and $P_j=\{j\}$ for all $j$ because then $y_j=x_j$ is optimal and you can eliminate $(2)$ and maximize $\sum_{i=1}^n R_i x_i$ subject to $(1)$.