I have a sparse $60000\times10000$ matrix where each element is either a $1$ or $0$ as follows.
$$M=\begin{bmatrix}1 & 0 & 1 & \cdots & 1 \\1 & 1 & 0 & \cdots & 1 \\0 & 1 & 0 & \cdots & 1 \\\vdots & \vdots & \vdots & \ddots & \vdots \\0 & 0 & 0 & \cdots & 0 \end{bmatrix}$$
Each row represents a different point in time such that the first row is $t=1$ and the last row is $t=60000$. Each column in the matrix is a different combination of signals (ie. $1$s and $0$s). I want to choose five column vectors from $M$, $c_1, c_2, c_3, c_4, c_5$ and take the Hadamard (ie. element-wise) product:
$$s = c_1 \circ c_2 \circ c_3 \circ c_4 \circ c_5$$
The result is a new vector $s$ in which a row contains a $1$ only if every one of $c_1, c_2, c_3, c_4, c_5$ contains a $1$ in that row. I need an algorithm that finds the vector $s$ that is most similar to the optimal strategy vector $s^{*}$. The vector $s^{*}$ is fixed in advance and has the same dimensions as $s$.
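As a concrete illustration, here is a minimal NumPy sketch of the Hadamard product step. The dimensions are shrunk from $60000\times10000$ so it runs instantly, and the five column indices are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for M: rows are time steps, columns are signal combinations.
# Real size would be 60000 x 10000; shrunk here for illustration.
n_rows, n_cols = 1000, 50
M = (rng.random((n_rows, n_cols)) < 0.1).astype(np.int8)

# Pick five column indices (arbitrary here) and take the Hadamard product.
idx = [0, 1, 2, 3, 4]
s = M[:, idx].prod(axis=1)  # 1 only where all five chosen columns have a 1
```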
The naive solution is to test every possible combination; however, choosing $5$ columns from $10000$ (with repetition allowed, since the five chosen vectors need not be distinct) gives on the order of $10^{20}$ combinations. I therefore want an algorithm that returns the five vectors $c_{1}$ through $c_{5}$ whose product $s$ is closest to the optimal strategy $s^{*}$.
I am not too sure how to approach this problem, but I believe it is a discrete optimisation problem in which I want to maximise the similarity $\operatorname{sim}(s,s^{*})$. I define the similarity using the Jaccard index below, as it rewards $1$s in the correct places but does not reward matching $0$s. This suits my use case because I would rather have a false negative than a false positive. The metric is bounded between $0$ and $1$, where $1$ denotes maximum similarity. It is commonly used for binary vectors, so it seems ideal here, as each element is either a $1$ or a $0$.
$$\operatorname{sim}(s,s^{*})=\frac{s \cdot s^{*}}{\sum_{i=1}^{60000}\left (s_{i} + s_{i}^{*} \right ) - s \cdot s^{*}}$$
Note that the operation ($\cdot$) refers to the dot product and not the Hadamard product.
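For reference, the similarity can be computed directly from the formula above. The function below is a small NumPy sketch; the convention of returning $1$ when both vectors are all-zero (so the denominator vanishes) is my own assumption:

```python
import numpy as np

def jaccard(s: np.ndarray, s_star: np.ndarray) -> float:
    """Jaccard index of two binary vectors; 1.0 means identical supports."""
    inter = int(s @ s_star)                 # dot product = size of intersection
    union = int(s.sum() + s_star.sum()) - inter
    return inter / union if union else 1.0  # both all-zero: define as 1

s      = np.array([1, 0, 1, 1, 0])
s_star = np.array([1, 0, 0, 1, 0])
print(jaccard(s, s_star))  # 2/3
```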
I posted a similar question before but it was removed for lack of context. I have added more context now but let me know if anything is unclear. I am asking this question to aid me in a personal project I am working on.
FYI, the algorithm will be run on a computer. Any help is much appreciated -- thanks!
Let $m:=10000$ (columns), $n:=60000$ (rows).
We'll show that just deciding whether or not there exists an $s_\text{OPT}$ with $s_\text{OPT}=s^*$ is NP-hard.
Let $s_\text{OPT} :=c_1 \circ c_2 \circ c_3 \circ c_4 \circ c_5$ denote the optimal solution, and $c_1,...,c_5$ a combination that produces this optimal solution.
If for any $i$ we have $(c_i)_x=0$, then $(s_\text{OPT})_x=0$. Therefore, if $s_\text{OPT}=s^*$, then whenever $s^*_x=1$, it must hold that $(c_i)_x=1$ for all $i$.
Let $M_{\downarrow i}$ denote the $i$-th column of $M$.
So if, for each column $M_{\downarrow i}$, we look at the index set $S_i\subseteq \{1,\dots,n\}$ of cells where $M_{\downarrow i}$ is $0$, then the Hadamard product of columns corresponds to the union of their index sets.
Formally:
For each column $M_{\downarrow i}$ define $S_i$ via the equivalence $\forall x \in \{1,...,n\}:\quad x\in S_i\Leftrightarrow (M_{\downarrow i})_x=0$.
Then the following holds: $$ (M_{\downarrow i}\circ M_{\downarrow j} )_x=0 \Leftrightarrow x\in (S_i \cup S_j) $$
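This correspondence is easy to verify numerically on two toy columns ($0$-based indices are used here for convenience):

```python
import numpy as np

c_i = np.array([1, 0, 1, 1, 0])
c_j = np.array([1, 1, 0, 1, 0])

# Zero index sets S_i and S_j.
S_i = set(np.flatnonzero(c_i == 0))  # {1, 4}
S_j = set(np.flatnonzero(c_j == 0))  # {2, 4}

# The zero set of the Hadamard product equals the union of the zero sets.
S_prod = set(np.flatnonzero(c_i * c_j == 0))
assert S_prod == S_i | S_j  # {1, 2, 4}
```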
Using this mapping from our column vectors to sets, we can now reduce this problem to a known one (strictly speaking, for hardness I would have to show the reduction in the other direction, but that reduction works similarly):
Let $S_*$ be the set corresponding to $s^*$.
If we assume that $s_\text{OPT}=s^*$, then we only have to look at columns that have a $1$ everywhere $s^*$ has a $1$; equivalently, we discard every set $S_i$ with $S_i\not\subseteq S_*$ and keep only those with $S_i\subseteq S_*$. The task is then to pick $5$ of the remaining sets whose union is exactly $S_*$.
We now have an instance of Maximum $k$-Set Cover (the Maximum Coverage problem), a variant of the Set Cover problem in which we want to find the largest union achievable with at most $k$ sets.
This problem, however, is $W[2]$-hard when parameterized by $k$, so there cannot be an efficient algorithm for general $k$; see this link.
If you want an exact algorithm, your best chance is to look for an FPT algorithm that is exponential only in the distance $\|s_\text{OPT}-s^*\|_1$.
Otherwise you will probably have to live with a heuristic. While it is not exactly your problem, the ideas behind heuristics for the Maximum Coverage problem should be somewhat transferable.
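For instance, the standard greedy heuristic for Maximum Coverage repeatedly picks the set covering the most still-uncovered elements, and with $k$ sets it achieves a $(1-1/e)$ approximation of the optimal coverage. A minimal sketch in the zero-set formulation above (the example sets are made up, and the $S_i$ are assumed to be pre-filtered so that $S_i\subseteq S_*$):

```python
def greedy_max_coverage(sets, target, k=5):
    """Greedily pick up to k sets maximizing coverage of `target`.

    sets:   dict mapping column index -> zero-index set S_i (each S_i subset of target)
    target: the zero set S_* of the goal vector s*
    """
    uncovered = set(target)
    chosen = []
    for _ in range(k):
        # Pick the set covering the most still-uncovered elements.
        best = max(sets, key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            break  # no remaining set covers anything new
        chosen.append(best)
        uncovered -= sets[best]
    return chosen, uncovered

# Tiny illustration (indices and sets are invented for the example):
sets = {0: {1, 2}, 1: {3}, 2: {2, 3, 4}, 3: {5}}
chosen, uncovered = greedy_max_coverage(sets, target={1, 2, 3, 4, 5}, k=3)
```

If `uncovered` comes back empty, the chosen columns reproduce $s^*$ exactly; otherwise its size is the number of positions where the product still has a spurious $1$.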