I have a sparse $60000\times10000$ matrix where each element is either a $1$ or $0$ as follows.
$$M=\begin{bmatrix}1 & 0 & 1 & \cdots & 1 \\1 & 1 & 0 & \cdots & 1 \\0 & 1 & 0 & \cdots & 1 \\\vdots & \vdots & \vdots & \ddots & \vdots \\0 & 0 & 0 & \cdots & 0 \end{bmatrix}$$
Each row represents a different point in time such that the first row is $t=1$ and the last row is $t=60000$. Each column in the matrix is a different combination of signals (ie. $1$s and $0$s). I want to choose five column vectors from $M$, $c_1, c_2, c_3, c_4, c_5$ and take the Hadamard (ie. element-wise) product:
$$s = c_1 \circ c_2 \circ c_3 \circ c_4 \circ c_5$$
The result is a new vector $s$ in which a row contains a $1$ only if every one of $c_1, c_2, c_3, c_4, c_5$ contains a $1$ in that row. I need an algorithm that finds the vector $s$ that is most similar to the optimal strategy vector $s^{*}$. The vector $s^{*}$ is fixed in advance and has the same dimensions as $s$.
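As a concrete illustration, here is a minimal NumPy sketch of the Hadamard product step. The dimensions are shrunk from $60000\times10000$ so it runs instantly, and the five column indices are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for M: rows are time steps, columns are signal combinations.
# Real size would be 60000 x 10000; shrunk here for illustration.
n_rows, n_cols = 1000, 50
M = (rng.random((n_rows, n_cols)) < 0.1).astype(np.int8)

# Pick five column indices (arbitrary here) and take the Hadamard product.
idx = [0, 1, 2, 3, 4]
s = M[:, idx].prod(axis=1)  # 1 only where all five chosen columns have a 1
```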
The naive solution is to test every possible combination; however, choosing $5$ columns from $10000$ (with repetition allowed, since the five chosen vectors need not be distinct) gives on the order of $10^{20}$ combinations. I therefore want an algorithm that returns the five vectors $c_{1}$ through $c_{5}$ whose product $s$ is closest to the optimal strategy $s^{*}$.
I am not too sure how to approach this problem, but I believe it is a discrete optimisation problem in which I want to maximise the similarity $\operatorname{sim}(s,s^{*})$. I define the similarity using the Jaccard index below, as it rewards $1$s in the correct places but does not reward matching $0$s. This suits my use case because I would rather have a false negative than a false positive. The metric is bounded between $0$ and $1$, where $1$ denotes maximum similarity. It is commonly used for binary vectors, so it seems ideal here, as each element is either a $1$ or a $0$.
$$\operatorname{sim}(s,s^{*})=\frac{s \cdot s^{*}}{\sum_{i=1}^{60000}\left (s_{i} + s_{i}^{*} \right ) - s \cdot s^{*}}$$
Note that the operation ($\cdot$) refers to the dot product and not the Hadamard product.
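For reference, the similarity can be computed directly from the formula above. The function below is a small NumPy sketch; the convention of returning $1$ when both vectors are all-zero (so the denominator vanishes) is my own assumption:

```python
import numpy as np

def jaccard(s: np.ndarray, s_star: np.ndarray) -> float:
    """Jaccard index of two binary vectors; 1.0 means identical supports."""
    inter = int(s @ s_star)                 # dot product = size of intersection
    union = int(s.sum() + s_star.sum()) - inter
    return inter / union if union else 1.0  # both all-zero: define as 1

s      = np.array([1, 0, 1, 1, 0])
s_star = np.array([1, 0, 0, 1, 0])
print(jaccard(s, s_star))  # 2/3
```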
I posted a similar question before but it was removed for lack of context. I have added more context now but let me know if anything is unclear. I am asking this question to aid me in a personal project I am working on.
FYI, the algorithm will be run on a computer. Any help is much appreciated -- thanks!
Let $m:=10000$ (columns), $n:=60000$ (rows).
We'll show that just deciding whether or not there exists an $s_\text{OPT}$ with $s_\text{OPT}=s^*$ is NP-hard.
Let $s_\text{OPT} :=c_1 \circ c_2 \circ c_3 \circ c_4 \circ c_5$ denote the optimal solution, and $c_1,...,c_5$ a combination that produces this optimal solution.
If for any $i$ we have $(c_i)_x=0$, then $(s_\text{OPT})_x=0$. Therefore, if $s_\text{OPT}=s^*$, then whenever $s^*_x=1$, it must hold that $(c_i)_x=1$ for all $i$.
Let $M_{\downarrow i}$ denote the $i$-th column of $M$.
So if, for each column $M_{\downarrow i}$, we look at the index set $S_i\subseteq \{1,\dots,n\}$ of cells where $M_{\downarrow i}$ is $0$, then the Hadamard product of columns corresponds to the union of their index sets.
Formally:
For each column $M_{\downarrow i}$ define $S_i$ via the equivalence $\forall x \in \{1,...,n\}:\quad x\in S_i\Leftrightarrow (M_{\downarrow i})_x=0$.
Then the following holds: $$ (M_{\downarrow i}\circ M_{\downarrow j} )_x=0 \Leftrightarrow x\in (S_i \cup S_j) $$
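This correspondence is easy to verify numerically on two toy columns ($0$-based indices are used here for convenience):

```python
import numpy as np

c_i = np.array([1, 0, 1, 1, 0])
c_j = np.array([1, 1, 0, 1, 0])

# Zero index sets S_i and S_j.
S_i = set(np.flatnonzero(c_i == 0))  # {1, 4}
S_j = set(np.flatnonzero(c_j == 0))  # {2, 4}

# The zero set of the Hadamard product equals the union of the zero sets.
S_prod = set(np.flatnonzero(c_i * c_j == 0))
assert S_prod == S_i | S_j  # {1, 2, 4}
```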
Using this mapping from our column vectors to sets, we can now reduce this problem to a known one (strictly speaking, for hardness I would have to show the reduction in the other direction, but that reduction works similarly):
Let $S_*$ be the set corresponding to $s^*$.
If we assume that $s_\text{OPT}=s^*$, then we only have to look at columns that have a $1$ everywhere $s^*$ has a $1$; equivalently, we discard every set $S_i$ with $S_i\not\subseteq S_*$ and keep only those with $S_i\subseteq S_*$. The task is then to pick $5$ of the remaining sets whose union is exactly $S_*$.
We now have an instance of Maximum $k$-Set Cover (the Maximum Coverage problem), a variant of the Set Cover problem in which we want to find the largest union achievable with at most $k$ sets.
This problem, however, is $W[2]$-hard when parameterized by $k$, so there cannot be an efficient algorithm for general $k$; see this link.
If you want an exact algorithm, your best chance is to look for an FPT algorithm that is exponential only in the distance $\|s_\text{OPT}-s^*\|_1$.
Otherwise you will probably have to live with a heuristic. While it is not exactly your problem, the ideas behind heuristics for the Maximum Coverage problem should be somewhat transferable.
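For instance, the standard greedy heuristic for Maximum Coverage repeatedly picks the set covering the most still-uncovered elements, and with $k$ sets it achieves a $(1-1/e)$ approximation of the optimal coverage. A minimal sketch in the zero-set formulation above (the example sets are made up, and the $S_i$ are assumed to be pre-filtered so that $S_i\subseteq S_*$):

```python
def greedy_max_coverage(sets, target, k=5):
    """Greedily pick up to k sets maximizing coverage of `target`.

    sets:   dict mapping column index -> zero-index set S_i (each S_i subset of target)
    target: the zero set S_* of the goal vector s*
    """
    uncovered = set(target)
    chosen = []
    for _ in range(k):
        # Pick the set covering the most still-uncovered elements.
        best = max(sets, key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            break  # no remaining set covers anything new
        chosen.append(best)
        uncovered -= sets[best]
    return chosen, uncovered

# Tiny illustration (indices and sets are invented for the example):
sets = {0: {1, 2}, 1: {3}, 2: {2, 3, 4}, 3: {5}}
chosen, uncovered = greedy_max_coverage(sets, target={1, 2, 3, 4, 5}, k=3)
```

If `uncovered` comes back empty, the chosen columns reproduce $s^*$ exactly; otherwise its size is the number of positions where the product still has a spurious $1$.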