Best approximation of sum of unit vectors by a smaller subset


Let $v_1,\ldots,v_N$ be linearly independent unit vectors in $\mathbb{R}^N$ and denote their scaled sum by $s_N = \frac{1}{N}\sum_{k=1}^N v_k.$ I would like to find a small subset of size $n$ among those vectors whose scaled sum approximates $s_N$ well. In other words, find

$$ J = \underset{J\in\mathscr{J}}{\operatorname{argmin}} \bigg\lVert s_N - \frac{1}{n}\sum_{k=1}^n v_{J_k}\bigg\rVert$$

where $J$ runs over the set $\mathscr{J}$ of all subsets of $\{1,\ldots,N\}$ of size $n$ and $\lVert \cdot \rVert$ is the Euclidean norm.

The set of vectors can be considered an i.i.d. sample drawn uniformly from the sphere. And, of course, in my case $N$ and $n$ are too large ($N$ on the order of 10'000 or 100'000 and $n$ one or two orders of magnitude smaller) to just try all subsets. So I am looking for something more clever.

My approach so far

I tried

  • Repeated random subsampling, i.e. drawing many, many subsets of size $n$ in an iid fashion, calculating the approximation for each instance and retaining the best.
  • Greedy approach, starting with a single vector and then enlarging the set one vector at a time, each time adding the single vector that gives the best approximation for the enlarged set.
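The greedy heuristic above can be sketched in a few lines of NumPy (the function name and the toy data are illustrative, not from the question):

```python
import numpy as np

def greedy_subset(V, n):
    """Greedily pick n of the N unit vectors (columns of V) so that the
    mean of the chosen vectors approximates s_N, the mean of all columns.
    At each intermediate size m the candidate means are taken over m vectors,
    matching the greedy scheme described above."""
    N = V.shape[1]
    s_N = V.mean(axis=1)                   # target: scaled sum of all vectors
    chosen, remaining = [], set(range(N))
    running_sum = np.zeros(V.shape[0])
    for m in range(1, n + 1):
        # add the vector whose inclusion minimizes the current error
        best = min(remaining,
                   key=lambda j: np.linalg.norm(s_N - (running_sum + V[:, j]) / m))
        chosen.append(best)
        remaining.remove(best)
        running_sum += V[:, best]
    return chosen

# toy usage: N = 50 random unit vectors in R^50, choose n = 5
rng = np.random.default_rng(0)
V = rng.standard_normal((50, 50))
V /= np.linalg.norm(V, axis=0)             # normalize columns to unit length
J = greedy_subset(V, 5)
```

Each step scans all remaining vectors, so the total cost is $O(nN)$ error evaluations; incremental bookkeeping of the running sum keeps each evaluation cheap.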

Questions

  • Is this a known problem with a proper name?
  • Is it hard (as in NP-hard for example) or are clever solutions known?
  • Are there better heuristic approaches?
  • Are there theoretic results/performance guarantees for the two heuristics I used?

Note: I edited the question to include scaling. Some of the answers/comments refer to the older version where vectors were not scaled.


BEST ANSWER

As suggested by @BenGrossmann, you can use integer linear programming to minimize the 1-norm. Explicitly, let binary decision variable $x_j$ indicate whether $j \in J$. The problem is to minimize $\sum_{i=1}^N (z_i^+ + z_i^-)$ subject to the linear constraints \begin{align} (s_N)_i - \frac{1}{n}\sum_{j=1}^N (v_j)_i x_j &= z_i^+ - z_i^- &&\text{for $i \in \{1,\dots,N\}$} \\ \sum_{j=1}^N x_j &= n \\ z_i^+ &\ge 0 &&\text{for $i \in \{1,\dots,N\}$}\\ z_i^- &\ge 0 &&\text{for $i \in \{1,\dots,N\}$} \end{align}

This might provide a good approximation for your 2-norm objective or a good starting solution for an improvement heuristic.


For the 2-norm, the problem is to minimize $\sum_{i=1}^N \left((s_N)_i - \frac{1}{n}\sum_{j=1}^N (v_j)_i x_j\right)^2$ subject to the linear constraint $$ \sum_{j=1}^N x_j = n \tag1 $$

Because $x_i$ is binary, we have $x_i^2 = x_i$. For $i < j$, you can linearize each product $x_i x_j$ as described here.
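For reference, the standard linearization of a product of binaries (presumably what the linked description covers) introduces a variable $y_{i,j} \in [0,1]$ for each pair $i < j$ together with the constraints $$ y_{i,j} \le x_i, \qquad y_{i,j} \le x_j, \qquad y_{i,j} \ge x_i + x_j - 1, $$ which force $y_{i,j} = x_i x_j$ whenever $x_i, x_j \in \{0,1\}$.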

You can also strengthen the formulation by multiplying both sides of the cardinality constraint $(1)$ by $x_i$, yielding: $$\sum_{j=1}^{i-1} x_j x_i + x_i^2 + \sum_{j=i+1}^N x_i x_j = n x_i$$ And then linearize this quadratic constraint by using the products from the objective linearization: $$\sum_{j=1}^{i-1} y_{j,i} + \sum_{j=i+1}^N y_{i,j} = (n - 1) x_i$$


Given a feasible solution, a simple improvement heuristic for either norm is to replace one vector in $J$ with one vector not in $J$ whenever the swap improves the objective value.
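A minimal sketch of this one-exchange local search for the 2-norm (first-improvement variant; the function name and toy data are illustrative):

```python
import numpy as np

def swap_improve(V, J):
    """One-exchange local search: swap a chosen vector for an unchosen one
    whenever the swap lowers the 2-norm error; repeat until no swap helps."""
    s = V.mean(axis=1)                       # target s_N
    err = lambda idx: np.linalg.norm(s - V[:, idx].mean(axis=1))
    J = list(J)
    improved = True
    while improved:
        improved = False
        current = err(J)
        for i in range(len(J)):
            for b in range(V.shape[1]):
                if b in J:
                    continue
                cand = J[:i] + [b] + J[i + 1:]
                if err(cand) < current - 1e-12:  # strict improvement only
                    J, improved = cand, True
                    break
            if improved:
                break
    return J

# toy usage: improve a random starting subset of size 5 among 40 unit vectors
rng = np.random.default_rng(2)
V = rng.standard_normal((40, 40))
V /= np.linalg.norm(V, axis=0)
J0 = list(range(5))
J1 = swap_improve(V, J0)
```

Since every accepted swap strictly decreases the error and there are finitely many subsets, the loop terminates; the result is locally optimal with respect to single exchanges.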

ANSWER

Let $A$ denote the matrix whose columns are $v_1,\dots,v_N$. Then your problem is that of minimizing $\|s_N - Ax\|$ subject to the constraint that $x$ has $0,1$ entries and $\|x\| \leq \sqrt{n}$.

Removing the constraint that $x$ has $0,1$ entries leaves us with a much easier problem to deal with. I suspect that its solution will yield a useful heuristic.

If $A = U \Sigma V^T$ is an SVD and we make the substitutions $b = U^Ts_N$ and $y = V^Tx$, we are left with the simplified problem $$ \min \|\Sigma y - b\| \quad \text{s.t. } \quad \|y\| \leq \sqrt{n}. $$ This can be solved with Lagrange multipliers. The squared objective and constraint functions have the forms $$ f(y) = \|\Sigma y - b\|^2 \implies \nabla f = 2 [\Sigma^2 y - \Sigma b] \\g(y) = \|y\|^2 \implies \nabla g = 2y $$ If the unconstrained minimizer $y = \Sigma^{-1}b$ satisfies $\|y\| \leq \sqrt{n}$, we are done. Otherwise the constraint is active, and $$ \nabla f = \lambda \nabla g \implies \Sigma^2 y - \Sigma b = \lambda y \implies (\Sigma^2 - \lambda I)y = \Sigma b \implies y = (\Sigma^2 - \lambda I)^{-1}\Sigma b, $$ where $\lambda \leq 0$ at a minimizer, so $\Sigma^2 - \lambda I$ is invertible. Plugging into the constraint yields the secular equation $$ \|(\Sigma^2 - \lambda I)^{-1}\Sigma b\|^2 = n, $$ whose left-hand side is increasing in $\lambda$ on $(-\infty, 0]$, so $\lambda$ can be found by a one-dimensional search (e.g. bisection).

I'm not sure if this can be written in terms that remove the SVD. For what it's worth, though, we have $$ (\Sigma^2 - \lambda I)^{-1}\Sigma = V^T[(A^TA - \lambda I)^{-1}\sqrt{A^TA}]V. $$
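One way to carry out the norm-constrained least-squares step numerically is a bisection on the multiplier. A self-contained sketch with synthetic data (the sizes, singular values, and right-hand side below are illustrative, not from the original problem):

```python
import numpy as np

# Solve  min ||Sigma y - b||  s.t.  ||y|| <= sqrt(n)  for a diagonal Sigma,
# via the multiplier (secular) equation.
rng = np.random.default_rng(0)
N, n = 30, 4
sig = rng.uniform(0.5, 2.0, size=N)      # singular values (diagonal of Sigma)
b = 10 * rng.standard_normal(N)          # right-hand side, large enough that
                                         # the norm constraint is active
y_free = b / sig                         # unconstrained minimizer Sigma^{-1} b
if np.linalg.norm(y_free) <= np.sqrt(n):
    y = y_free                           # constraint inactive: done
else:
    # secular equation ||(Sigma^2 - lam I)^{-1} Sigma b||^2 = n with lam <= 0;
    # the left-hand side is increasing in lam, so bisection applies
    phi = lambda lam: np.sum((sig * b / (sig ** 2 - lam)) ** 2) - n
    lo, hi = -1.0, 0.0
    while phi(lo) > 0:                   # push lo left until it brackets the root
        lo *= 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi(mid) <= 0 else (lo, mid)
    lam = 0.5 * (lo + hi)
    y = sig * b / (sig ** 2 - lam)       # constrained minimizer
```

The resulting `y` satisfies the stationarity condition $\Sigma^2 y - \Sigma b = \lambda y$ and lies on the radius-$\sqrt{n}$ sphere whenever the constraint is active.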