Optimal Selection of a Subset of Regression Variables


I have encountered a problem at work that looks interesting; I am sure there must be publications on this, but I cannot find any.

Formulation

In a large regression with hundreds of variables, I need to choose a subset of 3-5 variables (usually 3) that "best" captures the RHS, where the sense of "best" is perhaps open to interpretation, but so far I have interpreted it as producing the closest approximation to the RHS, i.e., minimizing the 2-norm of the difference.
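In other words (a minimal sketch, with `X` standing for the matrix of candidate columns and `y` for the RHS vector, names mine), the exact problem is to minimize the least-squares residual over all $\binom{n}{k}$ column subsets:

```python
import numpy as np
from itertools import combinations

def subset_rss(X, y, cols):
    """Residual sum of squares of the least-squares fit of y on the given columns."""
    cols = list(cols)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    r = y - X[:, cols] @ beta
    return float(r @ r)

def best_subset_exhaustive(X, y, k):
    """Exact best size-k subset by enumerating all C(n, k) candidates."""
    return min(combinations(range(X.shape[1]), k),
               key=lambda c: subset_rss(X, y, c))
```

Enumeration is exact but clearly infeasible for hundreds of variables, which is precisely the difficulty of the question.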


Special Case - Orthonormal Vectors

It is clear that when the vectors onto which we project the RHS are orthonormal, it is enough to run one regression and keep the variables with the largest coefficients in absolute value. (In other words, if we are projecting $(3,5,7)^T$ onto the standard basis of $\mathbb{R}^3$, the top 2 vectors would be $(0,0,1)^T$ and $(0,1,0)^T$, in that order.)
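This can be checked numerically (a small sketch; the matrix sizes and names are mine): draw random orthonormal columns via a QR factorization and verify that keeping the largest absolute coefficients matches exhaustive search.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((10, 6)))  # 6 orthonormal columns in R^10
y = rng.standard_normal(10)
k = 3

coef = Q.T @ y  # regression coefficients (for orthonormal columns, just inner products)
greedy = set(np.argsort(-np.abs(coef))[:k])

def rss(cols):
    cols = list(cols)
    beta, *_ = np.linalg.lstsq(Q[:, cols], y, rcond=None)
    r = y - Q[:, cols] @ beta
    return float(r @ r)

exact = set(min(combinations(range(6), k), key=rss))
print(greedy == exact)  # the two selections coincide
```

For orthonormal columns this agreement is guaranteed, since the residual sum of squares of a subset $S$ is $\|y\|^2 - \sum_{i \in S} c_i^2$, so minimizing it is exactly keeping the largest $|c_i|$.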

However, this intuitive greedy algorithm need not produce the optimal solution when the vectors onto which we are projecting are not orthogonal (which in practice is almost always the case, unfortunately). For example, take the basis $(0,1)^T$ and $\left(1/\sqrt{2},1/\sqrt{2}\right)^T$ in $\mathbb{R}^2$ (normalized but not orthogonal). Any vector $(x,x+a)^T$ in the first quadrant (with $x,a \in \mathbb{R}^+$) has the unique representation $$ (x,x+a)^T = x\sqrt{2}\,\left(1/\sqrt{2},1/\sqrt{2}\right)^T + a\,(0,1)^T, $$ which would be our regression output, so the coefficient-based rule would choose $(0,1)^T$ iff $a \ge x\sqrt{2}$.

Meanwhile, the projection of $(x,x+a)^T$ onto $(0,1)^T$ is $(0, x+a)^T$, which is $x$ away, and the projection of the same vector onto $\left(1/\sqrt{2},1/\sqrt{2}\right)^T$ is $(x+a/2, x+a/2)^T$, which is $\sqrt{(a/2)^2 + (a/2)^2} = a/\sqrt{2}$ away, so the optimal single choice is $(0,1)^T$ iff $x < a/\sqrt{2}$. In this two-vector example the two criteria happen to agree ($a \ge x\sqrt{2} \iff x \le a/\sqrt{2}$), but with three or more correlated vectors they genuinely diverge. Take $e_1$, $e_2$, and $v = (1,1,\varepsilon)^T/\sqrt{2+\varepsilon^2}$ in $\mathbb{R}^3$ with target $\vec{b} = (1,1,0)^T$. The full regression is $\vec{b} = 1 \cdot e_1 + 1 \cdot e_2 + 0 \cdot v$ (the zero third coordinate of $\vec{b}$ forces the coefficient of $v$ to vanish), so the coefficient rule ranks $v$ last; yet for small $\varepsilon$ the projection onto $v$ alone leaves a residual of order $\varepsilon$, far better than the residual of $1$ left by $e_1$ or $e_2$ alone. So the intuitive greedy strategy does not provide an optimal solution in general...
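To check numerically that coefficient magnitude can be a bad guide with correlated columns, here is a small constructed example (the columns and target are mine): three unit columns $e_1$, $e_2$, $v$ in $\mathbb{R}^3$, where the full-regression coefficient of $v$ is exactly zero even though $v$ alone leaves by far the smallest residual.

```python
import numpy as np

# Constructed example: three unit columns in R^3; eps controls the correlation
eps = 1e-3
v = np.array([1.0, 1.0, eps])
v /= np.linalg.norm(v)
X = np.column_stack([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], v])
y = np.array([1.0, 1.0, 0.0])

beta = np.linalg.solve(X, y)  # full-regression coefficients: (1, 1, 0)

# residual sum of squares of regressing y on each single (unit) column
single_rss = [float(np.sum((y - X[:, j] * (X[:, j] @ y)) ** 2)) for j in range(3)]
print(beta, single_rss)
```

Here `single_rss` is about $(1, 1, \varepsilon^2)$: the third column wins decisively despite having a zero coefficient in the full regression.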


My Thoughts on How to Approach This Problem

I thought one possible approach is to formulate this as an integer program, letting $z_k \in \{0,1\}$ indicate whether the corresponding column vector is included in the solution. However, it is not entirely clear how to keep this formulation linear, since

  • we would likely have to change the objective from the quadratic 2-norm to the linear 1-norm (which is undesirable); and
  • formulating the objective is a problem: with the $z_k$'s we have to minimize $\left\|\sum_{k=1}^n \vec{M}_k x_k z_k - \vec{b}\right\|_2^2$ subject to $\sum_{k=1}^n z_k = 3$ (or $=5$, depending on the particular problem in question), and the term $x_k z_k$ is not linear.
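One standard workaround (not from the original post, but well known in integer programming) is to keep the quadratic objective in the $x_k$'s alone and couple $x_k$ to $z_k$ with big-$M$ constraints, which removes the bilinear term $x_k z_k$:

$$ \min_{x,\,z} \left\|\sum_{k=1}^n \vec{M}_k x_k - \vec{b}\right\|_2^2 \quad \text{s.t.} \quad -M z_k \le x_k \le M z_k, \quad \sum_{k=1}^n z_k = 3, \quad z_k \in \{0,1\}, $$

for a sufficiently large constant $M$: whenever $z_k = 0$, the constraints force $x_k = 0$. The result is a mixed-integer *quadratic* program rather than a linear one, but it is in a form that off-the-shelf MIQP solvers accept directly.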

More importantly, I am really hoping there is a way to solve this that is not NP-hard. Thank you so much for your help.

Best Answer

You can solve the problem exactly via a branch-and-bound algorithm known as "leaps and bounds":

Furnival, George M., and Robert W. Wilson. "Regressions by Leaps and Bounds." Technometrics 16.4 (1974): 499-511.
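The key fact the bound exploits is that adding regressors never increases the residual sum of squares, so the RSS of a node's entire candidate pool lower-bounds every size-$k$ subset beneath it. A toy sketch of that pruning idea (my own simplified illustration, not Furnival and Wilson's actual algorithm, which is considerably more refined):

```python
import numpy as np
from itertools import combinations

def rss(X, y, cols):
    """RSS of the least-squares fit of y on the given columns (y'y if none)."""
    cols = list(cols)
    if not cols:
        return float(y @ y)
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    r = y - X[:, cols] @ beta
    return float(r @ r)

def best_subset_bb(X, y, k):
    """Exact best size-k subset via branch and bound.

    Pruning rule: adding columns never increases RSS, so the RSS of a node's
    entire candidate pool lower-bounds every size-k subset below that node.
    """
    n = X.shape[1]
    best_rss, best_cols = np.inf, None

    def recurse(chosen, start):
        nonlocal best_rss, best_cols
        if len(chosen) == k:
            v = rss(X, y, chosen)
            if v < best_rss:
                best_rss, best_cols = v, tuple(chosen)
            return
        pool = chosen + list(range(start, n))
        if len(pool) < k or rss(X, y, pool) >= best_rss:
            return  # infeasible, or provably no better than the incumbent
        for i in range(start, n):
            recurse(chosen + [i], i + 1)

    recurse([], 0)
    return best_cols, best_rss
```

The original algorithm is implemented, for example, in R's `leaps` package. Branch and bound does not escape the worst-case exponential complexity of best-subset selection, but in practice it prunes aggressively enough to handle problems far beyond brute-force enumeration.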