Best subset selection


Consider an $n \times d$ data matrix $D$ in which you want to find the best subset of $k$ features that are related to the $n$-dimensional regressand vector $y$. To this end, the following mixed integer program is formulated with a $d$-dimensional real vector $w$, a $d$-dimensional binary vector $z$, and an a priori (constant) upper bound $M$ on the magnitude of each coefficient in $w$. The optimization problem is to minimize $\|Dw - y\|^2$ subject to the constraints
$$z \in \{0, 1\}^d, \quad -Mz \le w \le Mz, \quad 1^T z = k,$$
where $1$ denotes the $d$-dimensional vector of ones. (Note that the one-sided constraint $w \le Mz$ alone would not force $w_j = 0$ when $z_j = 0$, since $w_j$ could be negative; the two-sided big-$M$ constraint does.) Propose an algorithm using block coordinate descent for this problem, where each optimized block contains just two integer variables and all real variables.
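The constraints simply say that $w$ is supported on some $k$-subset of the columns of $D$, so for small $d$ the problem can be solved by exhaustive enumeration. A minimal sketch in Python with NumPy (the function name `best_subset_exhaustive` is hypothetical, not part of the question) makes the objective concrete:

```python
import itertools
import numpy as np

def best_subset_exhaustive(D, y, k):
    """Brute-force baseline: enumerate every k-subset of the d columns,
    solve the restricted least-squares problem, and keep the subset with
    the smallest squared residual ||D w - y||^2."""
    n, d = D.shape
    best_obj, best_support, best_w = np.inf, None, None
    for support in itertools.combinations(range(d), k):
        cols = list(support)
        # Ordinary least squares restricted to the chosen columns.
        w, *_ = np.linalg.lstsq(D[:, cols], y, rcond=None)
        obj = float(np.sum((D[:, cols] @ w - y) ** 2))
        if obj < best_obj:
            best_obj, best_support, best_w = obj, cols, w
    return best_support, best_w, best_obj
```

This enumeration costs $\binom{d}{k}$ least-squares solves and is only a reference point; the point of the exercise is to avoid it with local moves.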


I have never dealt with block coordinate descent. Should I treat the entries with $z=0$ and $z=1$ as separate blocks and optimize the two resulting problems separately using a Lagrangian? If not, how should one identify the blocks, and what should be done with them?

I did manage to find that block coordinate descent is ordinary coordinate descent applied to blocks of variables rather than single variables. However, I still don't see how to identify those blocks.

Best answer:

Given a feasible solution with $z=\hat{z}$, define a block by choosing a pair $(i,j)$ with $\hat{z}_i=1$ and $\hat{z}_j=0$; the block consists of the two binaries $z_i, z_j$ together with all real variables $w$. Now fix $z_i=0$, $z_j=1$, and $z_\ell=\hat{z}_\ell$ for $\ell\not\in\{i,j\}$, and solve the ordinary least-squares regression problem with the regressors prescribed by $z$. If the new objective value is better, keep the new $z$ vector; otherwise revert to $z_i=1$ and $z_j=0$. Repeat until no pair yields an improvement.
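The swap procedure above can be sketched in Python with NumPy (a minimal sketch; the helper names `ls_objective` and `subset_bcd` are my own, and with the pair $(z_i, z_j)$ fixed the block subproblem over the real variables is exactly restricted OLS, hence `numpy.linalg.lstsq`):

```python
import itertools
import numpy as np

def ls_objective(D, y, support):
    """Solve OLS restricted to the columns in `support`; return the
    squared residual ||D w - y||^2 and the restricted coefficients."""
    w, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
    return float(np.sum((D[:, support] @ w - y) ** 2)), w

def subset_bcd(D, y, k, init_support=None):
    """Swap-based block coordinate descent: each block is a pair
    (z_i, z_j) with z_i = 1, z_j = 0, plus all real variables w.
    A swap is kept whenever the restricted OLS objective improves."""
    n, d = D.shape
    support = list(init_support) if init_support is not None else list(range(k))
    best_obj, _ = ls_objective(D, y, support)
    improved = True
    while improved:
        improved = False
        outside = [j for j in range(d) if j not in support]
        for i, j in itertools.product(list(support), outside):
            # Try moving feature i out of the support and feature j in.
            trial = [c for c in support if c != i] + [j]
            obj, _ = ls_objective(D, y, trial)
            if obj < best_obj - 1e-12:
                support, best_obj = trial, obj
                improved = True
                break  # support changed; rebuild the swap candidates
    support = sorted(support)
    best_obj, best_w = ls_objective(D, y, support)
    return support, best_w, best_obj
```

Each accepted swap strictly decreases the objective and there are finitely many supports, so the loop terminates; like any coordinate descent on a nonconvex problem, it may stop at a local optimum rather than the global one.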