Conditions for two optimization problems to yield the same solution


Problem: Consider the optimization problems $$\min_\beta \|y-X\beta\|^2+\alpha\|\beta\|^2 \tag 1$$ and $$\min_\beta \|\beta\|^2 \text{ subject to } \|y-X\beta\|^2 \le c \tag 2$$ where $\|x\|$ is the $2$-norm. Fix $\alpha$, and suppose $\beta^*$ is the solution to ($1$), and let $c=\|y-X\beta^*\|^2$. Is it true that the solution to ($2$) is also $\beta^*$?

Attempt: I believe this is true. The argument should be very similar to the one in Why are additional constraint and penalty term equivalent in ridge regression?. However, when I ran some numerical experiments, the two problems turned out to have different solutions. Hence my question: do the two problems really yield the same solution? Are there exceptions I should be careful of?


2 Answers

Best Answer

It is true for $\alpha>0$. Since $\beta^*$ is a solution of (1), for every $\beta$ we have: $$\|y-X\beta^*\|^2 + \alpha\|\beta^*\|^2 \le \|y-X\beta\|^2 + \alpha\|\beta\|^2.$$ Reordering: $$\|\beta^*\|^2 \le \frac{1}{\alpha}\left(\|y-X\beta\|^2 - \|y-X\beta^*\|^2\right) + \|\beta\|^2.$$ Now, every $\beta$ feasible for (2) satisfies $\|y-X\beta\|^2 \le c = \|y-X\beta^*\|^2$, so the first term on the right is nonpositive and we conclude that $$\|\beta^*\|^2 \le \|\beta\|^2,$$ which means $\beta^*$ is a minimizer of (2).
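As a sanity check of this argument, here is a minimal numerical sketch (hypothetical random data; `numpy` and `scipy` assumed available). It solves (1) in closed form, sets $c=\|y-X\beta^*\|^2$, and then solves (2) with a constrained optimizer:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data; any X, y and alpha > 0 should work.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
alpha = 1.0

# Closed-form solution of (1): beta* = (alpha I + X'X)^{-1} X'y
beta_star = np.linalg.solve(alpha * np.eye(3) + X.T @ X, X.T @ y)
c = np.sum((y - X @ beta_star) ** 2)

# Solve (2) with c = ||y - X beta*||^2. Start from the least-squares
# solution: its residual is the smallest possible, so it is feasible.
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
res = minimize(
    fun=lambda b: b @ b,                 # objective of (2): ||beta||^2
    x0=beta_ls,
    jac=lambda b: 2 * b,
    constraints=[{
        "type": "ineq",                  # c - ||y - X b||^2 >= 0
        "fun": lambda b: c - np.sum((y - X @ b) ** 2),
        "jac": lambda b: 2 * X.T @ (y - X @ b),
    }],
    method="SLSQP",
)
print(np.max(np.abs(res.x - beta_star)))  # should be close to 0
```

Starting from a feasible point matters in practice: a constrained solver started far outside the feasible region can stall, which may explain divergent results in a casual numerical experiment.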

Analogously, for $\alpha<0$ you can check that a solution $\beta^*$ of (1) also solves (2), BUT maximizing instead of minimizing: $$\alpha(\|\beta^*\|^2 - \|\beta\|^2) \le \|y-X\beta\|^2 - \|y-X\beta^*\|^2 \le 0 \quad\Longrightarrow\quad \|\beta^*\|^2 \ge \|\beta\|^2.$$ In any case, note that you cannot guarantee equivalence here, since problem (1) becomes non-convex for $\alpha<0$.

Other Answer

Expanding the objective in (1),

$$ f(\beta) = y'\cdot y-2\beta'\cdot X'y+\beta'\cdot X'\cdot X\cdot\beta+\alpha \beta'\cdot\beta $$

so the minimum condition gives

$$ -X'\cdot y+X'\cdot X\beta+\alpha\beta = 0 $$

and then

$$ \beta^* = (I\alpha +X'\cdot X)^{-1}X'\cdot y $$
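A quick numerical check (with made-up data) that this closed form indeed satisfies the first-order condition above:

```python
import numpy as np

# Hypothetical data for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 4))
y = rng.normal(size=10)
alpha = 0.5

# beta* = (alpha I + X'X)^{-1} X'y
beta_star = np.linalg.solve(alpha * np.eye(4) + X.T @ X, X.T @ y)

# The gradient -X'y + X'X beta + alpha beta should vanish at beta*.
grad = -X.T @ y + X.T @ X @ beta_star + alpha * beta_star
print(np.max(np.abs(grad)))  # should be close to 0
```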

and from (2), introducing a multiplier $\lambda$ and a slack variable $\epsilon$ to turn the inequality constraint into an equality, the Lagrangian is

$$ L(\beta,\lambda,\epsilon)=\beta'\cdot\beta + \lambda(y'\cdot y-2\beta'\cdot X'y+\beta'\cdot X'\cdot X\cdot\beta-c+\epsilon^2) $$

so setting the gradient with respect to $\beta$ to zero gives

$$ L_{\beta} = 2\beta-2\lambda X'\cdot y + 2\lambda X'\cdot X\cdot\beta = 0 $$

then

$$ (I+\lambda X'\cdot X)\cdot\beta^* = \lambda X'\cdot y $$

or

$$ \beta^* = (I\frac{1}{\lambda}+X'\cdot X)^{-1}\cdot X'\cdot y $$
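Comparing this with the closed form obtained from (1), the two expressions coincide exactly when $\lambda = 1/\alpha$; a small sketch with made-up data:

```python
import numpy as np

# Hypothetical data for illustration.
rng = np.random.default_rng(2)
X = rng.normal(size=(15, 3))
y = rng.normal(size=15)
alpha = 2.0
lam = 1.0 / alpha  # the identification lambda = 1/alpha

beta_1 = np.linalg.solve(alpha * np.eye(3) + X.T @ X, X.T @ y)   # from (1)
beta_2 = np.linalg.solve(np.eye(3) / lam + X.T @ X, X.T @ y)     # from (2)
print(np.max(np.abs(beta_1 - beta_2)))  # should be close to 0
```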

but, multiplying the stationarity condition by $\beta'^*$,

$$ \beta'^*\cdot\beta^*-\lambda\beta'^*\cdot X'\cdot y+\lambda \beta'^*\cdot X'\cdot X\cdot\beta^* = 0 $$

and, combining this with the active constraint $\|y-X\beta^*\|^2 = c-\epsilon^2$,

$$ \lambda = \frac{\beta'^*\cdot\beta^*}{y'\cdot y-\beta'^*\cdot X'\cdot y-c+\epsilon^2} $$

so the equivalence between (1) and (2) needs

$$ \lambda = \frac{1}{\alpha} = \frac{\beta'^*\cdot\beta^*}{y'\cdot y-\beta'^*\cdot X'\cdot y-c+\epsilon^2} $$

or

$$ \alpha = \frac{(y-X\cdot \beta^*)'\cdot y-c+\epsilon^2}{\beta'^*\cdot\beta^*} $$

which is quite unlikely to hold for an arbitrary choice of $c$. (For the particular choice $c=\|y-X\beta^*\|^2$ with $\epsilon=0$, however, the normal equation $X'\cdot(y-X\beta^*)=\alpha\beta^*$ makes the condition hold exactly.)