Clarification:$\min_{\mathbf{x}}\left\|\mathbf{y}-\mathbf{x}\right\|_2^2$ s.t. $\mathbf{A}\mathbf{x} \preceq \mathbf{b}$


Dear optimization experts,

I apologize if I am repeating an existing question. I have found some related questions, e.g., link1, but I am not sure whether this exact question has been asked before; apologies in advance if I missed it.

Problem definition:

I have two optimization (constrained least-squares) problems, namely, (P1) with inequality constrained LS and (P2) with equality constrained LS.

P$1$ reads \begin{align} \text{minimize}_{\mathbf{x} \in \mathbb{R}^n} \quad & \frac{1}{2}\left\|\mathbf{y}-\mathbf{x}\right\|_2^2\\ \text{subject to }\quad & \mathbf{A}\mathbf{x} \preceq \mathbf{b} \Leftrightarrow \mathbf{a}_m^T \mathbf{x} \leq b_m, \ m = 1, \ldots, M, \\ \end{align} where $\preceq$ denotes element-wise inequality, $\mathbf{A} = \begin{bmatrix} \mathbf{a}_1^T \\ \vdots \\ \mathbf{a}_M^T \end{bmatrix} \in \mathbb{R}^{M \times n}$ with $M < n$ (assumed full row rank, so that $\mathbf{A}\mathbf{A}^T$ is invertible), and $\mathbf{b} = [b_1, \cdots, b_M]^T\in \mathbb{R}^{M}$,

and

P$2$ reads \begin{align} \text{minimize}_{\mathbf{x} \in \mathbb{R}^n} \quad & \frac{1}{2}\left\|\mathbf{y}-\mathbf{x}\right\|_2^2\\ \text{subject to }\quad & \mathbf{A}\mathbf{x} = \mathbf{b} .\\ \end{align}

One could form the Lagrangian for P$2$, which is the same as for P$1$ except that in P$1$ the Lagrange multipliers must be nonnegative (by the KKT conditions).

Forming the Lagrangian for P$1$/P$2$: \begin{align*} L\left(\boldsymbol{\lambda},\mathbf{x}\right) &= \frac{1}{2}\left\|\mathbf{y}-\mathbf{x}\right\|_2^2 + \boldsymbol{\lambda}^T \left( \mathbf{A}\mathbf{x} - \mathbf{b} \right) \\ &= \underbrace{\frac{1}{2}\left\|\mathbf{y}-\mathbf{x}\right\|_2^2}_{:= g(\mathbf{x})} + \sum_{m=1}^M {\lambda_m} \ \underbrace{\left( \mathbf{a}_m^T\mathbf{x} - {b_m} \right)}_{:= h_m(\mathbf{x}) } . \end{align*}

The KKT conditions for P$1$ are: \begin{align} &0 \in \frac{\partial}{\partial \mathbf{x}} g\left(\mathbf{x}\right) + \sum \limits_{m=1}^{M} \lambda_m \ \frac{\partial}{\partial \mathbf{x}} h_m\left(\mathbf{x}\right) \hspace{2mm} \textrm{(stationarity)} \\ &\lambda_m h_m\left(\mathbf{x}\right) = 0 \hspace{2mm} \textrm{(complementary slackness)}\\ &h_m\left(\mathbf{x}\right) \leq 0 \hspace{2mm} \textrm{(primal feasibility)}\\ &\lambda_m \geq 0 \hspace{2mm} \textrm{(dual feasibility)} \end{align}
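The four conditions above can be checked numerically for any candidate pair $(\mathbf{x}, \boldsymbol{\lambda})$. A minimal NumPy sketch (the helper `check_kkt` and the test instance are illustrative assumptions, not from the question):

```python
import numpy as np

def check_kkt(x, lam, A, b, y, tol=1e-8):
    """Evaluate the four KKT conditions of P1 at a candidate pair (x, lam)."""
    h = A @ x - b                      # h_m(x) = a_m^T x - b_m
    return {
        "stationarity": np.allclose((x - y) + A.T @ lam, 0.0, atol=tol),
        "complementary_slackness": np.allclose(lam * h, 0.0, atol=tol),
        "primal_feasibility": bool(np.all(h <= tol)),
        "dual_feasibility": bool(np.all(lam >= -tol)),
    }

# Here y already satisfies A y <= b, so x = y with lam = 0 is optimal.
A = np.array([[1.0, 0.0]])
b = np.array([5.0])
y = np.array([1.0, 2.0])
print(check_kkt(y, np.zeros(1), A, b, y))  # all four conditions hold
```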

Solution for P$2$:

Setting the gradients of $L\left(\boldsymbol{\lambda},\mathbf{x}\right)$ with respect to $\mathbf{x}$ and $\boldsymbol{\lambda}$ to zero and solving the resulting equations for $\mathbf{x}$ yields \begin{align} \mathbf{x} = \mathbf{y} - \mathbf{A}^T\left( \mathbf{A} \mathbf{A}^T \right)^{-1}\left(\mathbf{A}\mathbf{y} - \mathbf{b} \right) , \end{align} or, in general form, \begin{align} \mathbf{x} = \mathbf{y} - \mathbf{A}^{\dagger}\left(\mathbf{A}\mathbf{y} - \mathbf{b} \right) , \end{align} where $\mathbf{A}^{\dagger}$ denotes the Moore-Penrose pseudoinverse of $\mathbf{A}$ (for full row rank, $\mathbf{A}^{\dagger} = \mathbf{A}^T\left(\mathbf{A}\mathbf{A}^T\right)^{-1}$). Thus, we have a well-known closed-form analytical solution for P$2$. So far, so good.
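The closed-form solution of P$2$ is easy to verify numerically. A small NumPy sketch (the random instance is an assumption for illustration; `np.linalg.pinv` computes $\mathbf{A}^{\dagger}$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small instance with M < n; a random Gaussian A has full row rank
# with probability 1.
M, n = 3, 6
A = rng.standard_normal((M, n))
b = rng.standard_normal(M)
y = rng.standard_normal(n)

# Closed-form solution of P2: x = y - A^+ (A y - b).
x = y - np.linalg.pinv(A) @ (A @ y - b)

print(np.allclose(A @ x, b))  # True: primal feasibility A x = b
```

Since $\mathbf{A}\mathbf{A}^{\dagger} = \mathbf{I}$ for full row rank, $\mathbf{A}\mathbf{x} = \mathbf{A}\mathbf{y} - (\mathbf{A}\mathbf{y} - \mathbf{b}) = \mathbf{b}$ holds exactly.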

Solution for P$1$:

If I utilize the stationarity condition, i.e., \begin{align} 0 &\in \frac{\partial}{\partial \mathbf{x}} g\left(\mathbf{x}\right) + \sum \limits_{m=1}^{M} \lambda_m \ \frac{\partial}{\partial \mathbf{x}} h_m\left(\mathbf{x}\right) \\ &\Rightarrow -\left( \mathbf{y} - \mathbf{x} \right) + \sum \limits_{m=1}^{M} \lambda_m \mathbf{a}_m = 0\\ &\Leftrightarrow \mathbf{x} = \mathbf{y} - \sum \limits_{m=1}^{M} \lambda_m \mathbf{a}_m = \mathbf{y} - \mathbf{A}^T \boldsymbol{\lambda} \ . \end{align}

According to the KKT conditions, $\lambda_m \geq 0$ by dual feasibility.

If $\lambda_m = 0 \ \forall m$, then $\mathbf{x} = \mathbf{y}$.

If $\lambda_m > 0 \ \forall m$, the complementary slackness condition implies that \begin{align} h_m(\mathbf{x}) = 0 \Leftrightarrow \mathbf{a}_m^T \mathbf{x} - b_m = 0, \ \forall m \ , \end{align} i.e., $\mathbf{A}\mathbf{x} = \mathbf{b}$. Substituting $\mathbf{x} = \mathbf{y} - \sum \limits_{m=1}^{M} \lambda_m \mathbf{a}_m = \mathbf{y} - \mathbf{A}^T \boldsymbol{\lambda}$ gives $\boldsymbol{\lambda} = \left( \mathbf{A} \mathbf{A}^T \right)^{-1}\left(\mathbf{A}\mathbf{y} - \mathbf{b} \right)$, so the solution for $\mathbf{x}$ is \begin{align} \mathbf{x} = \mathbf{y} - \mathbf{A}^T\left( \mathbf{A} \mathbf{A}^T \right)^{-1}\left(\mathbf{A}\mathbf{y} - \mathbf{b} \right) . \end{align}
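This all-constraints-active case can be checked on a hand-picked instance where the multipliers come out positive (the data below is an illustrative assumption, chosen so that $\boldsymbol{\lambda} > 0$):

```python
import numpy as np

# Constraints x_1 <= 0, x_2 <= 0, both violated by y.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # M = 2 < n = 3
b = np.array([0.0, 0.0])
y = np.array([1.0, 2.0, 3.0])

# Multipliers and equality-projection solution.
lam = np.linalg.solve(A @ A.T, A @ y - b)   # here lam = [1, 2] > 0
x = y - A.T @ lam

print(x)            # [0. 0. 3.]
print(lam >= 0)     # [ True  True]
```

All four KKT conditions of P$1$ hold at this $(\mathbf{x}, \boldsymbol{\lambda})$, so the P$2$-style formula indeed solves P$1$ on this instance.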

(Basic) Questions:

  1. If we know that all the Lagrange multipliers are positive (though I don't know how to determine this up front, since the solution $\mathbf{x}$ is unknown), then the solution of P$1$ is the same as that of P$2$. Agreed, or am I on the wrong path?

  2. So, in general, when some $\lambda_m = 0$ and others $\lambda_m > 0$ (i.e., the active set is unknown), can we conclude from the above that there is no closed-form solution for P$1$?

  3. If yes to 2. (which I now tend to believe), then we have to resort to numerical solutions, right? One approach is alternating projections, as in link1, but it converges very slowly. What fast and efficient methods would you suggest for solving large-scale P$1$ problems?
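One simple candidate (relevant to questions 3 and 4) is projected gradient ascent on the dual: with $\mathbf{x} = \mathbf{y} - \mathbf{A}^T\boldsymbol{\lambda}$, the dual of P$1$ is a nonnegatively constrained concave QP in $\boldsymbol{\lambda}$, and projection onto $\boldsymbol{\lambda} \geq 0$ is a simple clipping. A minimal NumPy sketch (the function name, step size, and iteration count are my own choices; accelerated variants such as FISTA on the dual would converge faster):

```python
import numpy as np

def project_polyhedron(y, A, b, iters=5000):
    """Projection onto {x : A x <= b} via projected gradient ascent
    on the dual variable lambda >= 0 (a first-order method).

    Dual objective: g(lam) = -0.5 lam^T A A^T lam + lam^T (A y - b),
    maximized over lam >= 0; the primal is recovered as x = y - A^T lam.
    """
    G = A @ A.T
    c = A @ y - b
    step = 1.0 / np.linalg.norm(G, 2)   # 1/L with L = ||A A^T||_2
    lam = np.zeros(A.shape[0])
    for _ in range(iters):
        lam = np.maximum(0.0, lam + step * (c - G @ lam))
    return y - A.T @ lam

# Single half-space {x : x_1 <= 0}: the projection just clips x_1.
A = np.array([[1.0, 0.0]])
b = np.array([0.0])
print(project_polyhedron(np.array([2.0, 1.0]), A, b))  # ~[0. 1.]
```

Each iteration costs two matrix-vector products, which is what makes this attractive for large-scale instances, at the price of slower (sublinear, unaccelerated) convergence.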

ADDENDUM: 4. For the numerical solution of P$1$, which fast, efficient, and low-complexity first-order methods would you recommend?

BEST ANSWER
  1. Agreed.

  2. Yes, you can conclude that.

  3. Look at interior point methods. A good implementation will require a modest number of iterations (say 25-40), each of which solves a linear system of essentially the same form as the equality-constrained problem P$2$.