Least Squares with Euclidean ($ {L}_{2} $) Norm Constraint

6.2k Views Asked by At

Suppose I have set of samples $(x_i,y_i), 1 \leq i \leq n$. I am interested in solving the following optimization problem: $$ \min \sum_{i=1}^n (y_i-a^\top x_i)^2, \quad \text{s.t } \|a\|_{2} = 1. $$

If we assume that $\sum_i x_i x_i^\top$ is invertible, I am wondering if one can prove that the solution to the above optimization problem is $$ a^\ast=\frac{\left(\sum_i x_i x_i^\top \right)^{-1} (\sum_i x_i y_i)}{\|\left(\sum_i x_i x_i^\top \right)^{-1} (\sum_i x_i y_i)\|} $$ Does the above solution still hold if we relax the constraint to be $\|a\|_{2} \leq 1$? Assuming that the above solution does not hold, in this inequality constrained case, does running a projected gradient descent guaranteed to find the true minimum since the problem is convex?

3

There are 3 best solutions below

2
On

No, it is pretty much never the solution, so you will not be able to prove that

5
On

I will try to fully solve it later but just think of the following case, what if the Least Squares solution already have an $ {L}_{2} $ norm which is less then 1?

Your solution will scale it and probably will make not the optimal.

So the solution must be composed of 2 cases, 1 if the LS solution obeys the relaxed constraint and the other not.

By the way, your solution, which is basically projection onto the Euclidean Unit Ball, can be the projection step in numerical solution using the Projected Gradient Method.

My Solution

First, let's rewrite the problem as:

$$ \begin{alignat*}{3} \text{minimize} & \quad & \frac{1}{2} \left\| A x - b \right\|_{2}^{2} \\ \text{subject to} & \quad & {x}^{T} x \leq 1 \end{alignat*} $$

The Lagrangian is given by:

$$ L \left( x, \lambda \right) = \frac{1}{2} \left\| A x - b \right\|_{2}^{2} + \frac{\lambda}{2} \left( {x}^{T} x - 1 \right) $$

The KKT Conditions are given by:

$$ \begin{align*} \nabla L \left( x, \lambda \right) = {A}^{T} \left( A x - b \right) + \lambda x & = 0 && \text{(1) Stationary Point} \\ \lambda \left( {x}^{T} x - 1 \right) & = 0 && \text{(2) Slackness} \\ {x}^{T} x & \leq 1 && \text{(3) Primal Feasibility} \\ \lambda & \geq 0 && \text{(4) Dual Feasibility} \end{align*} $$

From (1) one could see that the optimal solution is given by:

$$ \hat{x} = {\left( {A}^{T} A + \lambda I \right)}^{-1} {A}^{T} b $$

Which is basically the solution for Tikhonov Regularization of the Least Squares problem.

Now, from (2) if $ \lambda = 0 $ it means $ {x}^{T} x = 1 $ namely $ \left\| {\left( {A}^{T} A \right)}^{-1} {A}^{T} b \right\|_{2} = 1 $.

So one need to check the Least Squares solution first.
If $ \left\| {\left( {A}^{T} A \right)}^{-1} {A}^{T} b \right\|_{2} \leq 1 $ then $ \hat{x} = {\left( {A}^{T} A \right)}^{-1} {A}^{T} b $.

Otherwise, one need to find the optimal $ \hat{\lambda} $ such that $ \left\| {\left( {A}^{T} A + \lambda I \right)}^{-1} {A}^{T} b \right\| = 1 $.

For $ \lambda \geq 0 $ the function:

$$ f \left( \lambda \right) = \left\| {\left( {A}^{T} A + \lambda I \right)}^{-1} {A}^{T} b \right\| $$

Is monotonically descending and bounded below by $ 0 $.

enter image description here

Hence, all needed is to find the optimal value by any method by starting at $ 0 $.

Basically the methods is solving iteratively Tikhonov Regularized Least Squares problem.

A demo code + solver can be found in my StackExchange Mathematics Q2399321 GitHub Repository.

1
On

Here is some progress in your solution:

First, note that this problem, in fact, has a solution since the unit sphere is compact and the objective function is continuous. Second, note that the problem is not convex since the restriction $\|a\|=1$ is not affine. On the other hand, the problem is equivalent to

$$min \sum_{i=1}^n(y_i -a^tx_i)^2,\;\; s.t\; \|a\|_2^2-1=0. $$ The advantage of this formulation is that it will make the constraint smooth and will simplify a lot of computation. Since the problem is nonconvex and the LICQ is satisfied at any feasible point, the most we can do is to find all KKT points and choose, among those, the one at which the function $$f(a)= \sum_{i=1}^n (y_i-a^Tx_i)^2$$ attains its minimum value. We proceed as follows:

The lagrangian in this case is

$$L(a,\lambda)= f(a)+\lambda(\|a\|_2^2-1),$$ from which we find that $a$ is KKT iff

$$0= \sum_{i=1}^n-2(y_i-a^Tx_i)x_i +2\lambda a.$$

Now we introduce some notation:

$$X=[x_1,\ldots, x_n],\; A= \sum_{i=1}^{n} x_ix_i^T,\; y= (y_1,\ldots,y_n)^T,\; v=Xy, \; b=A^{-1}v,\; b'=A^{-1}b.$$ Whith this in mind, the KKT condition is just

$$\lambda a= \sum_{i=1}^ny_ix_i -\sum_{i=1}^n(a^tx_i) x_i= Xy- Aa=v-Aa, $$ or equivalently, $$(A+\lambda I)a= v.$$ We only analyze the following case:

$\textbf{Case:}\;-\lambda \notin \sigma(A)$(the set of eigenvalues of A)

Hence $A+\lambda I$ is invertible. Furthermore, its inverse is given by

$$(A+\lambda I)^{-1} =A^{-1} - g(\lambda) A^{-2},$$ where $$g(\lambda)=\frac{\lambda}{1+\lambda \;Tr(A^{-1})}$$ (See this page and this page). So, we have in this case

$$a=(A+\lambda I)^{-1} v= A^{-1}v - g(\lambda) A^{-2}v= b- g(\lambda)b'. $$

But now we use the constraint $\|a\|_2^2=1$ to find $\lambda.$ We should have then

$$\|b\|^2_2- 2g(\lambda)b^Tb' +g^2(\lambda)\|b'\|_2^2=1,$$ from which

$$(\|b'\|_2^2)g^2(\lambda) +(-2b^Tb') g(\lambda)+(\|b\|^2_2-1)=0.$$ The trick now is to note that this is a quadratic equation in $g(\lambda).$ The discriminant of this equation is

$$D= 4((b^Tb')^2- \|b'\|_2^2(\|b\|2^2-1)).$$ If $D<0,$ we conclude that there are not KKT points in this case. Otherwise, we will have

$$g(\lambda)= \frac{b^Tb'+_{-}\sqrt{(b^Tb')^2- \|b'\|_2^2(\|b\|2^2-1)}}{\|b'\|_2^2}.$$

Denote $$r_{+}=\frac{b^Tb'+ \sqrt{(b^Tb')^2- \|b'\|_2^2(\|b\|2^2-1)}}{\|b'\|_2^2},\; r_{-}=\frac{b^Tb'- \sqrt{(b^Tb')^2- \|b'\|_2^2(\|b\|2^2-1)}}{\|b'\|_2^2} $$ This is of course the same as

$$\frac{\lambda}{1+\lambda \;Tr(A^{-1})}= r_{+},r_{-}.$$ From here we find that

$$\lambda_{+}= \frac{r_{+}}{1-r_{+}Tr(A^{-1})} \textrm{ and } \lambda_{-}= \frac{r_{-}}{1-r_{-}Tr(A^{-1}}$$ solve the equation, provided that $r_{+}\neq \frac{1}{Tr(A^{-1})}$ and $r_{-}\neq \frac{1}{Tr(A^{-1})}$ respectively. If one of this cases holds, we miss that solution. In conclusion, the only possible values of $\lambda$ so that the associated $a$ is $KKT$ are as above under the conditions $D\geq 0$ and $r_{+},r_{-}\neq \frac{1}{Tr(A^{-1})}.$

For the remaining case ($-\lambda \in \sigma(A)$) I have some ideas also. However, I don't think that a closed solution is possible. If I have some more time, I will think about it. Hope this helps.