Sum-of-squares maximization with a norm constraint


I have the following optimization problem \begin{align} \max_{\|x\|^2\le1} \|Lx - y\| \end{align} where $L$ is a lower triangular matrix and $y$ is a given vector. Does it admit a closed-form solution?

I am interested in the general case where $L$ may be neither invertible nor square, but any progress under additional assumptions will be much appreciated.

The motivation is to understand the standard least-squares method as a game from the "adversarial" point of view: $y$ is a measurement sequence and $x$ is a possible state in the model $y = Lx + n$.

Best answer:

We can consider the following equivalent objective:

$$\|Lx - y\|^2=x^T\color{blue}{L^TL}x-2y^TLx+y^Ty.$$

As $L^TL$ is a symmetric (positive semidefinite) matrix, it admits the spectral decomposition $$\color{blue}{L^TL = Q^T \Lambda Q }$$ for some orthogonal matrix $Q$ with $Q^TQ=I$ and diagonal matrix $\Lambda \, (\ge 0)$. Then, defining $\color{blue}{z=Qx}$ and $\color{blue}{b:=QL^Ty}$ (so that $y^TLx=b^Tz$), the problem is equivalent to:

$$\color{blue}{\max_{\|z\|^2=1} \quad z^T\Lambda z-2b^Tz+y^Ty }.$$

Note that here we use the equivalent constraint $$\|x\|^2=x^Tx=x^TQ^TQx=z^Tz=\|z\|^2=1.$$ As the problem is a convex maximization (maximization of the convex quadratic function $z^T\Lambda z-2b^Tz+y^Ty$ over the convex set $\|z\|^2 \le 1$), an optimal solution lies on the boundary (at an extreme point), so the constraint $\|z\|^2 \le 1$ can be replaced with $\|z\|^2 = 1$.
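As a quick numerical sanity check of this change of variables (a NumPy sketch; here $b=QL^Ty$, so the transformed objective is $z^T\Lambda z-2b^Tz+y^Ty$), one can verify that the transformed objective agrees with $\|Lx-y\|^2$ for an arbitrary $x$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
L = np.tril(rng.standard_normal((n, n)))   # lower triangular, as in the question
y = rng.standard_normal(n)

lam, V = np.linalg.eigh(L.T @ L)           # L^T L = V diag(lam) V^T
Q = V.T                                    # so L^T L = Q^T Lambda Q
b = Q @ (L.T @ y)                          # b = Q L^T y

x = rng.standard_normal(n)
z = Q @ x
orig = np.linalg.norm(L @ x - y) ** 2
transformed = z @ np.diag(lam) @ z - 2 * b @ z + y @ y
assert np.isclose(orig, transformed)
```

The check holds for any $x$, not just unit vectors, since the constraint plays no role in the algebraic identity.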

For the $\color{blue}{\text{new problem}}$, from the KKT conditions, the maximizer is among normalized vectors $z$ (with $\|z\|^2=1$) that satisfy the following system:

$$\Lambda z-b+2\mu z=0 \equiv \\ \color{blue}{(\Lambda+2\mu I)z=b}$$

for some $\mu \in \mathbb R$ (note that as we consider the equality constraint $\|z\|^2 = 1$, $\mu$ can be any number in $\mathbb R$). Now we should find all the feasible solutions of this system (feasible KKT points) by noting that $\color{blue}{\Lambda+2\mu I}$ is a diagonal matrix, which simplifies the search procedure. All the feasible KKT points can be obtained in the following two steps:

  1. If some element $i$ of $b$ is $0$, then either $z_i=0$ or $-2\mu$ equals the $i$th diagonal element of $\Lambda$, i.e., $\mu=-\frac{\lambda_i}{2}$. The first scenario is covered by Step 2. Under the second scenario, $z_j=\frac{b_j}{\lambda_j- \lambda_i}$ for those $j$ with $b_j \neq 0$ (which requires $\lambda_j \neq \lambda_i$ for such $j$), $z_j=0$ for all other $j \neq i$, and $z_i$ is chosen so that $z_i^2=1-\sum_{j \neq i} z_j^2$. If this right-hand side is nonnegative, we obtain a feasible KKT point.

  2. After fixing $z_j=0$ for any $j$ for which $b_j=0$, for the other $j$ we have $$z_j=\frac{b_j}{\lambda_j+2\mu}$$ where $\mu$ can be any solution of the following equation: $$\sum_{j\in[n]: b_j\neq 0} \left (\frac{b_j}{\lambda_j+2\mu} \right )^2=1.$$

After finding all the feasible KKT points $z^{1},\dots, z^{m}$ (note that $m\ge 1$, as the feasible set is compact and the objective is continuous), we determine the one (or those) that maximizes the objective $z^T\Lambda z-2b^Tz+y^Ty$; let us denote it by $z^*$. Finally, since $Q$ is orthogonal, $x^*=Q^Tz^*$.
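The two steps above can be sketched numerically (a NumPy sketch; `kkt_step2_candidates` is a hypothetical helper name, and for simplicity the example takes all $b_j\neq 0$, so only Step 2 contributes candidates):

```python
import numpy as np

def kkt_step2_candidates(lam, b):
    """Step 2 (assuming every b_j != 0): clear denominators in
    sum_j (b_j / (lam_j + 2*mu))^2 = 1 to get a degree-2n polynomial
    in mu, then return the unit vectors z_j = b_j / (lam_j + 2*mu)."""
    n = len(lam)
    poly = np.poly1d([1.0])
    for lj in lam:                                  # prod_j (2*mu + lam_j)^2
        poly = poly * np.poly1d([2.0, lj]) ** 2
    for i in range(n):                              # - sum_i b_i^2 prod_{j!=i} (2*mu + lam_j)^2
        term = np.poly1d([b[i] ** 2])
        for j in range(n):
            if j != i:
                term = term * np.poly1d([2.0, lam[j]]) ** 2
        poly = poly - term
    zs = []
    for root in poly.roots:
        if abs(root.imag) > 1e-8:
            continue                                # keep real roots only
        denom = lam + 2 * root.real
        if np.min(np.abs(denom)) < 1e-9:
            continue
        z = b / denom
        if abs(np.linalg.norm(z) - 1.0) < 1e-6:     # feasible KKT point
            zs.append(z)
    return zs

rng = np.random.default_rng(0)
n = 3
lam = rng.uniform(0.0, 3.0, n)                      # diagonal of Lambda (>= 0)
b = rng.uniform(0.5, 2.0, n)                        # no zero entries
obj = lambda z: z @ (lam * z) - 2 * b @ z           # constant y^T y dropped
best = max(kkt_step2_candidates(lam, b), key=obj)

# sanity check: no random unit vector beats the best KKT candidate
Z = rng.standard_normal((n, 50000)); Z /= np.linalg.norm(Z, axis=0)
vals = (lam[:, None] * Z * Z).sum(axis=0) - 2 * b @ Z
assert obj(best) >= vals.max() - 1e-9
```

The root-finding via `np.poly1d` is only one way to solve the secular equation; a safeguarded scalar solver per bracketing interval would be more robust in practice.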


Remark 1: In Step 2, we need to find the roots $\mu$ of the equation $$\sum_{j=1}^n \left (\frac{b_j}{\lambda_j+2\mu} \right )^2=1,$$ which, after clearing denominators, involves a polynomial of degree $2n$. Because of the special structure of the left-hand side, it can be shown that the equation always has at least two real roots and can have up to $2n$. Thus, when none of the elements of $b$ is zero, there are up to $2n$ candidate feasible KKT points. If $k$ elements of $b$ are zero, the maximum number of roots is $2(n-k)$, and Step 1 can contribute up to $k$ further feasible KKT points. In total, the number of candidate feasible KKT points obtained in Steps 1 and 2 ranges from $2$ up to $2n$.

Remark 2: For the special case $b = 0$, there are up to $n$ candidate feasible KKT points, obtained in Step 1 by setting $-2\mu$ to each of $\lambda_j, j=1,\dots,n$ (the eigenvalues of $L^TL$). The $n$ feasible KKT points are $$z^1=e_1,\dots,z^n=e_n.$$ Note that in this case, the feasible KKT point corresponding to the largest $\lambda_j$ (the largest eigenvalue of $L^TL$) is the optimal solution, and the optimal value is $\lambda_{\max}(L^TL)$, i.e., $$\max_{\|x\|^2 \le 1} \, x^TL^TLx= \max_{\|z\|^2=1} \, z^T\Lambda z=\lambda_{\max}(L^TL),$$ which is a known result.
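This special case is easy to check numerically (a NumPy sketch; $b=0$ corresponds to $L^Ty=0$, e.g. $y=0$, so the optimal value is the squared largest singular value of $L$):

```python
import numpy as np

rng = np.random.default_rng(1)
L = np.tril(rng.standard_normal((4, 4)))            # with y = 0, b = Q L^T y = 0
lam_max = np.linalg.eigvalsh(L.T @ L).max()

# the optimal value lambda_max(L^T L) equals the squared largest singular value
assert np.isclose(lam_max, np.linalg.svd(L, compute_uv=False)[0] ** 2)

# no unit vector attains a larger value of x^T L^T L x = ||L x||^2
X = rng.standard_normal((4, 10000)); X /= np.linalg.norm(X, axis=0)
assert (np.linalg.norm(L @ X, axis=0) ** 2 <= lam_max + 1e-9).all()
```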

Remark 3 ($\color{red}{\text{Update 1}}$):

I just realized that, when none of the elements of $b$ is zero, the KKT point defined by the smallest root (denote it by $\mu_{min}$) of the equation:

$$\sum_{j=1}^n \left (\frac{b_j}{\lambda_j+2\mu} \right )^2=1,$$

is the optimal solution. Let me first show that the smallest root is always negative. As all $\lambda_j, j=1,\dots, n$ are non-negative, the above equation always has a negative root in the interval

$$\left (-\infty, -\frac{\max_{j \in [n] }(\lambda_j)}{2} \right), $$

by observing that the LHS of the equation tends to $0$ as $\mu \to -\infty$ and tends to $+\infty$ as $2\mu \to \left (-\max_{j \in [n] }(\lambda_j) \right )^-$. Hence, we always have

$$2\mu_{min}<-\max_{j=1,\dots, n}\lambda_j \le 0.$$

Indeed, the feasible KKT vector $\tilde{z}$ defined by $\mu_{min}$ with

$$\tilde{z}_j=\frac{b_j}{\lambda_j+2\mu_{min}}, j=1,\dots, n$$

is the only candidate whose sign is opposite to the sign of the vector $b$ (located in the opposite orthant), that is,

$$\color{blue}{\text{sgn} (\tilde{z} )=-\text{sgn}(b)}.$$

This property means that the feasible KKT point $\tilde{z}$ determined by the smallest root $\mu_{\min}$ lies farthest from the point $b$ among all feasible KKT points, which is why it attains the maximum.
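Both claims of this remark ($2\mu_{\min}<-\max_j\lambda_j$ and $\operatorname{sgn}(\tilde z)=-\operatorname{sgn}(b)$) can be checked with a short bisection (a NumPy sketch; the bracket and tolerances are ad-hoc assumptions). The left-hand side of the equation increases from $0$ to $+\infty$ on $\left(-\infty,-\tfrac{\max_j\lambda_j}{2}\right)$, so bisection locates the smallest root there:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
lam = np.sort(rng.uniform(0.1, 3.0, n))   # eigenvalues of L^T L, all >= 0
b = rng.uniform(0.5, 2.0, n)              # premise of Remark 3: no zero entries

phi = lambda mu: np.sum((b / (lam + 2 * mu)) ** 2)

# bisection on (-inf, -lam_max/2): phi increases from 0 to +inf there
lo, hi = -1e6, -lam.max() / 2 - 1e-12
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if phi(mid) < 1 else (lo, mid)
mu_min = 0.5 * (lo + hi)

assert 2 * mu_min < -lam.max()                 # smallest root lies left of -lam_max/2
z = b / (lam + 2 * mu_min)                     # the KKT point defined by mu_min
assert np.allclose(np.sign(z), -np.sign(b))    # opposite orthant to b
assert np.isclose(np.linalg.norm(z), 1.0, atol=1e-6)
```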

Remark 4 ($\color{red}{\text{Update 2}}$):

Using the observation made in Remark 3, we can conclude that if for some $i$ we have

$$b_i=0$$ $$\lambda_i =\max_{j \in [n] }(\lambda_j)$$ $$\sum_{j \in [n]: \lambda_j \neq \lambda_i }\left(\frac{b_j}{\lambda_j-\lambda_i} \right )^2<1,$$

then the feasible KKT point obtained for $2\mu=-\lambda_i$ from Step 1 is the optimal solution, that is, the following point (or points if we have $j \neq i$ with $\lambda_j=\lambda_i$):

$$\sum_{j \in [n]: \lambda_j=\lambda_i }z^2_j=1- \sum_{j \in [n]: \lambda_j \neq \lambda_i}\left(\frac{b_j}{\lambda_j-\lambda_i} \right )^2, \\ z_j= \frac{b_j}{\lambda_j-\lambda_i} ,j \in [n]: \lambda_j\neq\lambda_i.$$

Otherwise, the feasible KKT point obtained in Step 2 for the smallest root of the equation:

$$\sum_{j \in [n]: b_j \neq 0} \left (\frac{b_j}{\lambda_j+2\mu} \right )^2=1,$$

is the optimal solution.

This follows from the fact that the optimal solution of the problem depends continuously on $b$. To see this, for those $j$ with $b_j=0$, perturb to $b_j=\epsilon$ and apply the point in Remark 3. Note that for $i$ with $b_i=\epsilon$ and $\lambda_i =\max_{j \in [n] }(\lambda_j)$, we have $2\mu_{\min} \to - \max_{j \in [n] }(\lambda_j)$ as $\epsilon \to 0.$

Remark 5: The problem is a non-convex optimization problem (a convex maximization, or equivalently a concave minimization), but it can be reformulated as a semidefinite program, which is a subclass of convex optimization. I have not provided that reformulation, as the OP seeks a closed-form solution.

Second answer:

In the sequel, we assume that the matrices and vectors are real and the norm is Euclidean. If the field is complex, let $L=A+iB,\,\mathbf x=\mathbf p+i\mathbf q$ and $\mathbf y=\mathbf a+i\mathbf b$. The problem is then equivalent to its ‘realified’ version $$ \max_{\|\mathbf p\|^2+\|\mathbf q\|^2\le1}\left\| \pmatrix{A&-B\\ B&A}\pmatrix{\mathbf p\\ \mathbf q}-\pmatrix{\mathbf a\\ \mathbf b}\right\|. $$
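The 'realification' can be verified directly (a NumPy sketch): the realified matrix applied to the stacked real and imaginary parts reproduces $L\mathbf x-\mathbf y$, and the Euclidean norms agree.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
L = A + 1j * B
p, q = rng.standard_normal(n), rng.standard_normal(n)
x = p + 1j * q
a, bb = rng.standard_normal(n), rng.standard_normal(n)
y = a + 1j * bb

M = np.block([[A, -B], [B, A]])            # realified matrix
lhs = np.linalg.norm(L @ x - y)            # complex problem
rhs = np.linalg.norm(M @ np.concatenate([p, q]) - np.concatenate([a, bb]))
assert np.isclose(lhs, rhs)
```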

So, assume that we work over $\mathbb R$. Let $r=\operatorname{rank}(L)$ and $L=USV^T$ be an economic/compact singular value decomposition of $L$, where $S=\operatorname{diag}(s_1,s_2,\ldots,s_r)$ is a positive diagonal matrix and each of $U$ and $V$ has $r$ orthonormal columns. Let $\mathbf v=V^T\mathbf x$ and $\mathbf z=U^T\mathbf y$, where $\mathbf v,\mathbf z\in\mathbb R^r$. Since $\mathbf y-UU^T\mathbf y$ is orthogonal to the range of $U$, we have $\|L\mathbf x-\mathbf y\|^2=\|S\mathbf v-\mathbf z\|^2+\|\mathbf y-UU^T\mathbf y\|^2$, where the second term does not depend on $\mathbf x$; hence the maximisers of $\max_{\|\mathbf x\|\le1}\|L\mathbf x-\mathbf y\|$ are obtained from those of $\max_{\|\mathbf v\|\le1}\|S\mathbf v-\mathbf z\|$. Since $S$ is invertible, the image of the open unit ball under $S$ is an open set. It follows that a maximiser $\mathbf v$ of the transformed problem $\max_{\|\mathbf v\|\le1}\|S\mathbf v-\mathbf z\|$ must occur on the boundary of the unit ball, i.e., it is a unit vector. The optimisation problem is therefore equivalent to maximising $$ f(\mathbf v) = \|S\mathbf v-\mathbf z\|^2 = \mathbf v^TS^2\mathbf v - 2\mathbf v^TS\mathbf z + \|\mathbf z\|^2 $$ on the unit sphere $\|\mathbf v\|=1$.
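A small NumPy check of this SVD reduction: since $\mathbf y-UU^T\mathbf y$ is orthogonal to the range of $U$, $\|L\mathbf x-\mathbf y\|^2$ splits into $\|S\mathbf v-\mathbf z\|^2$ plus a constant that does not depend on $\mathbf x$ (the rank-deficient test matrix below is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
L = rng.standard_normal((m, n)) @ np.diag([1.0, 0.5, 0.0])  # rank-deficient
U, s, Vt = np.linalg.svd(L, full_matrices=False)
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r]        # economic SVD, rank r

y = rng.standard_normal(m)
x = rng.standard_normal(n)
v = Vt @ x                                 # v = V^T x
z = U.T @ y                                # z = U^T y
resid = y - U @ (U.T @ y)                  # component of y outside range(L): constant

lhs = np.linalg.norm(L @ x - y) ** 2
rhs = np.linalg.norm(s * v - z) ** 2 + np.linalg.norm(resid) ** 2
assert np.isclose(lhs, rhs)
```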

This can be solved by the method of Lagrange multipliers, but a matrix-theoretic approach seems easier here. At a maximiser $\mathbf v$, we have $f(\mathbf v)\ge f(\mathbf w)$ for any unit vector $\mathbf w$. In particular, for any skew-symmetric matrix $K$, if we put $g(t)=f(e^{tK}\mathbf v)$, then $t=0$ will be a critical point of $g$. Therefore $$ 0=g'(0)=\operatorname{tr}\left(K^T(S^2\mathbf v\mathbf v^T-S\mathbf z\mathbf v^T)\right) $$ for every skew-symmetric matrix $K$, meaning that $S^2\mathbf v\mathbf v^T-S\mathbf z\mathbf v^T$ is a symmetric matrix. Hence $S^2\mathbf v-S\mathbf z=\lambda \mathbf v$ for some $\lambda\in\mathbb R$, i.e., $(S^2-\lambda I)\mathbf v=S\mathbf z$. There are two cases:

  1. If $\lambda\not\in\{s_1^2,s_2^2,\ldots,s_r^2\}$, then $S^2-\lambda I$ is invertible and $$ \mathbf v=(S^2-\lambda I)^{-1}S\mathbf z.\tag{1} $$ Since $\mathbf v$ is a unit vector, we must have $$ 1=\|\mathbf v\|^2=\sum_{i=1}^r\frac{s_i^2z_i^2}{(s_i^2-\lambda)^2}. $$ Therefore the feasible choices of $\lambda$ are those real roots of the polynomial equation $$ p(\lambda) =\prod_{j=1}^r(\lambda - s_j^2)^2 -\sum_{i=1}^rs_i^2z_i^2\prod_{j\ne i}(\lambda - s_j^2)^2=0 $$ that lie outside $\{s_1^2,s_2^2,\ldots,s_r^2\}$. Hence there are at most $\deg p=2r$ feasible choices.
  2. If $\lambda\in\{s_1^2,s_2^2,\ldots,s_r^2\}$, it solves the equation $(S^2-\lambda I)\mathbf v=S\mathbf z$ for some unit vector $\mathbf v$ if and only if $(S^2-\lambda I)(S^2-\lambda I)^+S\mathbf z=S\mathbf z$ and $\|(S^2-\lambda I)^+S\mathbf z\|\le1$. For each feasible $\lambda=s_i^2$, any unit vector of the form $$ \mathbf v=(S^2-\lambda I)^+S\mathbf z+\mathbf w; \quad\mathbf w\in\ker(S^2-\lambda I) $$ will solve $(S^2-\lambda I)\mathbf v=S\mathbf z$. The choice of $\mathbf w$ is irrelevant — as long as it makes $\|\mathbf v\|=1$ — because it does not contribute to the value of $f(\mathbf v)$: \begin{align*} f(\mathbf v) &=\mathbf v^TS^2\mathbf v - 2\mathbf v^TS\mathbf z + \|\mathbf z\|^2\\ &=\mathbf v^T(\lambda\mathbf v+S\mathbf z) - 2\mathbf v^TS\mathbf z + \|\mathbf z\|^2\\ &=\lambda - \mathbf z^TS\mathbf v + \|\mathbf z\|^2 \quad\text{(this is where we need $\|\mathbf v\|=1$)}\\ &=\lambda - \mathbf z^TS\left[(S^2-\lambda I)^+S\mathbf z+\mathbf w\right] + \|\mathbf z\|^2\\ &=\lambda - \mathbf z^TS(S^2-\lambda I)^+S\mathbf z - \mathbf z^TS\mathbf w + \|\mathbf z\|^2\\ &=\lambda - \mathbf z^TS(S^2-\lambda I)^+S\mathbf z - \mathbf v^T(S^2-\lambda I)\mathbf w + \|\mathbf z\|^2\\ &=\lambda - \mathbf z^TS(S^2-\lambda I)^+S\mathbf z + \|\mathbf z\|^2.\\ \end{align*} In particular, when $\lambda=s_i^2$, we may always pick $$ \mathbf v=(S^2-\lambda I)^+S\mathbf z +\left(1-\|(S^2-\lambda I)^+S\mathbf z\|^2\right)^{1/2}\mathbf e_i^{(r)}\tag{2} $$ where $\mathbf e_i^{(r)}=(0,\ldots,0,1,0,\ldots,0)^T$ denotes the $i$-th vector in the standard basis of $\mathbb R^r$.

These two cases are usually but not always mutually exclusive. For instance, when $\mathbf z$ is entrywise nonzero, only case 1 can occur. In contrast, when $S$ is not a scalar multiple of the identity matrix and $\mathbf z$ contains some zero entries, the equation $(S^2-\lambda I)\mathbf v=S\mathbf z$ may be solvable in both cases. So, in general, there may be up to $3r$ feasible choices of $\lambda$. Among them, pick the one that maximises $f(\mathbf v)=\|S\mathbf v-\mathbf z\|^2$, where $\mathbf v$ is computed according to $(1)$ or $(2)$. Then $\mathbf x=V\mathbf v$ will be a global maximiser to the original problem $\max_{\|\mathbf x\|\le1}\|L\mathbf x-\mathbf y\|$.
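This procedure can be sketched end to end (a NumPy sketch; `argmax_norm` is a hypothetical name, and for brevity only case 1 is handled, which suffices when $\mathbf z$ is entrywise nonzero):

```python
import numpy as np

def argmax_norm(L, y):
    """Sketch of the procedure above (case 1 only, which suffices when
    z = U^T y is entrywise nonzero): find real roots of p(lambda),
    form v by (1), and keep the candidate maximising f."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))
    U, s, Vt = U[:, :r], s[:r], Vt[:r]
    z = U.T @ y
    f = lambda v: np.linalg.norm(s * v - z) ** 2
    # p(lambda) = prod_j (lambda - s_j^2)^2 - sum_i s_i^2 z_i^2 prod_{j!=i} (lambda - s_j^2)^2
    poly = np.poly1d([1.0])
    for sj in s:
        poly = poly * np.poly1d([1.0, -sj ** 2]) ** 2
    for i in range(r):
        term = np.poly1d([s[i] ** 2 * z[i] ** 2])
        for j in range(r):
            if j != i:
                term = term * np.poly1d([1.0, -s[j] ** 2]) ** 2
        poly = poly - term
    best_v, best_val = None, -np.inf
    for root in poly.roots:
        if abs(root.imag) > 1e-8:
            continue
        denom = s ** 2 - root.real
        if np.min(np.abs(denom)) < 1e-9:
            continue                      # lambda hits some s_i^2: case 2, skipped here
        v = s * z / denom                 # equation (1)
        if abs(np.linalg.norm(v) - 1) < 1e-6 and f(v) > best_val:
            best_v, best_val = v, f(v)
    return Vt.T @ best_v                  # map back: x = V v

rng = np.random.default_rng(4)
L = np.tril(rng.standard_normal((4, 4)))
y = rng.standard_normal(4)
x_star = argmax_norm(L, y)
# no random feasible x should do better than the returned maximiser
X = rng.standard_normal((4, 20000)); X /= np.linalg.norm(X, axis=0)
assert np.linalg.norm(L @ x_star - y) >= np.linalg.norm(L @ X - y[:, None], axis=0).max() - 1e-9
```

A production version would also enumerate the case-2 candidates $(2)$ when $\mathbf z$ has zero entries.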

Example. Let $L=\pmatrix{1&0\\ 0&0},\mathbf x=\pmatrix{x_1\\ x_2}$ and $\mathbf y=\pmatrix{y_1\\ y_2}$. We want to maximise $\|L\mathbf x-\mathbf y\|=\sqrt{(x_1-y_1)^2+y_2^2}$ subject to $x_1^2+x_2^2\le1$. Since $L$ is already in singular-value form, we may take $U=V=\pmatrix{1\\ 0}$ and $S=I_1=1$, so that $\mathbf v=V^T\mathbf x=x_1$ and $\mathbf z=U^T\mathbf y=y_1$.

In case 1, we require that $\lambda\not\in\{s_1^2,\ldots,s_r^2\}=\{s_1^2\}=\{1\}$, i.e., $\lambda\ne1$. We look for real roots $\lambda\ne1$ of the polynomial equation $$ p(\lambda)=(\lambda-1)^2-y_1^2=0. $$ Such real roots exist only when $y_1\ne0$. If this is the case, the feasible choices of $\lambda$ are $1\pm|y_1|$. We obtain $$ x_1=\mathbf v =(S^2-\lambda I)^{-1}S\mathbf z =(1-\lambda)^{-1}y_1=\pm1 $$ and $\mathbf x=V\mathbf v=\pmatrix{\pm1\\ 0}$. Since $y_1\ne0$, one may verify that one of $x_1\in\{-1,1\}$ gives a global maximum of $\|L\mathbf x-\mathbf y\|=\sqrt{(x_1-y_1)^2+y_2^2}$ and the other is either a local minimum (when $|y_1|>1$) or a saddle point (when $0<|y_1|\le1$).

In case 2, again, since $\{s_1^2,\ldots,s_r^2\}=\{s_1^2\}=\{1\}$ is a singleton set, we only need to solve $(S^2-\lambda I)\mathbf v = S\mathbf z$ for a unit vector $\mathbf v$ where $\lambda=s_1^2=1$, i.e., we need to solve $(1-1)x_1 = 1\cdot y_1$ with $\|\mathbf v\|=|x_1|=1$. This is solvable only when $y_1=0$. If this is the case, the general solution is given by any \begin{align*} \mathbf v &=(S^2-\lambda I)^+S\mathbf z+\mathbf w;\quad\mathbf w\in\ker(S^2-\lambda I)\\ &=(1-1)^+1y_1+\mathbf w;\quad\mathbf w\in\ker(1-1)\\ &=\mathbf w;\quad\mathbf w\in\ker(0)=\mathbb R^r=\mathbb R\\ \end{align*} such that $\|\mathbf v\|=1$. This means $\mathbf v=\mathbf w=\pm1$ and $\mathbf x=V\mathbf v=\pmatrix{\pm1\\ 0}$. As mentioned earlier, the choice of $\mathbf w$ does not matter as long as it keeps $\mathbf v$ a unit vector. Here one may verify that both $\mathbf v=\mathbf w=1$ and $\mathbf v=\mathbf w=-1$ (i.e., $x_1=\pm1$) maximise the value of $\|L\mathbf x-\mathbf y\|=\sqrt{x_1^2+y_2^2}$.