Equivalence between quadratic maximization problems with equality and inequality constraints


In the traditional PCA setting, the optimization problem is formulated as \begin{align*} &\underset{\mathbf{x}\in\mathbb{R}^n}{\text{max}}\ \mathbf{x}^T\mathbf{A}\mathbf{x}\\ &\text{s.t.}\ ||\mathbf{x}||_2=1, \end{align*} where $\mathbf{A}$ is the covariance matrix, which is positive semi-definite. It has the equivalent form \begin{align*} &\underset{\mathbf{x}\in\mathbb{R}^n}{\text{max}}\ \mathbf{x}^T\mathbf{A}\mathbf{x}\\ &\text{s.t.}\ ||\mathbf{x}||_2\le1, \end{align*} which is easily verified by a scaling argument. If we append an extra $\ell_1$ constraint, the same equivalence appears to hold. That is, \begin{align*} &\underset{\mathbf{x}\in\mathbb{R}^n}{\text{max}}\ \mathbf{x}^T\mathbf{A}\mathbf{x}\\ &\text{s.t.}\ ||\mathbf{x}||_2=1\\ &||\mathbf{x}||_1\le s\left(1<s<\sqrt{2}\right) \end{align*} is equivalent to \begin{align*} &\underset{\mathbf{x}\in\mathbb{R}^n}{\text{max}}\ \mathbf{x}^T\mathbf{A}\mathbf{x}\\ &\text{s.t.}\ ||\mathbf{x}||_2\le1\\ &||\mathbf{x}||_1\le s\left(1<s<\sqrt{2}\right), \end{align*} i.e. the optimal solution must satisfy $||\mathbf{x}||_2=1$. This seems easy to understand geometrically, but I am not sure how to prove the equivalence algebraically.
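As a numerical sanity check (not a proof), one can brute-force the two formulations in $n=2$ by parametrizing the unit circle and scanning radii $r \in [0,1]$. The matrix and the budget $s$ below are arbitrary illustrative choices, not taken from the question; NumPy is assumed.

```python
import numpy as np

# Sanity check in n = 2: compare the maximum over the unit disk with the
# maximum over the unit circle, both under the extra l1 constraint.
# A and s are arbitrary hypothetical choices (A is PSD, 1 < s < sqrt(2)).
A = np.array([[2.0, 0.3],
              [0.3, 1.0]])
s = 1.3

ts = np.linspace(0.0, 2.0 * np.pi, 2000)
U = np.column_stack((np.cos(ts), np.sin(ts)))   # unit vectors u(t)
q = np.einsum('ij,jk,ik->i', U, A, U)           # u(t)^T A u(t)

# Inequality version: maximize r^2 * q(t) over r in [0, 1] and angles
# with ||r u(t)||_1 <= s.
best_ineq, best_r = -np.inf, None
for r in np.linspace(0.0, 1.0, 400):
    feas = r * np.abs(U).sum(axis=1) <= s
    if feas.any():
        val = (r ** 2) * q[feas].max()
        if val > best_ineq:
            best_ineq, best_r = val, r

# Equality version: radius fixed at r = 1.
feas = np.abs(U).sum(axis=1) <= s
best_eq = q[feas].max()

print(best_r, best_ineq, best_eq)
```

On this grid the two maxima coincide and the best radius is $1$, matching the claimed equivalence.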



Best answer:

Suppose $\mathbf{x} \in \mathbb{R}^n$ is an optimal solution to \begin{align*} &\underset{\mathbf{x}\in\mathbb{R}^n}{\text{max}}\ \mathbf{x}^T\mathbf{A}\mathbf{x}\\ &\text{s.t.}\ ||\mathbf{x}||_2\le1\\ &||\mathbf{x}||_1\le s\left(1<s<\sqrt{2}\right), \end{align*} and has the maximum 2-norm among all optimal solutions. We will assume that $\|\mathbf{x}\|_2 < 1$ and derive a contradiction. Clearly $\|\mathbf{x}\|_1 = s$: otherwise both constraints would be slack, and we could scale $\mathbf{x}$ up, obtaining a feasible solution whose objective value is at least as large (since $\mathbf{A}$ is positive semi-definite) and whose 2-norm is strictly larger, contradicting our choice of $\mathbf{x}$. Furthermore, note that $\mathbf{x}$ can't be an extreme point of $\{\mathbf{v} \in \mathbb{R}^n : \|\mathbf{v}\|_1 \leq s\}$ (i.e. a point where one coordinate equals $\pm s$ and all others are zero), because such a point has 2-norm equal to $s$, which is greater than $1$.
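The scaling step can be illustrated numerically: if both constraints are slack, multiplying by the largest feasible factor $c = \min(1/\|\mathbf{x}\|_2,\ s/\|\mathbf{x}\|_1) > 1$ keeps feasibility while the objective scales by $c^2$. The matrix and point below are arbitrary hypothetical examples; NumPy is assumed.

```python
import numpy as np

# Scaling a strictly feasible point up: feasibility is preserved, the
# 2-norm strictly grows, and (A being PSD) the objective does not shrink.
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B.T @ B                     # positive semi-definite by construction
s = 1.3

x = rng.standard_normal(4)
x *= 0.5 / np.linalg.norm(x)    # ||x||_2 = 0.5 < 1, so ||x||_1 <= 2 * 0.5 = 1 < s

c = min(1.0 / np.linalg.norm(x), s / np.abs(x).sum())  # largest feasible scale
y = c * x                       # still feasible, with c > 1

print(c, x @ A @ x, y @ A @ y)
```

Here $c \geq 1.3$, and the objective at `y` is $c^2$ times the objective at `x`.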

As a result, since $\mathbf{x}$ lies on the boundary of the $\ell_1$ ball but is not an extreme point, it lies in the relative interior of a face of positive dimension. Hence there exist a nonzero direction vector $\mathbf{d} \in \mathbb{R}^n$ (along that face) and $\epsilon > 0$ small enough that no coordinate of $\mathbf{x}$ changes sign, $\|\mathbf{x}\pm\epsilon\mathbf{d}\|_1 = \|\mathbf{x}\|_1 = s$, and, using $\|\mathbf{x}\|_2 < 1$, $\|\mathbf{x}\pm\epsilon\mathbf{d}\|_2 \leq 1$. So $\mathbf{x}\pm\epsilon\mathbf{d}$ is a feasible solution to our optimization problem. But note that \begin{align*} (\mathbf{x}\pm\epsilon\mathbf{d})^T A (\mathbf{x}\pm\epsilon\mathbf{d}) &= \mathbf{x}^T A \mathbf{x} \pm \epsilon\mathbf{d}^T A \mathbf{x} \pm\epsilon\mathbf{x}^T A \mathbf{d} + \epsilon^2 \mathbf{d}^T A \mathbf{d} \\ &= \mathbf{x}^T A \mathbf{x} \pm 2\epsilon\mathbf{d}^T A \mathbf{x} + \epsilon^2 \mathbf{d}^T A \mathbf{d} \\ &\geq \mathbf{x}^T A \mathbf{x} \pm 2\epsilon\mathbf{d}^T A \mathbf{x}, \end{align*} where the last inequality uses $\mathbf{d}^T A \mathbf{d} \geq 0$, as $\mathbf{A}$ is positive semi-definite. If $\mathbf{d}^T A \mathbf{x} \neq 0$, then one of $\mathbf{x}+\epsilon\mathbf{d}$ or $\mathbf{x}-\epsilon\mathbf{d}$ has a larger objective value, contradicting the optimality of $\mathbf{x}$. So $\mathbf{d}^T A \mathbf{x} = 0$, and both $\mathbf{x}+\epsilon\mathbf{d}$ and $\mathbf{x}-\epsilon\mathbf{d}$ are optimal as well. However, \begin{align*} \|\mathbf{x}\pm\epsilon\mathbf{d}\|_2^2 &= \|\mathbf{x}\|_2^2 \pm 2\epsilon\mathbf{x}^T\mathbf{d} + \epsilon^2\|\mathbf{d}\|_2^2 \\ &> \|\mathbf{x}\|_2^2 \pm 2\epsilon\mathbf{x}^T\mathbf{d}, \end{align*} where the inequality is strict because $\mathbf{d} \neq \mathbf{0}$. Since either $\mathbf{x}^T\mathbf{d} \geq 0$ or $-\mathbf{x}^T\mathbf{d} \geq 0$, one of $\mathbf{x}+\epsilon\mathbf{d}$ or $\mathbf{x}-\epsilon\mathbf{d}$ has a strictly larger 2-norm than $\mathbf{x}$, contradicting our choice of $\mathbf{x}$.
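The perturbation direction $\mathbf{d}$ can be made concrete: trade mass between two nonzero coordinates of $\mathbf{x}$, respecting their signs, so that the $\ell_1$ norm stays fixed. The point below is a hypothetical example on the $\ell_1$-sphere with $\|\mathbf{x}\|_2 < 1$, not taken from the answer; NumPy is assumed.

```python
import numpy as np

# For a non-extreme point x on the l1-sphere of radius s with ||x||_2 < 1,
# the direction d below keeps ||x +/- eps*d||_1 = s for small eps, stays in
# the unit 2-ball, and one of the two signs strictly increases the 2-norm.
x = np.array([0.8, -0.45, 0.05, 0.0])   # hypothetical example
s = np.abs(x).sum()                      # x is on the l1-sphere of radius s
assert np.linalg.norm(x) < 1.0

eps = 0.01
d = np.zeros_like(x)
d[0] = np.sign(x[0])                     # +1: grows |x_0| by eps
d[1] = -np.sign(x[1])                    # +1: shrinks |x_1| by eps (x_1 < 0)

for t in (eps, -eps):
    y = x + t * d
    print(np.abs(y).sum(), np.linalg.norm(y))
```

Both perturbed points keep $\ell_1$ norm exactly $s$, and `x + eps * d` ends up with a larger 2-norm than `x`, mirroring the contradiction in the proof.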