Optimization of a quadratic function with quadratic constraints


I'm a graduate student in Electrical Engineering with some basic knowledge of convex optimization. In my research, I came across the following optimization problem. With $\mu > 0$, find $\arg \max \{ \mathbf{a}^T\mathbf{x} - \frac{1}{2}\mu ||\mathbf{x}||_2^2 \} \\ \text{s.t. } ||\mathbf{x}||_2^2 \leq 1$

In a paper, I found a similar problem, where the unique maximizer of the above problem is given as $\mathbf{x}^*(\mathbf{a}) = \begin{cases} \frac{\mathbf{a}}{\mu}, & 0\leq ||\mathbf{a}||_2 \leq \mu \\ \frac{\mathbf{a}}{||\mathbf{a}||_2}, & ||\mathbf{a}||_2 > \mu \end{cases} $

Can anyone help me understand how to derive this maximizer? I suspect I'm missing something obvious, as I am not an expert in (convex) optimization. However, I'd like to add one observation: if I substitute the maximizer into the cost function, I get a function very similar to the Huber function. Denoting the cost function by $f(\mathbf{x})$, the value $f(\mathbf{x^*})$, which is a function of $\mathbf{a}$ and $\mu$, can be found to be

$f(\mathbf{x^*}) = f(\mathbf{a},\mu) = \begin{cases} \frac{||\mathbf{a}||_2^2}{2\mu}, & 0\leq ||\mathbf{a}||_2 \leq \mu \\ ||\mathbf{a}||_2 - \frac{\mu}{2}, & ||\mathbf{a}||_2 > \mu \end{cases} $
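For what it's worth, here is a quick numerical check (my own sketch, not from the paper) that the quoted maximizer and the Huber-like value agree with a brute-force search over the unit disc in both regimes:

```python
import numpy as np

def x_star(a, mu):
    """The quoted closed-form maximizer of a^T x - (mu/2)||x||^2 over ||x||_2 <= 1."""
    na = np.linalg.norm(a)
    return a / mu if na <= mu else a / na

def f(x, a, mu):
    return a @ x - 0.5 * mu * (x @ x)

def huber_value(a, mu):
    """The Huber-like optimal value f(x*) as a function of a and mu."""
    na = np.linalg.norm(a)
    return na**2 / (2 * mu) if na <= mu else na - mu / 2

a = np.array([0.6, 0.8])                 # ||a||_2 = 1
for mu in (0.5, 2.0):                    # covers ||a|| > mu and ||a|| <= mu
    # brute-force the maximum over a fine polar grid of the unit disc
    ts = np.linspace(0.0, 2.0 * np.pi, 2000)
    rs = np.linspace(0.0, 1.0, 401)
    dirs = np.stack([np.cos(ts), np.sin(ts)], axis=1)   # (2000, 2) unit directions
    X = rs[:, None, None] * dirs[None, :, :]            # (401, 2000, 2) grid points
    vals = X @ a - 0.5 * mu * np.sum(X * X, axis=-1)
    assert abs(f(x_star(a, mu), a, mu) - vals.max()) < 1e-4
    assert abs(f(x_star(a, mu), a, mu) - huber_value(a, mu)) < 1e-12
print("closed form matches brute force in both regimes")
```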

I would be very thankful if you could help me solve this optimization problem; a pointer to any reference would also be highly appreciated.


BEST ANSWER

You want to equivalently minimize the convex quadratic $Q(\mathbf{x}) = \frac{1}{2}\mu ||\mathbf{x}||_2^2 - \mathbf{a}^T\mathbf{x}$. Since $\mu > 0$, $Q(\mathbf{x})$ is strictly convex and differentiable, which implies that there exists a unique (global) minimizer, namely the point where the gradient vanishes. The gradient of this quadratic is \begin{align} \nabla_{\mathbf{x}}Q(\mathbf{x}) &= \frac{1}{2}\mu \left( \mathbf{I}\mathbf{x} + \mathbf{I}^{T} \mathbf{x}\right) - \mathbf{a}\\ &= \mu \mathbf{x}- \mathbf{a}. \end{align} Setting the gradient equal to $\mathbf{0}$, we get the global minimizer $$ \mathbf{x}_{\star} = \frac{1}{\mu}\mathbf{a}. $$ If $\|\mathbf{a}\|_{2} \le \mu$, then $\mathbf{x}_{\star}$ is feasible, and it is therefore the optimal solution of the constrained problem. Of course, this might not be the case. For the constrained problem we exploit the following lemma:

$Q(\mathbf{x})$ is a convex, differentiable function. The constraint set $C$ (here $C$ is the unit $l_{2}$-ball) is convex, nonempty and closed. A point $\widehat{\mathbf{x}} \in C$ is an optimal solution of the constrained minimization of $Q$ over $C$ if and only if $$ \nabla_{\mathbf{x}}Q(\widehat{\mathbf{x}})^{T}(\mathbf{z}-\widehat{\mathbf{x}}) \ge 0, \quad \forall \mathbf{z} \in C. $$

Now, let's see how this yields the desired solution. If the unconstrained minimizer is also feasible, then it solves the constrained problem; the criterion above is trivially satisfied since the gradient is zero at $\widehat{\mathbf{x}} = \mathbf{x}_{\star} \in C$. If that is not the case ($\mathbf{x}_{\star} \notin C$), then we look for $\widehat{\mathbf{x}}$ such that $$ \nabla_{\mathbf{x}}Q(\widehat{\mathbf{x}})^{T}(\mathbf{z}-\widehat{\mathbf{x}}) = \left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T} (\mathbf{z}-\widehat{\mathbf{x}}) \ge 0, \quad \forall \mathbf{z} \in C. $$

Let's treat $\widehat{\mathbf{x}}$ as a fixed (unknown) point for now, and minimize the left-hand side with respect to $\mathbf{z} \in C$; if the inequality holds for the $\mathbf{z} \in C$ that minimizes the left-hand side, it holds for all $\mathbf{z} \in C$. By the Cauchy-Schwarz inequality (using $\|\mathbf{z}\| \le 1$), we have \begin{align} \left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T} (\mathbf{z}-\widehat{\mathbf{x}}) &= \left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T} \mathbf{z} -\left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T}\widehat{\mathbf{x}}\\ &\ge -\|\mu \widehat{\mathbf{x}} - \mathbf{a}\| -\left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T}\widehat{\mathbf{x}}. \end{align} The inequality is achieved with equality for $\mathbf{z} = -(\mu \widehat{\mathbf{x}} - \mathbf{a})/\|\mu \widehat{\mathbf{x}}-\mathbf{a}\| \in C$, and hence that is the minimizing $\mathbf{z} \in C$. It now suffices to determine $\widehat{\mathbf{x}} \in C$ for which the above is nonnegative. But again by the Cauchy-Schwarz inequality, we have \begin{align} -\|\mu \widehat{\mathbf{x}}-\mathbf{a} \| -\left(\mu \widehat{\mathbf{x}}-\mathbf{a} \right)^{T}\widehat{\mathbf{x}} \le 0 \end{align} for all $\widehat{\mathbf{x}}$ in the unit ball.
Our only hope is to find $\widehat{\mathbf{x}}$ for which the above equals zero, i.e. to find $\widehat{\mathbf{x}}$ such that $$ \widehat{\mathbf{x}} = -\frac{\mu \widehat{\mathbf{x}}-\mathbf{a}}{\| \mu \widehat{\mathbf{x}}-\mathbf{a}\|}. $$ Note that this means $\widehat{\mathbf{x}}$ must be collinear with $\mu \widehat{\mathbf{x}}-\mathbf{a}$, and in turn with $\mathbf{a}$; that is, there exists $\lambda \in \mathbb{R}$ such that $\widehat{\mathbf{x}} = \lambda \mathbf{a} \in C$. Then $$ \lambda\mathbf{a} = -\frac{\mu \lambda\mathbf{a}-\mathbf{a}}{\| \mu \lambda\mathbf{a}-\mathbf{a}\|} = -\frac{(\mu \lambda-1)\mathbf{a}}{|\mu \lambda-1| \cdot \|\mathbf{a}\|}. $$ By assumption, we are in the regime where $0 \le \mu < \|\mathbf{a}\|_{2}$. Also note that $\lambda \le 1/\|\mathbf{a}\|_{2}$, because we need $\|\widehat{\mathbf{x}}\| \le 1$. (By the form of $Q(\mathbf{x})$ we can also deduce that $\lambda$ should be positive.) Hence $\mu \lambda-1 < 0$, and in turn $-(\mu \lambda-1) = |\mu \lambda-1|$. Finally, $$ \lambda\mathbf{a} = \frac{|\mu \lambda-1| \cdot \mathbf{a}}{|\mu \lambda-1| \cdot \|\mathbf{a}\|} =\frac{\mathbf{a}}{\|\mathbf{a}\|}. $$
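As a sanity check (my own addition, not part of the answer above), one can verify numerically that the boundary point $\widehat{\mathbf{x}} = \mathbf{a}/\|\mathbf{a}\|$ satisfies the lemma's optimality criterion in the regime $\|\mathbf{a}\|_2 > \mu$, by sampling points $\mathbf{z}$ in the unit ball:

```python
import numpy as np

mu = 0.5
rng = np.random.default_rng(1)
a = rng.normal(size=3)
a *= 2.0 / np.linalg.norm(a)           # force ||a||_2 = 2 > mu
xhat = a / np.linalg.norm(a)           # candidate boundary solution
grad = mu * xhat - a                   # gradient of Q at xhat

# sample points z uniformly in the unit ball and test the optimality criterion
for _ in range(10_000):
    z = rng.normal(size=3)
    z *= rng.uniform() ** (1.0 / 3.0) / np.linalg.norm(z)
    assert grad @ (z - xhat) >= -1e-12
print("grad Q(xhat)^T (z - xhat) >= 0 for all sampled z")
```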

ANSWER

I don't think this requires anything too fancy. Note that $a^Tx \leq \|a\|\|x\|$, with equality if and only if $x = \alpha a$ for some $\alpha \geq 0$. Writing $x = \alpha a$, the cost function becomes

$$\alpha \|a\|^2 - \frac12 \mu \alpha^2 \|a\|^2 = \|a\|^2 (\alpha - \frac12 \mu \alpha^2)$$

You want to maximize this quantity, subject to the constraint that

$$ \alpha \|a\| = \|x\| \leq 1 $$

i.e. $\alpha \in \left[0,\dfrac{1}{\|a\|}\right]$. Consider the two cases $\|a\| \leq \mu$ and $\|a\| > \mu$ and find the $\alpha$ that maximizes the cost (you can just use calculus here, and don't forget to test the endpoints of the interval).
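This scalar reduction is easy to check numerically. The sketch below (my own; `norm_a` stands for $\|a\|$) grid-maximizes the cost over the interval and recovers the calculus answer $\alpha^* = \min(1/\mu,\, 1/\|a\|)$ in both cases:

```python
import numpy as np

def best_alpha(norm_a, mu):
    """Grid-maximize g(alpha) = ||a||^2 (alpha - mu*alpha^2/2) on [0, 1/||a||]."""
    alphas = np.linspace(0.0, 1.0 / norm_a, 1_000_001)
    g = norm_a**2 * (alphas - 0.5 * mu * alphas**2)
    return alphas[np.argmax(g)]

for norm_a, mu in [(0.5, 1.0), (3.0, 1.0)]:   # ||a|| <= mu and ||a|| > mu
    expected = min(1.0 / mu, 1.0 / norm_a)    # interior vs. endpoint maximizer
    assert abs(best_alpha(norm_a, mu) - expected) < 1e-5
print("alpha* = min(1/mu, 1/||a||) in both regimes")
```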

ANSWER

@BaronVT, at first I was not convinced why we may assume that $\mathbf{x}$ is a multiple of $\mathbf{a}$. I could think of the following reasoning. Please let me know if I'm correct. \begin{equation} \mathbf{a}^T\mathbf{x} \leq ||\mathbf{a}||||\mathbf{x}|| \\ \Rightarrow \mathbf{a} ^T\mathbf{x} - \frac{1}{2}\mu ||\mathbf{x}||_2^2 \leq ||\mathbf{a}||||\mathbf{x}|| - \frac{1}{2}\mu ||\mathbf{x}||_2^2 \end{equation} So the cost function is upper bounded by $||\mathbf{a}||||\mathbf{x}|| - \frac{1}{2}\mu ||\mathbf{x}||_2^2$, a bound that depends only on $||\mathbf{x}||$ and is attained with equality exactly when $\mathbf{x} = \alpha \mathbf{a}$ for some $\alpha \geq 0$. Hence the maximum of the cost function equals the maximum of this upper bound, and it occurs at some $\mathbf{x} = \alpha \mathbf{a}$.
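The alignment claim itself (that at any fixed radius, the cost is maximized by the point aligned with $\mathbf{a}$) is also easy to test numerically; a small sketch of my own:

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=3)
mu, r = 1.3, 0.7
f = lambda x: a @ x - 0.5 * mu * (x @ x)
x_aligned = r * a / np.linalg.norm(a)   # the aligned point of radius r

# compare against random points on the same sphere ||x||_2 = r
for _ in range(10_000):
    x = rng.normal(size=3)
    x *= r / np.linalg.norm(x)
    assert f(x) <= f(x_aligned) + 1e-12
print("the aligned point maximizes the cost on the sphere of radius r")
```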

Now, under this condition, the problem reduces to what you wrote, i.e., the cost function becomes $\alpha \|a\|^2 - \frac12 \mu \alpha^2 \|a\|^2 = \|a\|^2 (\alpha - \frac12 \mu \alpha^2)$, which is to be maximized under the constraint you stated.

Now, setting the derivative to $0$, we get $\alpha = \frac{1}{\mu}$, i.e., $\mathbf{x} = \frac{\mathbf{a}}{\mu}$. To satisfy the constraint $||\mathbf{x}||_2 \leq 1$ (or equivalently, $\alpha \leq \frac{1}{||\mathbf{a}||}$), we need $\frac{1}{\mu} \leq \frac{1}{||\mathbf{a}||}$, i.e., $||\mathbf{a}|| \leq \mu$. So for the case $||\mathbf{a}|| \leq \mu$, the maximizer is $\mathbf{x} = \frac{\mathbf{a}}{\mu}$.

Now let us consider the case $||\mathbf{a}|| > \mu$. Here the unconstrained maximizer $\alpha = \frac{1}{\mu}$ lies outside the feasible interval, since $\frac{1}{\mu} > \frac{1}{||\mathbf{a}||}$. The scalar cost $\|a\|^2 (\alpha - \frac12 \mu \alpha^2)$ is increasing on $\left[0, \frac{1}{\mu}\right]$, hence increasing on the whole feasible interval $\left[0, \frac{1}{||\mathbf{a}||}\right]$, so its maximum is attained at the endpoint $\alpha = \frac{1}{||\mathbf{a}||}$. So for the case $||\mathbf{a}|| > \mu$, the maximizer is $\mathbf{x} = \frac{\mathbf{a}}{||\mathbf{a}||}$. Combining the two cases, the maximizer is given by $\mathbf{x}^*(\mathbf{a}) = \begin{cases} \frac{\mathbf{a}}{\mu}, & 0\leq ||\mathbf{a}||_2 \leq \mu \\ \frac{\mathbf{a}}{||\mathbf{a}||_2}, & ||\mathbf{a}||_2 > \mu \end{cases} $

ANSWER

You can just complete the square in the objective function.

This optimization problem is equivalent to finding a minimizer of $\|x\|_2^2 - \frac{2}{\mu} a^T x$ subject to $\|x\|_2 \leq 1$, which is in turn equivalent to finding a minimizer of $\|x\|_2^2 - \frac{2}{\mu} a^T x + \frac{\|a\|_2^2}{\mu^2}$ (subject to the given constraint). (Adding a constant term to the objective function does not change the minimizer.)

But notice that \begin{equation} \|x\|_2^2 - \frac{2}{\mu} a^T x + \frac{\|a\|_2^2}{\mu^2} = \left \|x - \frac{a}{\mu} \right \|_2^2. \end{equation} So, our optimization problem is equivalent to \begin{align} \text{minimize} & \quad \|x - \frac{a}{\mu} \|_2^2 \\ \text{subject to} & \quad \|x\|_2 \leq 1. \end{align} The solution to this is just the projection of $\frac{a}{\mu}$ onto the $2$-norm unit ball.
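A small numerical sketch (my own, not from the answer) confirming that this projection reproduces the two-case maximizer quoted in the question:

```python
import numpy as np

def project_to_unit_ball(y):
    """Euclidean projection onto {x : ||x||_2 <= 1}."""
    n = np.linalg.norm(y)
    return y if n <= 1.0 else y / n

def x_star(a, mu):
    """The two-case maximizer quoted in the question."""
    na = np.linalg.norm(a)
    return a / mu if na <= mu else a / na

rng = np.random.default_rng(7)
for _ in range(100):
    a = rng.normal(size=4)
    mu = rng.uniform(0.1, 5.0)
    assert np.allclose(project_to_unit_ball(a / mu), x_star(a, mu))
print("projection of a/mu matches the two-case closed form")
```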