I'm a graduate student in Electrical Engineering with some basic knowledge of convex optimization. For my research, I came across the following optimization program. With $\mu > 0$, find $\arg \max \{ \mathbf{a}^T\mathbf{x} - \frac{1}{2}\mu ||\mathbf{x}||_2^2 \} \\ \text{s.t. } ||\mathbf{x}||_2^2 \leq 1$
In a paper, I found a similar problem, and the unique maximizer of the above problem is given as $\mathbf{x}^*(\mathbf{a}) = \begin{cases} \frac{\mathbf{a}}{\mu}, & 0\leq ||\mathbf{a}||_2 \leq \mu \\ \frac{\mathbf{a}}{||\mathbf{a}||_2}, & ||\mathbf{a}||_2 > \mu \end{cases} $
Can anyone help me understand how to obtain this maximizer? I think I'm missing something very obvious, which makes sense since I am not an expert in (convex) optimization. However, I'd like to add one observation: if I substitute the maximizer into the cost function, I get a function very similar to the Huber function. Denoting the cost function by $f(\mathbf{x})$, the optimal value $f(\mathbf{x^*})$, which is a function of $\mathbf{a}$ and $\mu$, can be found to be
$f(\mathbf{x^*}) = f(\mathbf{a},\mu) = \begin{cases} \frac{||\mathbf{a}||_2^2}{2\mu}, & 0\leq ||\mathbf{a}||_2 \leq \mu \\ ||\mathbf{a}||_2 - \frac{\mu}{2}, & ||\mathbf{a}||_2 > \mu \end{cases} $
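As a sanity check, here is a small NumPy script (my own, not from the paper; the value of $\mu$ is chosen arbitrarily) that plugs the claimed maximizer into the cost function and confirms it matches the Huber-like expression above:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 1.5  # arbitrary choice for the check

def f(x, a, mu):
    # cost: a^T x - (mu/2) ||x||_2^2
    return a @ x - 0.5 * mu * (x @ x)

def x_star(a, mu):
    # claimed closed-form maximizer
    na = np.linalg.norm(a)
    return a / mu if na <= mu else a / na

def huber_like(a, mu):
    # claimed optimal value f(x*) as a function of a and mu
    na = np.linalg.norm(a)
    return na**2 / (2 * mu) if na <= mu else na - mu / 2

# random test points a with norms on both sides of mu
for _ in range(100):
    a = rng.normal(size=4) * rng.uniform(0.1, 2.0)
    assert np.isclose(f(x_star(a, mu), a, mu), huber_like(a, mu))
print("f(x*) matches the Huber-like expression in both regimes")
```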
I would be very thankful if you could help me solve this optimization problem; any pointer to a relevant reference would also be highly appreciated.
You want to equivalently minimize the convex quadratic $Q(\mathbf{x}) = \frac{1}{2}\mu ||\mathbf{x}||_2^2 - \mathbf{a}^T\mathbf{x}$. Since $\mu > 0$, $Q(\mathbf{x})$ is strongly convex and differentiable, which implies that there exists a unique (global) minimizer, namely the point where the gradient vanishes. The gradient of your quadratic is \begin{align} \nabla_{\mathbf{x}}Q(\mathbf{x}) &= \frac{1}{2}\mu \left( \mathbf{I}\mathbf{x} + \mathbf{I}^{T} \mathbf{x}\right) - \mathbf{a}\\ &= \mu \mathbf{x}- \mathbf{a}. \end{align} Setting the gradient equal to $\mathbf{0}$, we get the global minimizer $$ \mathbf{x}_{\star} = \frac{1}{\mu}\mathbf{a}. $$ If $\|\mathbf{a}\|_{2} \le \mu$, then $ \mathbf{x}_{\star}$ is feasible, and it is therefore the optimal solution of the constrained problem. Of course, this might not be the case. For the constrained problem we exploit the following lemma:
$Q(\mathbf{x})$ is a convex, differentiable function. The constraint set $C$ (here $C$ is the unit $l_{2}$-ball) is convex, nonempty and closed. A point $\widehat{\mathbf{x}} \in C$ minimizes $Q$ over $C$ (equivalently, solves the constrained maximization) if and only if $$ \nabla_{\mathbf{x}}Q(\widehat{\mathbf{x}})^{T}(\mathbf{z}-\widehat{\mathbf{x}}) \ge 0, \quad \forall \mathbf{z} \in C. $$
Now, let's see how this yields the desired solution. If the global minimizer is also a feasible point, then it is the solution of the constrained problem; the above criterion is trivially satisfied since the gradient is zero at $\widehat{\mathbf{x}} = \mathbf{x}_{\star} \in C$. If that is not the case ($\mathbf{x}_{\star} \notin C$), then we look for $\widehat{\mathbf{x}}$ such that $$ \nabla_{\mathbf{x}}Q(\widehat{\mathbf{x}})^{T}(\mathbf{z}-\widehat{\mathbf{x}}) = \left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T} (\mathbf{z}-\widehat{\mathbf{x}}) \ge 0, \quad \forall \mathbf{z} \in C. $$ Let's treat $\widehat{\mathbf{x}}$ as a fixed (unknown) point for now, and minimize the left-hand side with respect to $\mathbf{z} \in C$; if the inequality holds for the $\mathbf{z} \in C$ that minimizes the left-hand side, it holds for all $\mathbf{z} \in C$. By the Cauchy-Schwarz inequality (using $\|\mathbf{z}\| \le 1$), we have \begin{align} \left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T} (\mathbf{z}-\widehat{\mathbf{x}}) &= \left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T} \mathbf{z} -\left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T}\widehat{\mathbf{x}}\\ &\ge -\|\mu \widehat{\mathbf{x}} - \mathbf{a}\| -\left( \mu \widehat{\mathbf{x}} - \mathbf{a}\right)^{T}\widehat{\mathbf{x}}. \end{align} The inequality is achieved with equality for $\mathbf{z} = -(\mu \widehat{\mathbf{x}} - \mathbf{a})/\|\mu \widehat{\mathbf{x}}-\mathbf{a}\| \in C$, and hence that is the minimizing $\mathbf{z} \in C$. Now, it suffices to determine $\widehat{\mathbf{x}} \in C$ for which the above is nonnegative. But again by the Cauchy-Schwarz inequality, we have \begin{align} -\|\mu \widehat{\mathbf{x}}-\mathbf{a} \| -\left(\mu \widehat{\mathbf{x}}-\mathbf{a} \right)^{T}\widehat{\mathbf{x}} \le 0 \end{align} for all $\widehat{\mathbf{x}}$ in the unit ball.
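As an illustration of the optimality criterion, here is a hedged numerical sketch (my own construction, using NumPy, with an arbitrarily chosen $\mu$ and a random $\mathbf{a}$ rescaled so that $\|\mathbf{a}\|_2 > \mu$) that checks the lemma's inequality at the candidate $\widehat{\mathbf{x}} = \mathbf{a}/\|\mathbf{a}\|_2$ against random points of the unit ball:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 0.5                      # arbitrary; we need ||a||_2 > mu for this branch
a = rng.normal(size=3)
a *= 2.0 / np.linalg.norm(a)  # rescale so that ||a||_2 = 2 > mu

x_hat = a / np.linalg.norm(a)  # candidate solution on the sphere
g = mu * x_hat - a             # gradient of Q at x_hat

# sample random points z in the unit ball and verify the optimality criterion
for _ in range(1000):
    z = rng.normal(size=3)
    z *= rng.uniform(0.0, 1.0) / np.linalg.norm(z)
    assert g @ (z - x_hat) >= -1e-12
print("variational inequality holds at x_hat = a / ||a||")
```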
Our only hope is to find $\widehat{\mathbf{x}}$ for which the above equals zero, i.e., to find $\widehat{\mathbf{x}}$ such that $$ \widehat{\mathbf{x}} = -\frac{\mu \widehat{\mathbf{x}}-\mathbf{a}}{\| \mu \widehat{\mathbf{x}}-\mathbf{a}\|}. $$ Note that this means $\widehat{\mathbf{x}}$ must be collinear with $\mu \widehat{\mathbf{x}}-\mathbf{a}$, and in turn with $\mathbf{a}$; that is, there exists $\lambda \in \mathbb{R}$ such that $\widehat{\mathbf{x}} = \lambda \mathbf{a} \in C$. Then, $$ \lambda\mathbf{a} = -\frac{\mu \lambda\mathbf{a}-\mathbf{a}}{\| \mu \lambda\mathbf{a}-\mathbf{a}\|} = -\frac{(\mu \lambda-1)\mathbf{a}}{|\mu \lambda-1| \cdot \|\mathbf{a}\|}. $$ By assumption, we are in the regime where $0 < \mu < \|\mathbf{a}\|_{2}$. Also note that $\lambda \le 1/\|\mathbf{a}\|_{2}$, because we want $\|\widehat{\mathbf{x}}\| \le 1$. (By the form of $Q(\mathbf{x})$ we can also deduce that $\lambda$ should be positive.) Hence $\mu \lambda-1 < 0$, and in turn $-(\mu \lambda-1) = |\mu \lambda-1|$. Finally, $$ \lambda\mathbf{a} = \frac{|\mu \lambda-1| \cdot \mathbf{a}}{|\mu \lambda-1| \cdot \|\mathbf{a}\|} =\frac{\mathbf{a}}{\|\mathbf{a}\|}, $$ which gives $\widehat{\mathbf{x}} = \mathbf{a}/\|\mathbf{a}\|_{2}$, the claimed maximizer in this regime.
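Finally, a quick sketch (mine, not from the paper) that compares the closed-form solution against a generic projected gradient ascent on the unit ball, in both regimes; the step size and iteration count are chosen ad hoc for this small check:

```python
import numpy as np

def x_star(a, mu):
    # closed-form maximizer from the question
    na = np.linalg.norm(a)
    return a / mu if na <= mu else a / na

def projected_gradient_ascent(a, mu, steps=5000, lr=0.1):
    # maximize a^T x - (mu/2)||x||_2^2 over the unit l2-ball
    x = np.zeros_like(a)
    for _ in range(steps):
        x = x + lr * (a - mu * x)   # ascent step along the gradient
        n = np.linalg.norm(x)
        if n > 1.0:
            x = x / n               # project back onto the ball
    return x

rng = np.random.default_rng(2)
mu = 1.0
for scale in (0.5, 3.0):            # one test point in each regime
    a = rng.normal(size=3)
    a *= scale / np.linalg.norm(a)
    assert np.allclose(x_star(a, mu), projected_gradient_ascent(a, mu), atol=1e-6)
print("closed form agrees with projected gradient ascent in both regimes")
```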