Explaining the "well-known" optimization of this particularly simple convex, non-differentiable function?


I've been programming algorithms for solving L1-regularized logistic regression with large datasets. As such, I've been delving into the computer science literature, and came across the following puzzling fact. (I will simplify the notation for convenience.)

Let $x$ be a real number, and given real-valued parameters $a,b,c$ (with $a > 0$), let $f(x) = \frac{1}{2} a x^2 + b x + |c + x| - |c|$. Then it is "well-known" that

$argmin_x \; f(x) = \begin{cases} -\displaystyle\frac{(b+1)}{a} & \text{if } b+1 \leq ac \\ -\displaystyle\frac{(b-1)}{a} & \text{if } b-1 \geq ac \\ - c & \text{otherwise} \\\end{cases}$

I see the obvious parallel with finding the vertex of the parabola from precalculus, but honestly I am not sure how to get the solution to this one. How is this "well-known"? Can anybody either explain the solution or at least refer me to where I can find this kind of problem discussed? I am familiar with the statistics literature, but have never formally studied convex optimization.
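For what it's worth, the stated formula is easy to sanity-check numerically. Below is a minimal Python sketch (my own, not taken from any reference; it assumes $a > 0$, which the formula implicitly requires) comparing the closed-form minimizer against a brute-force grid search:

```python
import random

def argmin_f(a, b, c):
    """Closed-form minimizer of f(x) = a*x**2/2 + b*x + |c + x| - |c| (assumes a > 0)."""
    if b + 1 <= a * c:
        return -(b + 1) / a
    if b - 1 >= a * c:
        return -(b - 1) / a
    return -c

def f(x, a, b, c):
    return 0.5 * a * x**2 + b * x + abs(c + x) - abs(c)

# Spot-check against a brute-force grid search around the claimed minimizer.
random.seed(0)
for _ in range(200):
    a = random.uniform(0.1, 5.0)
    b = random.uniform(-5.0, 5.0)
    c = random.uniform(-5.0, 5.0)
    x_star = argmin_f(a, b, c)
    grid = [x_star - 3 + 6 * i / 10000 for i in range(10001)]
    assert all(f(x_star, a, b, c) <= f(x, a, b, c) + 1e-9 for x in grid)
```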


On BEST ANSWER

There is nothing special here. You just solve the problem analytically by removing the absolute value, considering the two cases a) $c + x \ge 0$ and b) $c + x \le 0$. You then get two different parabolas, a) $\frac{1}{2}ax^2+(b+1)x$ and b) $\frac{1}{2}ax^2+(b-1)x$ (up to additive constants). Their minimizers are $-\frac{b\pm 1}{a}$, or else the shared edge $x=-c$ of the intervals $[-c,\infty)$ and $(-\infty,-c]$ whenever the corresponding condition $-\frac{b+1}{a} \ge -c$ (resp. $-\frac{b-1}{a} \le -c$) is not satisfied. Combining the cases, you get the solution above.
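In code, this case analysis amounts to minimizing each extended parabola and clamping its vertex into the corresponding interval; a minimal sketch, assuming $a > 0$ (function names are my own):

```python
def argmin_by_branches(a, b, c):
    """Minimize f by handling the two branches of |c + x| separately (assumes a > 0)."""
    def f(x):
        return 0.5 * a * x**2 + b * x + abs(c + x) - abs(c)
    # Branch a): x >= -c, where f(x) = a*x**2/2 + (b+1)*x + const; vertex at -(b+1)/a.
    cand1 = max(-(b + 1) / a, -c)   # clamp the vertex into [-c, inf)
    # Branch b): x <= -c, where f(x) = a*x**2/2 + (b-1)*x + const; vertex at -(b-1)/a.
    cand2 = min(-(b - 1) / a, -c)   # clamp the vertex into (-inf, -c]
    # The global minimizer is whichever clamped candidate gives the smaller value.
    return min((cand1, cand2), key=f)
```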

PS. Check for a typo in your conditions: they should read $b + 1 \le ac$ and $b - 1 \ge ac$, respectively.


EDIT: Here's an answer that pieces together Alexander Vigodner's ideas and my own reflections. To my relatively ordinary mathematical mind, this problem looks deceptively simple and similar to a standard calculus optimization problem. However, the non-differentiability of the objective function requires a little "trick" at the end, as explained below:

Solution: We can drop the constant $-|c|$ from the objective function, since it does not affect the minimizer. So let's define \begin{equation*} g(x)=\displaystyle\frac{1}{2} ax^2+bx+|c+x| \end{equation*}

Now we can apply optimization techniques from calculus (so long as we treat the piecewise nature of the function carefully). This function is piecewise parabolic, with the two "pieces" defined over $\{x: x +c \geq 0 \}$ and $\{x: x+c \leq 0 \} $, and the pieces are "glued" together at $x=-c$ so that the function is continuous. In particular, we have:

\begin{equation*} g (x) = \begin{cases} g_1(x) := \displaystyle\frac{1}{2}ax^2 + (b+1)x +c & \text{ for } x \in D_1 := \{x : x +c \geq 0 \} \\ g_2(x) := \displaystyle\frac{1}{2}ax^2 + (b-1)x -c & \text{ for } x \in D_2 :=\{x : x +c \leq 0 \} \\\end{cases} \end{equation*}
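As a quick sanity check of the continuity claim, the two pieces agree at the glue point $x=-c$: both reduce to $\frac{1}{2}ac^2 - bc$ there. A small sketch with arbitrary (hypothetical) parameter values:

```python
def g1(x, a, b, c):
    # Piece of g on D1 = {x : x + c >= 0}
    return 0.5 * a * x**2 + (b + 1) * x + c

def g2(x, a, b, c):
    # Piece of g on D2 = {x : x + c <= 0}
    return 0.5 * a * x**2 + (b - 1) * x - c

# At the glue point x = -c both pieces reduce to a*c**2/2 - b*c,
# so g is continuous there; check with a few arbitrary parameter choices.
for a, b, c in [(1.0, 2.0, 3.0), (0.5, -1.0, -2.0), (4.0, 0.0, 1.5)]:
    assert abs(g1(-c, a, b, c) - g2(-c, a, b, c)) < 1e-12
```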

Now we treat the two parabolic pieces, $g_1(x)$ and $g_2(x)$, as if they extended over the entire domain $\mathbb{R}$. Call these extended functions $\tilde{g}_1(x)$ and $\tilde{g}_2(x)$. Each of these extended functions has its own vertex, which we consider to be a "critical point":

\begin{equation*} x^*_1=-(b+1)/a \quad \text{and} \quad x^*_2=-(b-1)/a \end{equation*}

Moreover, we obtain a third critical point, due to the non-differentiability, at $x^*_3=-c$.

Now, in ordinary calculus optimization we would evaluate the function at these critical points and compare the resulting values. We should NOT do this here. (If you doubt it, have fun comparing the following expressions.)

\begin{equation*} g(x_1^*)= -\displaystyle\frac{1}{2}\displaystyle\frac{(b+1)^2}{a} +c , \quad g(x_2^*)= -\displaystyle\frac{1}{2}\displaystyle\frac{(b-1)^2}{a} -c , \quad g(x_3^*) = \displaystyle\frac{1}{2} ac^2 -bc \end{equation*}

Rather, the trick to solving this problem is to be careful and clever with the domain partitioning caused by the point of non-differentiability:

A. First, let us revisit the domain restrictions:

  1. If $x_1^*$ is the solution to the problem (i.e. if $x_1^* = \operatorname{argmin}_x g(x)$), then the vertex of the parabola $\tilde{g}_1(x)$ certainly lies in $D_1 := \{ x: x+c \geq 0 \}$. So we have \begin{equation*} x_1^* \in D_1 \quad \Leftrightarrow \quad x_1^* + c \geq 0 \quad \Leftrightarrow \quad b+1 \leq ac \end{equation*}
  2. If $x_2^*$ is the solution to the problem (i.e. if $x_2^* = \operatorname{argmin}_x g(x)$), then the vertex of the parabola $\tilde{g}_2(x)$ certainly lies in $D_2 := \{ x: x+c \leq 0 \}$. So we have \begin{equation*} x_2^* \in D_2 \quad \Leftrightarrow \quad x_2^* + c \leq 0 \quad \Leftrightarrow \quad b-1 \geq ac \end{equation*}

B. Now we show that these "if" statements are actually bidirectional. Note that, given the number $ac \in \mathbb{R}$, exactly ONE of the following conditions holds:

  1. $ac \geq b+1$
  2. $ac \leq b-1$
  3. $ac \in (b-1,b+1) $
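This trichotomy translates directly into a selection rule for the minimizer; a minimal sketch, assuming $a > 0$ (the function name is my own):

```python
def minimizer_by_case(a, b, c):
    """Map the trichotomy on a*c to the corresponding minimizer (assumes a > 0)."""
    if a * c >= b + 1:        # case 1: vertex of g1 lies in D1
        return -(b + 1) / a
    if a * c <= b - 1:        # case 2: vertex of g2 lies in D2
        return -(b - 1) / a
    return -c                 # case 3: a*c in (b-1, b+1); the kink x = -c wins
```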