Sparsity of the control vector under $\ell^1$-norm regularization


Consider a generic optimal control problem $$ \min\limits_{u(t)}\int\limits_0^T f(x(t), u(t))\, dt + \mathcal{R}(u) $$ where the evolution of $x(t)$ is governed by a generic ODE system $\dot x(t) = g(x(t),u(t))$. In my case the functions $f$ and $g$ do not depend explicitly on time. I used the following expressions $$ \mathcal{R}_2(u) = \frac{1}{2}\eta\:\|u\|_2^2 \qquad \text{and} \qquad \mathcal{R}_1(u) = \frac{1}{2}\eta\:\|u\|_1 $$ as regularization terms. Numerically, I used the forward-backward (FB) sweep method. For the problem with $\mathcal{R}_2(u)$, at each FB iteration I compute the gradients and apply gradient descent. For the problem with $\mathcal{R}_1(u)$, I instead minimize the functional at each iteration with the MATLAB function $\texttt{fminunc}$.

I would then like to compare the results to see whether using the $\ell^1$ norm yields some sparsity in the control vector. So let $u$ be the solution of the optimal control problem using $\mathcal{R}_2(u)$ and let $v$ be the solution using $\mathcal{R}_1(u)$. A first idea for checking the sparsity of the vectors was to compare the two $\ell^1$ norms, i.e. $\|u\|_1$ vs. $\|v\|_1$. With this method the norms are very close for certain parameters of the problem, while for other parameters $\|v\|_1$ seems to be lower. Another idea was to check whether some entries of $v$ are equal to $0$ or below a certain threshold.
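To make the thresholding idea concrete, one common numerical measure of sparsity is the fraction of entries whose magnitude falls below a tolerance. The following is a minimal sketch in Python/NumPy; the vectors `u` and `v` and the tolerance `tol` here are illustrative assumptions, not output of the actual control problem.

```python
import numpy as np

def sparsity(w, tol=1e-6):
    """Fraction of entries of w whose magnitude is below tol."""
    w = np.asarray(w, dtype=float)
    return np.mean(np.abs(w) < tol)

# Hypothetical controls: v (from the l1-regularized problem) has exact zeros,
# while u (from the l2-regularized problem) has entries that are small but nonzero.
u = np.array([0.3, -0.05, 0.02, 0.4, -0.01])
v = np.array([0.5,  0.0,  0.0,  0.6,  0.0])

print(sparsity(u))  # no entry of u is below the tolerance
print(sparsity(v))  # three of five entries of v are exactly zero
```

Comparing this fraction (rather than the raw $\ell^1$ norms, which also reflect the overall magnitude of the control) isolates how many components are effectively switched off.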

Is there a particular method to study the sparsity or to compare the terms of the two control vectors $u$ and $v$?


I'll assume the problem has been discretized so that the optimization variable is a vector $u \in \mathbb R^n$. Your optimization problem has the form $$ \tag{1} \text{minimize} \quad f(u) + g(u) $$ where $f$ is differentiable and $g$ is a "simple" regularization term. I'll assume that $g(u) = \eta \| u \|_1$.

It is common to solve problems of the form (1) with the forward-backward method, also called the proximal gradient method. The iteration for the proximal gradient method with step size $t > 0$ is $$ u^{k+1} = \text{prox}_{tg}(u^k - t \nabla f(u^k)). $$ In words, we take a step in the direction of steepest descent for $f$, then apply the proximal operator of $g$, which is defined by $$ \text{prox}_{tg}(\hat u) = \arg \min_u \; g(u) + \frac{1}{2t} \|u - \hat u \|_2^2. $$ If $g(u) = \eta \|u\|_1$, then it can be shown that the proximal operator of $g$ simply shrinks each component of $\hat u$ towards the origin by a distance $t \eta$; components that would cross the origin are set exactly to $0$. This operation is known as soft thresholding.
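The iteration above can be sketched in a few lines. The example below is a minimal Python/NumPy implementation applied to an illustrative smooth term $f(u) = \tfrac{1}{2}\|Au - b\|_2^2$; the matrix $A$, vector $b$, weight $\eta$, and step size $t = 1/L$ are assumptions for demonstration. In your setting, $\nabla f$ would instead come from the backward (adjoint) sweep of your FB method.

```python
import numpy as np

def soft_threshold(x, thresh):
    """Prox of thresh*||.||_1: shrink each entry toward 0 by thresh, stopping at 0."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def prox_gradient(grad_f, prox_g, u0, t, iters=500):
    """Proximal gradient iteration: u <- prox_{tg}(u - t * grad_f(u))."""
    u = u0.copy()
    for _ in range(iters):
        u = prox_g(u - t * grad_f(u), t)
    return u

# Illustrative data: f(u) = 0.5*||A u - b||^2, g(u) = eta*||u||_1.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
eta = 2.0

grad_f = lambda u: A.T @ (A @ u - b)
prox_g = lambda x, t: soft_threshold(x, t * eta)
t = 1.0 / np.linalg.norm(A, 2) ** 2  # step 1/L, L = Lipschitz constant of grad f

u_star = prox_gradient(grad_f, prox_g, np.zeros(10), t)
print(np.sum(u_star == 0.0))  # count of exactly-zero entries in the solution
```

Because the soft-thresholding step sets components exactly to zero, the iterates are genuinely sparse, unlike solutions returned by a generic smooth solver applied to the nonsmooth objective.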

I believe that if you use this optimization algorithm, you'll obtain a truly sparse solution to your optimization problem.