Indicator function in objective function with $L_2$ norm


I am trying to solve an optimization problem. The objective function is as follows:

$\arg\min_{\mathbf{x}} \lVert A\mathbf{x} - \mathbf{b}\rVert^2 + \text{other linear least squares terms} + \mathcal{I}(x_0<a)\,(x_0 - a)^2 + \mathcal{I}(x_n>b)\,(x_n-b)^2$

$\mathcal{I}$ is the indicator function, which returns 1 when the condition is true and 0 otherwise.

$x_0, x_1, \ldots, x_n$ should be between $a$ and $b$.

If $x_0$ or $x_n$ is out of range, a penalty is added to the objective function.

If the indicator functions did not appear in the objective, this would simply be a linear least squares problem and easy to solve. The indicator function is not continuous, which makes the problem difficult.

I am not an expert in numerical optimization. Any hints, links, or materials are appreciated.

There are 2 answers below.

Answer 1 (score 9):
Let: $$ f(\mathbf{x}) = \begin{cases} \|A\mathbf{x}-\mathbf{b}\|^2, &x_0\geq a,\ x_n\leq b \\ \|A\mathbf{x}-\mathbf{b}\|^2 + (x_0-a)^2, &x_0<a,\ x_n\leq b \\ \|A\mathbf{x}-\mathbf{b}\|^2 + (x_n-b)^2, &x_0\geq a,\ x_n>b \\ \|A\mathbf{x}-\mathbf{b}\|^2 + (x_0-a)^2 + (x_n-b)^2, &x_0<a,\ x_n>b \end{cases} $$ Then the gradient is: $$ \nabla f(\mathbf{x}) = \begin{cases} 2A^T(A\mathbf{x}-\mathbf{b}), &x_0\geq a,\ x_n\leq b \\ 2A^T(A\mathbf{x}-\mathbf{b}) + 2I_0(\mathbf{x}-\mathbf{a}), &x_0<a,\ x_n\leq b \\ 2A^T(A\mathbf{x}-\mathbf{b}) + 2I_n(\mathbf{x}-\mathbf{b}), &x_0\geq a,\ x_n>b \\ 2A^T(A\mathbf{x}-\mathbf{b}) + 2I_0(\mathbf{x}-\mathbf{a}) + 2I_n(\mathbf{x}-\mathbf{b}), &x_0<a,\ x_n>b \end{cases} $$ where $I_i$ is a square matrix with $1$ in the $i$-th diagonal entry and zeros elsewhere, and in the penalty terms $\mathbf{a}$ (respectively $\mathbf{b}$) denotes the vector with every entry equal to $a$ (respectively $b$).

The optimal solution $\mathbf{x}^*$ is: $$ \mathbf{x}^* = \begin{cases} (A^TA)^{-1}A^T\mathbf{b}, &x^*_0\geq a,\ x^*_n\leq b \\ (A^TA+I_0)^{-1}(A^T\mathbf{b}+I_0\mathbf{a}), &x^*_0<a,\ x^*_n\leq b \\ (A^TA+I_n)^{-1}(A^T\mathbf{b}+I_n\mathbf{b}), &x^*_0\geq a,\ x^*_n>b \\ (A^TA+I_0+I_n)^{-1}(A^T\mathbf{b}+I_0\mathbf{a}+I_n\mathbf{b}), &x^*_0<a,\ x^*_n>b \end{cases} $$ In practice, solve all four candidate linear systems and keep the one whose solution satisfies its own case conditions; since $f$ is convex and continuously differentiable, that candidate is the global minimizer.
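This case enumeration can be sketched in a few lines of NumPy (the matrix, data, and bounds below are made-up for illustration): solve each of the four candidate systems and keep the one that is consistent with its own conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
A = rng.standard_normal((m, n))
b_vec = rng.standard_normal(m)
a, b = -0.5, 0.5  # bounds on x_0 and x_{n-1}

# I_0 and I_n have a single 1 on the first / last diagonal entry.
I0 = np.zeros((n, n)); I0[0, 0] = 1.0
In = np.zeros((n, n)); In[-1, -1] = 1.0
e0, en = I0[:, 0], In[:, -1]

solution = None
for use0 in (False, True):          # is the x_0 < a penalty active?
    for usen in (False, True):      # is the x_n > b penalty active?
        M = A.T @ A + use0 * I0 + usen * In
        rhs = A.T @ b_vec + use0 * a * e0 + usen * b * en
        x = np.linalg.solve(M, rhs)
        # Keep the candidate whose solution agrees with its own case.
        if (x[0] < a) == use0 and (x[-1] > b) == usen:
            solution = x

# The consistent candidate is a stationary point of the convex,
# continuously differentiable objective, hence the global minimum.
g = 2 * A.T @ (A @ solution - b_vec)
if solution[0] < a:
    g[0] += 2 * (solution[0] - a)
if solution[-1] > b:
    g[-1] += 2 * (solution[-1] - b)
```

The final gradient check confirms that the selected candidate is stationary for the piecewise objective.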

Answer 2 (score 20):

Let me rename your parameters $a$ and $b$ to $x_l$ and $x_u$ to avoid confusion with the vector $b$. You can phrase your problem as a quadratic optimization (QO) problem: $$\min_{x,u,v}\left\{\|Ax-b\|^2+\|u\|^2+\|v\|^2 : u\geq x-x_u e,\ v\geq x_l e-x,\ u\geq 0,\ v\geq 0\right\}.$$ At optimality, $u_i=\max\{0, x_i-x_u\}$ (so $u_i=0$ if $x_i \leq x_u$), and $v_i=\max\{0, x_l-x_i\}$ (so $v_i=0$ if $x_i \geq x_l$).
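As a sanity check of this reformulation on a small made-up instance, one can feed it to SciPy's general-purpose SLSQP solver (rather than a dedicated QO solver) and verify that the slacks come out tight:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
m, n = 6, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_l, x_u = 0.0, 1.0

def obj(z):
    # z stacks the variables as (x, u, v).
    x, u, v = z[:n], z[n:2*n], z[2*n:]
    r = A @ x - b
    return r @ r + u @ u + v @ v

constraints = [
    {"type": "ineq", "fun": lambda z: z[n:2*n] - (z[:n] - x_u)},  # u >= x - x_u
    {"type": "ineq", "fun": lambda z: z[2*n:] - (x_l - z[:n])},   # v >= x_l - x
    {"type": "ineq", "fun": lambda z: z[n:]},                     # u >= 0, v >= 0
]
res = minimize(obj, np.zeros(3 * n), constraints=constraints)
x, u, v = res.x[:n], res.x[n:2*n], res.x[2*n:]

# At optimality: u_i = max(0, x_i - x_u) and v_i = max(0, x_l - x_i).
```

This only illustrates the equivalence; for large instances a dedicated QO solver, as discussed below, is far faster.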

There are many different solvers available for QO problems. YALMIP and CVXPY are modeling tools that let you enter a QO in the form above, but they have a memory overhead that might be prohibitive at your problem size. More memory-efficient interfaces typically allow only a single stacked variable vector, forcing you to express the objective and constraints in the following way: $$\min_{x,u,v} \Biggl\{ \begin{pmatrix}x\\u\\v\end{pmatrix}^T \begin{pmatrix}A^TA & O & O \\ O & I & O \\ O & O & I\end{pmatrix} \begin{pmatrix}x\\u\\v\end{pmatrix} + \begin{pmatrix}-2A^Tb \\ 0 \\0\end{pmatrix}^T \begin{pmatrix}x\\u\\v\end{pmatrix} + b^Tb : \\ \begin{pmatrix}I & -I & O \\ -I & O & -I \end{pmatrix}\begin{pmatrix}x\\u\\v\end{pmatrix} \leq \begin{pmatrix}x_u e \\ -x_l e\end{pmatrix},\ \begin{pmatrix}x\\u\\v\end{pmatrix} \geq \begin{pmatrix}-\infty \\0\\0\end{pmatrix} \Biggr\}$$ (note the linear term is $-2A^Tb$, since $\|Ax-b\|^2 = x^TA^TAx - 2b^TAx + b^Tb$).

In Matlab, you can solve this in the following way:

n = 5000;
m = 5000;
A = rand(m, n);
b = rand(m, 1);

x_l = 0;
x_u = 1;

% quadprog minimizes (1/2)*z'*H*z + f'*z, so double H to match z'*H*z above.
H = 2 * blkdiag(A'*A, eye(n), eye(n));
f = [-2*A'*b; zeros(2*n,1)];   % linear term is -2*A'*b, not -2*b
% Constraints: x - u <= x_u*e and -x - v <= -x_l*e.
Aineq = [eye(n) -eye(n) zeros(n,n); -eye(n) zeros(n,n) -eye(n)];
bineq = [x_u * ones(n,1); -x_l * ones(n,1)];

% Lower bounds: x free, u >= 0, v >= 0; no upper bounds.
[x,fval] = quadprog(H,f,Aineq,bineq,[],[],[-inf(n,1); zeros(2*n,1)],[]);

This solves in around 45 seconds, so if your machine has enough memory and you increase the size to 15000 x 15000, I think it should solve within an hour.
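Alternatively, note that the penalty terms $\max\{0, x_i-x_u\}^2$ and $\max\{0, x_l-x_i\}^2$ make the objective convex and continuously differentiable in $x$ alone, so a first-order method can minimize it directly without the auxiliary $u, v$ variables. A sketch in Python/NumPy using SciPy's L-BFGS-B (the sizes and data are made-up):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
m, n = 40, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_l, x_u = 0.0, 1.0

def f_and_grad(x):
    # ||Ax - b||^2 plus smooth one-sided quadratic penalties.
    r = A @ x - b
    over = np.maximum(0.0, x - x_u)   # active where x_i > x_u
    under = np.maximum(0.0, x_l - x)  # active where x_i < x_l
    val = r @ r + over @ over + under @ under
    grad = 2.0 * A.T @ r + 2.0 * over - 2.0 * under
    return val, grad

res = minimize(f_and_grad, np.zeros(n), jac=True, method="L-BFGS-B")
```

This avoids forming the $3n$-variable QP entirely, which may matter at the 15000-variable scale mentioned above.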