How do I (approximately) get the maximum vs. the minimum in a constrained optimization problem?


I want to solve the two optimization problems:

$$ (1) \text{ } \min f(x) \text{ subject to } g(x) = 0 $$

and

$$ (2) \text{ } \max f(x) \text{ subject to } g(x) = 0 $$

separately of course (one then the other).

I was thinking of defining the auxiliary functions:

$$ L_1(x,\lambda) = f(x) + \lambda g(x) $$

for the minimization version and for the maximization:

$$ L_2(x,\lambda) = - f(x) + \lambda g(x) $$

Since both solutions satisfy the standard condition $ \nabla f(x) = \lambda \nabla g(x) $ (i.e. the level curves and gradients are parallel), it seems I can't distinguish whether I am getting a max or a min. So which auxiliary function am I supposed to use for which problem?
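To make that concrete with a toy example: take $f(x,y) = x + y$ and $g(x,y) = x^2 + y^2 - 1$. The condition $\nabla f = \lambda \nabla g$ gives

$$ 1 = 2\lambda x, \quad 1 = 2\lambda y $$

so $x = y$, and the constraint then forces $x = y = \pm \tfrac{1}{\sqrt{2}}$. The point $(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}})$ is the constrained maximum (with $\lambda = \tfrac{1}{\sqrt{2}}$) and $(-\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}})$ is the constrained minimum (with $\lambda = -\tfrac{1}{\sqrt{2}}$): both show up as solutions of the same first-order system, differing only in the sign of $\lambda$.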

Or perhaps I can't distinguish them by designing an auxiliary function at all, and instead should just run gradient descent on $L_1$ to solve (1), and gradient ascent on $L_1$ (or, equivalently, gradient descent on $L_2$) to solve (2)? I am really confused, because if I try to minimize $L_2$ with gradient descent, wouldn't it just drive $-f(x)$ to $-\infty$ and ignore $g(x)$ (since if $-f(x) = -\infty$ then $L_2$ is minimized)? It seems to have the weird property that it ignores my constraint. So did I design my auxiliary function wrong, am I using the wrong algorithm, or what am I doing wrong?
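To make my confusion concrete, here is a toy numpy sketch of what I have in mind, except that I replace $\lambda g(x)$ with a quadratic penalty $c\,g(x)^2$ (my own choice, so that the constraint term penalizes violations in both directions and cannot itself be driven to $-\infty$ by either sign of $f$); all names and constants below are mine:

```python
import numpy as np

# Toy problem: f(x) = x0 + x1 on the unit circle g(x) = x0^2 + x1^2 - 1 = 0.
def f(x):
    return x[0] + x[1]

def g(x):
    return x[0]**2 + x[1]**2 - 1.0

def grad_penalized(x, sign, c):
    # gradient of sign*f(x) + c*g(x)**2 with respect to x
    grad_f = np.array([1.0, 1.0])
    grad_g = np.array([2.0 * x[0], 2.0 * x[1]])
    return sign * grad_f + 2.0 * c * g(x) * grad_g

def solve(sign, c=10.0, lr=0.01, steps=5000):
    x = np.array([0.3, -0.5])   # arbitrary starting point
    for _ in range(steps):
        x = x - lr * grad_penalized(x, sign, c)
    return x

x_min = solve(sign=+1.0)   # descend  f + c*g(x)^2 -> near (-1/sqrt(2), -1/sqrt(2))
x_max = solve(sign=-1.0)   # descend -f + c*g(x)^2 -> near (+1/sqrt(2), +1/sqrt(2))
```

With this penalty both runs stay (approximately) on the circle and only the sign in front of $f$ decides whether I land at the min or the max, which is the behavior I was hoping the plain $\lambda g(x)$ term would give me.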


To add context, this is what I am trying to do:

I want to design one optimization problem where I minimize a non-negative quantity and another where I maximize it, both subject to the same constraint:

$$ \min \| r \| \text{ subject to } f(x+r,w) = l$$

and then separately compute:

$$ \max\| r \| \text{ subject to } f(x+r,w) = l$$

The reason I am doing this is in the context of neural networks and adversarial examples: $f(x+r,w)$ is some complicated (non-convex) neural network function. In this framework I was wondering what exactly guarantees, as claimed in the paper, that they get:

the closest image to x classified as l by f.

For example, how do they know it's not the farthest image classified as $l$? Perhaps not exactly the farthest, since the problem might be non-convex, but at least a locally farthest image. In that case it would be, roughly, the farthest the image $x$ could go while still changing label to $l$.

The reason I am asking is that I am interested in comparing the difference between the two cases (the max and the min). I thought a natural way to design the minimization one would be with the standard auxiliary function:

$$ \min \| r \| + \lambda \, Loss(f(x+r,w), l) $$

(let's ignore the $x+r \in [0,1]^m$ constraint for a moment). Since one can just let $c = \frac{1}{\lambda}$ and rescale, I assume it's equivalent to solve:

$$ \min c\| r \| + Loss(f(x+r,w),l) $$
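For this minimization version, what I have in mind is plain gradient descent on $r$. Here is a minimal numpy sketch with a toy linear model standing in for the network and a squared loss standing in for $Loss$ (I also use $\|r\|^2$ instead of $\|r\|$ so everything is smooth at $r = 0$); all of these are simplifying assumptions of mine, not the paper's setup:

```python
import numpy as np

# Toy stand-in for the network: a linear "logit" w.(x+r), with a squared
# loss pushing the logit toward the target value l.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.3])
l = 1.0    # target "label" (here just a target logit value)
c = 0.1    # weight on ||r||^2

def loss(r):
    return c * (r @ r) + (w @ (x + r) - l) ** 2

def grad(r):
    # gradient of c*||r||^2 + (w.(x+r) - l)^2 with respect to r
    return 2.0 * c * r + 2.0 * (w @ (x + r) - l) * w

r = np.zeros_like(x)
for _ in range(2000):
    r = r - 0.01 * grad(r)

# r is now (approximately) a small perturbation whose logit hits l
```

This part behaves as I expect: $r$ stays small and the constraint term is driven near zero, with $c$ trading off the two.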

However, what is not clear to me is what to do if we want to solve the maximization problem:

$$ \max\| r \| \text{ subject to } f(x+r,w) = l$$

do we just do:

$$ \min -\| r \| \text{ subject to } f(x+r,w) = l$$

and solve:

$$ \min -\| r \| + c Loss(f(x+r,w),l)$$

or

$$ \min -c \| r \| + Loss(f(x+r,w),l)$$

What feels odd to me is that in this case we can make $\| r \|$ arbitrarily large, so the above is minimized at minus infinity. Is that right? I am confused about that. Or perhaps we should just consider:

$$ L(r) = -\| r \| + c Loss(f(x+r,w),l) $$

and use gradient ascent instead of gradient descent. Is the algorithm I'm using wrong, or is the auxiliary function I'm trying to design wrong... what is wrong?
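To check the unboundedness worry numerically, here is a toy sketch with a linear stand-in for the network (every name and constant below is my own assumption, not from any paper). Plain gradient descent on $-\|r\|^2 + c\,Loss$ does blow $\|r\|$ up, and the only fix I found is to project $r$ back onto a ball $\|r\| \le R$ after every step:

```python
import numpy as np

# Toy stand-in for the network: a linear "logit" w.(x+r), with a squared
# loss pushing it toward the target value l.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.3])
l = 1.0
c = 1.0    # weight on the constraint penalty
R = 2.0    # hard cap on ||r||; without it the iterates below diverge

def grad(r):
    # gradient of -||r||^2 + c*(w.(x+r) - l)^2 with respect to r
    e = w @ (x + r) - l
    return -2.0 * r + 2.0 * c * e * w

r = np.array([0.1, 0.1, 0.1])   # generic (non-symmetric) start
for _ in range(3000):
    r = r - 0.01 * grad(r)
    n = np.linalg.norm(r)
    if n > R:                   # project back onto the ball ||r|| <= R
        r *= R / n

# r ends up on the boundary ||r|| = R while still (approximately) satisfying
# the constraint: a "locally farthest" perturbation under the cap.
```

If I remove the projection, $\|r\|$ grows without bound exactly as I feared, so it seems some explicit bound (or trust region) is needed to make the maximization version well-posed at all; is that the right conclusion?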

Note: I am OK with local maxima and minima.