Gradient descent over a restricted convex domain — how do we guarantee that we stay in the domain if the global minimizer is outside of it?


Let $f:\mathbb R^d \rightarrow \mathbb R$ be a convex function and $A \subset \mathbb R^d$ a convex set. We are interested in finding the minimum of $f$ over $A$. We have access to the gradient of $f$, and we know that the unique global minimizer of $f$ lies outside of $A$.

Suppose we decide to do gradient descent to find that minimum. We start at a point of $A$ and follow the negative gradient at each step (while decreasing the learning rate). At some point this will take us outside of $A$, since the global minimum is outside of $A$. What modification of gradient descent must we apply to guarantee that we stay in $A$ while converging to the minimum over $A$?

What I tried: every time we step outside of $A$, we project that point back onto $A$ (in the Euclidean norm) and continue the algorithm from there. Unfortunately, this does not seem to converge to the minimum over $A$, but instead to the projection of the global minimizer onto $A$, which is not necessarily the minimum over $A$.
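For concreteness, here is a minimal sketch of the projected variant I tried (the quadratic objective and the unit-ball constraint are placeholders, not my actual problem):

```python
import numpy as np

# Placeholder problem (not my actual f): f(x) = (x1 - 3)^2 + 10 (x2 - 1)^2,
# whose unconstrained minimizer (3, 1) lies outside the unit ball A.
grad_f = lambda x: np.array([2 * (x[0] - 3), 20 * (x[1] - 1)])

def proj_ball(y):
    """Euclidean projection onto the unit ball {x : ||x|| <= 1}."""
    n = np.linalg.norm(y)
    return y if n <= 1.0 else y / n

x = np.zeros(2)                         # start inside A
for t in range(1, 10_001):
    eta = 0.1 / np.sqrt(t)              # decreasing learning rate
    x = proj_ball(x - eta * grad_f(x))  # step, then project back whenever we leave A
print(x)
```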


You can apply projected gradient descent for constrained optimization. It is exactly what you are doing, but you need to change two things:

  • At the end, output the average of all the projected iterates, not the last one.

  • Choose your learning rate wisely (here: a fixed $\eta$ that depends on the horizon $T$).


Let $\varepsilon > 0$, $T = \displaystyle\frac{4\,\mathbf{diam}(\mathcal A)^2\left\|\nabla f\right\|_{\infty}^2}{\varepsilon^2}$ and $\eta = \displaystyle\frac{\mathbf{diam}(\mathcal A)}{\left\|\nabla f\right\|_{\infty}\sqrt{T}}$, where $\left\|\nabla f\right\|_{\infty} := \sup_{x \in \mathcal A}\left\|\nabla f(x)\right\|$. Starting from a point $x^{(0)} \in \mathcal A$, for $t = 1,\ldots, T$:

\begin{align} y^{(t)} &\gets x^{(t-1)} - \eta \nabla f\left(x^{(t-1)}\right)\\ x^{(t)} &\gets \texttt{Proj}_{\mathcal A} \left(y^{(t)}\right) \end{align}

At the end output $$z = \frac1T \sum_{t=0}^{T-1} x^{(t)} \in \mathcal A,$$ the average of the iterates $x^{(0)}, \ldots, x^{(T-1)}$ (this is the indexing the telescoping argument below needs).
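A minimal NumPy sketch of this scheme (an illustration, not the only way to implement it), assuming $\mathcal A$ admits a computable Euclidean projection and that upper bounds `D` on $\mathbf{diam}(\mathcal A)$ and `G` on $\left\|\nabla f\right\|_{\infty}$ are known; all names here are illustrative:

```python
import numpy as np

def projected_gd(grad_f, proj_A, x0, D, G, eps):
    """Projected gradient descent with a fixed step size and iterate averaging.

    grad_f : gradient oracle for f
    proj_A : Euclidean projection onto the convex set A
    D, G   : upper bounds on diam(A) and on sup_{x in A} ||grad f(x)||
    """
    T = int(np.ceil(4 * (D * G / eps) ** 2))  # T = 4 diam(A)^2 ||grad f||_inf^2 / eps^2
    eta = D / (G * np.sqrt(T))                # eta = diam(A) / (||grad f||_inf sqrt(T))
    x = np.asarray(x0, dtype=float)
    z = np.zeros_like(x)
    for _ in range(T):
        z += x / T                            # running average of x^(0), ..., x^(T-1)
        x = proj_A(x - eta * grad_f(x))       # gradient step, then project back onto A
    return z
```

Note that $\eta$ is fixed rather than decreasing, and that the guarantee below is for the averaged iterate $z$, not for the last iterate $x^{(T)}$.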

Let $z^*$ be a minimizer of $f$ on $\mathcal A$. Since $z^* \in \mathcal A$ is a fixed point of $\texttt{Proj}_{\mathcal A}$ and the projection onto a convex set is non-expansive, the first inequality below holds:

\begin{align} \left\|x^{(t)} - z^*\right\|^2 & \le \left\|y^{(t)} - z^*\right\|^2\\ &= \left\|x^{(t-1)} - z^* - \eta\nabla f\left(x^{(t-1)}\right)\right\|^2\\ &= \left\|x^{(t-1)} - z^*\right\|^2 + \eta^2 \left\|\nabla f\left(x^{(t-1)}\right)\right\|^2 - 2\eta\left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle \end{align}

So:

\begin{align} \left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle &\le \frac1{2\eta}\left(\left\|x^{(t-1)} - z^*\right\|^2 - \left\|x^{(t)} - z^*\right\|^2\right) + \frac\eta 2\left\|\nabla f\right\|^2_{\infty} \end{align}

Using the convexity of $f$ you have:

$$f\left(x^{(t-1)}\right) - f\left(z^*\right) \le \left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle$$

Summing this over $t = 1,\ldots, T$ (the distance terms telescope) and using the convexity of $f$ again, via Jensen's inequality applied to the average $z$:

\begin{align} f(z) - f\left(z^*\right) &= f\left(\frac1T\sum_{t=0}^{T-1}x^{(t)}\right) -f\left(z^*\right)\\ &\le \frac1T\sum_{t=0}^{T-1} \left(f\left(x^{(t)}\right) - f\left(z^*\right)\right)\\ &\le \frac1T\sum_{t=1}^{T} \left\langle \nabla f\left(x^{(t-1)}\right),\, x^{(t-1)} - z^*\right\rangle\\ &\le \frac{1}{2\eta T}\left\|x^{(0)} - z^*\right\|^2 +\frac{\eta}{2}\left\|\nabla f\right\|_{\infty}^2 \end{align}

where the last step telescopes the distance terms and drops $-\frac1{2\eta T}\left\|x^{(T)} - z^*\right\|^2 \le 0$.

Since $x^{(0)}$ and $z^*$ both lie in $\mathcal A$, we have $\left\|x^{(0)} - z^*\right\| \le \mathbf{diam}(\mathcal A)$. Substituting the values of $\eta$ and $T$, the right-hand side becomes $\dfrac{\mathbf{diam}(\mathcal A)\left\|\nabla f\right\|_{\infty}}{\sqrt T} = \dfrac{\varepsilon}{2}$, so in particular:

$$f(z) - f\left(z^*\right) \le \varepsilon$$
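As a numerical sanity check (a made-up instance, reusing the placeholder problem from the question): take $f(x) = (x_1 - 3)^2 + 10(x_2 - 1)^2$ over the unit ball. The unconstrained minimizer $(3, 1)$ projects onto the ball at $(3,1)/\sqrt{10} \approx (0.949,\, 0.316)$, while the constrained minimizer (from the KKT conditions) is approximately $(0.670,\, 0.742)$, a genuinely different point. The scheme above heads to the latter:

```python
import numpy as np

# f(x) = (x1 - 3)^2 + 10 (x2 - 1)^2 over the unit ball A
grad_f = lambda x: np.array([2 * (x[0] - 3), 20 * (x[1] - 1)])
proj_A = lambda y: y if np.linalg.norm(y) <= 1 else y / np.linalg.norm(y)

D, G, eps = 2.0, 41.0, 0.2   # diam(A) = 2; ||grad f|| <= sqrt(8^2 + 40^2) < 41 on A
T = int(np.ceil(4 * (D * G / eps) ** 2))
eta = D / (G * np.sqrt(T))

x, z = np.zeros(2), np.zeros(2)
for _ in range(T):
    z += x / T                           # average of the projected iterates
    x = proj_A(x - eta * grad_f(x))

print("last iterate:", x)   # ~ (0.670, 0.742): the constrained minimizer
print("average    :", z)    # close to it, far from Proj_A((3,1)) ~ (0.949, 0.316)
```

So the averaged projected iterates approach the minimizer of $f$ on $\mathcal A$, not the projection of the global minimizer onto $\mathcal A$, which was the worry in the question.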