I saw subgradient optimization of a function $f$ described as the following algorithm:
- start with any $\lambda^{(0)} \ge 0$, then repeat the following for $i = 0, 1, 2, \dots$
- compute a subgradient $g$ of $f$ at $\lambda^{(i)}$
- set $\lambda^{(i+1)} = \max\{0, \lambda^{(i)} - t_ig\}$
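The iteration above can be sketched as follows. The objective $f(\lambda) = |\lambda - 2|$, its subgradient, and the diminishing step sizes $t_i = 1/(i+1)$ are illustrative choices of mine, not part of the algorithm as stated:

```python
def subgradient_method(subgrad, lam0, steps, n_iters):
    """Run lambda^(i+1) = max{0, lambda^(i) - t_i * g} for n_iters iterations."""
    lam = lam0
    for i in range(n_iters):
        g = subgrad(lam)   # a subgradient of f at lambda^(i)
        t = steps(i)       # step size t_i
        lam = max(0.0, lam - t * g)
    return lam

# Illustrative objective: f(lambda) = |lambda - 2| over lambda >= 0.
# A subgradient of |x - 2| is sign(x - 2); any value in [-1, 1] works at x = 2.
def subgrad(x):
    return 1.0 if x > 2 else (-1.0 if x < 2 else 0.0)

lam_star = subgradient_method(subgrad, lam0=0.0,
                              steps=lambda i: 1.0 / (i + 1), n_iters=5000)
# lam_star is close to the minimizer 2
```

With diminishing step sizes the iterates oscillate around the minimizer with shrinking amplitude, which is the standard convergence behavior for this method; here $f$ is defined on all of $[0, \infty)$, so the question of leaving the domain does not arise yet.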
What happens if $\lambda^{(i)}$ falls outside the domain of $f$, so that $f(\lambda^{(i)})$ is undefined? Can we project $\lambda^{(i)}$ onto the domain of $f$? Does this preserve the theoretical guarantees of the algorithm?
Example: Suppose we use subgradient optimization to find the minimum of $f(\lambda) = -\lambda$ with domain $0 \le \lambda \le 1$. If we pick $\lambda^{(0)} := 3$, the iterate is outside the domain immediately. Even if we pick $\lambda^{(0)}$ inside the domain, say $\lambda^{(0)} := 0.5$, we could have a subgradient $g = -1$ and a step size $t_0 := 2.5$, so the next iterate would again be $\lambda^{(1)} = \max\{0, \lambda^{(0)} - t_0 g\} = 3$, with the same problem.
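A minimal sketch of this example, with the fix I am asking about: projecting onto the actual domain $[0, 1]$ instead of only onto $[0, \infty)$. Whether this projected variant keeps the usual guarantees is exactly the question; the numbers below just reproduce the scenario:

```python
def project(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return min(hi, max(lo, x))

lam = 3.0   # lambda^(0), outside the domain of f
g = -1.0    # subgradient of f(lambda) = -lambda (valid on [0, 1])
t = 2.5     # step size from the example

# The update from the question only clamps at 0, so the
# iterate stays outside the domain [0, 1]:
unprojected = max(0.0, lam - t * g)   # = 5.5

# Projecting onto [0, 1] instead keeps every iterate in dom f:
for _ in range(10):
    lam = project(lam - t * g)
# lam ends at 1.0, the minimizer of -lambda on [0, 1]
```

Note that the subgradient is evaluated here at points like $\lambda = 3$ where $f$ itself is undefined; the code only works because $g = -1$ happens to be constant, which is precisely why the general question arises.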