KKT condition for the proximal algorithm


The slide I'm reading states that the KKT condition for proximal gradient descent is this inequality. I don't see where it comes from. Using the KKT conditions, we can only get an equality for the stationarity condition, right? Can someone explain this to me? Thank you in advance!


1 Answer

Accepted answer:

I don't know why this is being called a "KKT condition", but it is straightforward to say that whenever $Q$ is a convex set, $g$ is a function $Q \to \mathbb R$, and $x^* = \arg \min_{x \in Q} g(x)$, we must have $\langle \nabla g(x^*), y - x^*\rangle \ge 0$ for all $y \in Q$.
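This first-order condition is easy to check numerically. Below is a small sketch (my own hypothetical example, not from the slide): minimize the convex function $g(x) = \|x - c\|^2$ over the box $Q = [0,1]^n$, whose minimizer is the projection of $c$ onto the box, and verify that $\langle \nabla g(x^*), y - x^*\rangle \ge 0$ for many random points $y \in Q$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: minimize g(x) = ||x - c||^2 over the box Q = [0, 1]^3,
# with c chosen so the minimizer lies on the boundary of Q.
c = np.array([1.5, -0.3, 0.4])
x_star = np.clip(c, 0.0, 1.0)  # the projection of c onto Q is the minimizer

def grad_g(x):
    # gradient of ||x - c||^2
    return 2.0 * (x - c)

# First-order optimality: <grad g(x*), y - x*> >= 0 for every y in Q.
for _ in range(10_000):
    y = rng.uniform(0.0, 1.0, size=3)  # random point of Q
    assert grad_g(x_star) @ (y - x_star) >= -1e-12
print("variational inequality holds at x* =", x_star)
```

Note that the inner product is strictly positive whenever $y$ moves away from an active constraint, and exactly zero along directions where $x^*$ is in the interior of $Q$.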

All this is saying is that, when moving from $x^*$ in the direction of $y$, the instantaneous rate of change of $g$ is nonnegative, i.e. $g$ cannot decrease to first order. This is true even when $x^*$ is just a local minimizer. The hypothesis that $Q$ is convex, though, is necessary. Without it, an arbitrarily small step from $x^*$ toward $y$ might immediately leave $Q$ - and in that case, it's fine if $g$ decreases in that direction.

In this particular example, $g(x) = f(x_k) + \langle f'(x_k), x-x_k\rangle + \frac1{2h_k}\|x - x_k\|^2$, and we can compute that $$\nabla g(x) = f'(x_k) + \frac1{h_k}(x-x_k).$$
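Putting the two pieces together (this is an assumption about what the slide states, since the image isn't shown): applying the first-order condition above at $x_{k+1} = \arg\min_{x \in Q} g(x)$ with this gradient gives $$\left\langle f'(x_k) + \frac{1}{h_k}(x_{k+1} - x_k),\; y - x_{k+1}\right\rangle \ge 0 \quad \text{for all } y \in Q,$$ which is presumably the inequality on the slide.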

Going the other way for a convex function such as this example's $g$: the condition says that $x_{k+1}$ is a local minimizer along every segment from $x_{k+1}$ to a point of $Q$; therefore (by convexity) it is a global minimizer along each such segment; therefore it is a global minimizer on all of $Q$.
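As a concrete sanity check of this whole argument (a sketch under my own assumptions: a least-squares $f$ and the box $Q = [0,1]^3$, neither of which comes from the slide): because $g$'s Hessian is $\frac{1}{h_k}I$, minimizing $g$ over $Q$ reduces to projecting the gradient step $x_k - h_k f'(x_k)$ onto $Q$, and the resulting $x_{k+1}$ satisfies the inequality.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: f(x) = ||A x - b||^2, minimized over the box Q = [0, 1]^3.
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

def f_grad(x):
    # gradient of ||A x - b||^2
    return 2.0 * A.T @ (A @ x - b)

x_k = rng.uniform(0.0, 1.0, size=3)
h_k = 0.1

# Completing the square: g(x) = (1/(2 h_k)) ||x - (x_k - h_k f'(x_k))||^2 + const,
# so its minimizer over the box Q is a componentwise projection (clip).
x_next = np.clip(x_k - h_k * f_grad(x_k), 0.0, 1.0)

# grad g(x_{k+1}) = f'(x_k) + (1/h_k)(x_{k+1} - x_k); check the inequality.
g_grad = f_grad(x_k) + (x_next - x_k) / h_k
for _ in range(10_000):
    y = rng.uniform(0.0, 1.0, size=3)
    assert g_grad @ (y - x_next) >= -1e-9
print("inequality verified at the projected-gradient step")
```

The projection-onto-a-box shortcut only works here because the quadratic model is isotropic; for a general convex $Q$ one would project onto $Q$ instead of clipping.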