Recently, I have been learning about the Kurdyka–Łojasiewicz (KL) inequality in the context of nonconvex, nonsmooth optimization, that is, in the context of the minimization of $f$: $$ \min_{x \in \mathbb{R}^n} f(x). $$ However, I don't really see the intuition behind the KL inequality.
Let me state the version of KL that I am reading.
Let $f:\mathbb{R}^n \to \mathbb{R}\cup \{\infty\}$. For any $x \in \text{dom}(f)$, the Fréchet subdifferential of $f$ at $x$ is $$ \hat{\partial} f(x) = \left\{v \in \mathbb{R}^n : \liminf_{y \ne x, y \to x} \frac{f(y)-f(x) - \langle v, y-x\rangle }{\|y-x\|} \ge 0 \right\}, $$ and the limiting subdifferential at $x \in \text{dom}(f)$ is defined to be $$ \partial f(x) = \left\{v\in \mathbb{R}^n : \exists x^k \to x,\ f(x^k) \to f(x),\ v^k \in \hat{\partial} f(x^k),\ v^k \to v \right\}. $$ We say $f$ satisfies the KL property at $x^* \in \text{dom}(\partial f)$ if there exist $\eta \in (0,\infty]$, a neighborhood $U$ of $x^*$ and a continuous concave function $\phi:[0,\eta) \to \mathbb{R}_+$ such that $$ (i)\hspace{0.2cm} \phi(0) = 0, \quad (ii) \hspace{0.2cm}\phi \in C^1(0,\eta), \quad (iii) \hspace{0.2cm} \phi'(s) > 0, \forall s \in (0,\eta), $$ and (iv) for all $x \in U_\eta:= U\cap \{z \in \mathbb{R}^n: f(x^*) < f(z) < f(x^*) + \eta\}$, the KL inequality holds: $$ \phi'(f(x)-f(x^*))\,\text{dist}(0,\partial f(x)) \ge 1. $$
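For concreteness, one special case I keep in mind (this is the classical Łojasiewicz inequality; the specific choice of $\phi$ below is an illustration, not part of the statement above): taking $\phi(s) = c\,s^{1-\theta}$ with $c > 0$ and $\theta \in [0,1)$, so that $\phi'(s) = c(1-\theta)s^{-\theta}$, the KL inequality (iv) becomes

```latex
\operatorname{dist}(0, \partial f(x))
  \;\ge\; \frac{1}{c(1-\theta)} \bigl(f(x) - f(x^*)\bigr)^{\theta},
  \qquad x \in U_\eta,
```

i.e., the subgradients cannot vanish faster than a power of the function-value gap; $\theta$ is called the Łojasiewicz exponent.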
Here is my interpretation:
- $\hat{\partial} f(x)$: It is a generalized notion of differentiation of $f$ at $x$: $v \in \hat{\partial} f(x)$ means the affine function $y \mapsto f(x) + \langle v, y-x\rangle$ underestimates $f$ near $x$ up to first order (for convex $f$ this recovers the usual subdifferential).
- $\partial f(x)$: To ensure closedness (the limiting subdifferential has a closed graph, which $\hat{\partial} f$ may lack), the notion of the Fréchet subdifferential is extended to the limiting subdifferential.
- $x^* \in \text{dom}(\partial f)$: I understand this as follows: $x^* \in \text{dom}(\partial f)$ means $\partial f(x^*) \ne \emptyset$. The restriction is needed because there might be $x \in \text{dom}(f)$ at which no sequence $x^k \to x$ with $f(x^k) \to f(x)$ carries Fréchet subgradients $v^k \in \hat{\partial} f(x^k)$ converging to a limit, i.e., $\partial f(x) = \emptyset$.
- $U$ and $\eta \in (0,\infty]$: This indicates that the KL property is a local property which only concerns a neighborhood of $x^*$ on which $f$ is confined to $(f(x^*), f(x^*) + \eta)$.
- Concave function $\phi$: By (i)-(iii), $\phi$ is a non-negative, continuously differentiable and increasing function on $(0,\eta)$. Concavity means $\phi(ax + (1-a)y) \ge a\phi(x) + (1-a)\phi(y)$ for all $a \in [0,1]$ and $x, y \in [0,\eta)$.
- (iv): To be honest, I am not sure how to interpret this inequality. If I just take $\text{dist}(0,\partial f(x)) = \|\nabla f(x)\| \ne 0$ (assuming $f$ is differentiable), (iv) becomes $$ \phi'(f(x) - f(x^*)) \ge \frac{1}{\|\nabla f(x)\|}. $$ Since $x \in U_\eta$, we have $0 < f(x) - f(x^*) < \eta$. Thus the KL inequality constrains the behavior of $\phi'$ on $(0,\eta)$, i.e., how fast $\phi$ changes there, through $f$: at the level $s = f(x) - f(x^*)$, the slope $\phi'(s)$ must be at least $\frac{1}{\|\nabla f(x)\|}$, so the smaller the gradient of $f$, the steeper $\phi$ has to be.
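The closest I get to an interpretation is to rewrite my computation above via the chain rule (still assuming $f$ is differentiable; this reparameterization reading is my own gloss, not part of the definition):

```latex
\bigl\| \nabla \bigl( \phi \circ (f - f(x^*)) \bigr)(x) \bigr\|
  = \phi'\bigl(f(x) - f(x^*)\bigr)\,\|\nabla f(x)\| \;\ge\; 1,
  \qquad x \in U_\eta,
```

i.e., after rescaling the values of $f$ by $\phi$, the function becomes uniformly steep on $U_\eta$, even if $\nabla f$ itself vanishes as $x \to x^*$. But I still do not see why this is the right property to ask for.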
More importantly, I don't see how and why the KL property is a key assumption in the proof of convergence of iterative methods. Naively speaking, it seems to me that all we want is to find a point $x^* \in \text{dom}(f)$ such that $\nabla f(x^*) = 0$ or $\approx 0$ (assuming $f$ is differentiable).
Any comments/answers/references would be much appreciated.
I recommend *Proximal alternating linearized minimization for nonconvex and nonsmooth problems* by Bolte et al. What happens in that paper is, essentially, that the sequence generated by the proposed algorithm has the finite-length property, i.e., $$\sum_{k=0}^\infty\|z^{k+1}-z^k\|<\infty,$$ thanks to KL. And this is a big deal because finite length implies the sequence is Cauchy, so the *whole* sequence converges to a single critical point; without KL one typically only gets convergence of subsequences.
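To sketch, very roughly and hiding constants, how KL produces finite length there (this is my paraphrase of the standard argument, not a quote from the paper): the algorithm guarantees a sufficient-decrease condition $f(z^k)-f(z^{k+1}) \ge a\|z^{k+1}-z^k\|^2$ and a relative-error condition $\operatorname{dist}(0,\partial f(z^k)) \le b\|z^k-z^{k-1}\|$ for some $a, b > 0$. Writing $\Delta_k = \phi(f(z^k)-f^*)$, concavity of $\phi$ plus the KL inequality give

```latex
\Delta_k - \Delta_{k+1}
  \;\ge\; \phi'\bigl(f(z^k)-f^*\bigr)\,\bigl(f(z^k)-f(z^{k+1})\bigr)
  \;\ge\; \frac{a\,\|z^{k+1}-z^k\|^2}{b\,\|z^k-z^{k-1}\|},
```

and AM–GM ($2\sqrt{uv} \le u+v$) turns this into $2\|z^{k+1}-z^k\| \le \|z^k-z^{k-1}\| + \tfrac{b}{a}(\Delta_k - \Delta_{k+1})$, which telescopes when summed over $k$: $\sum_k\|z^{k+1}-z^k\| \le \|z^1-z^0\| + \tfrac{b}{a}\Delta_1 < \infty$. So KL is exactly the ingredient that converts "function values decrease" into "iterates travel a summable total distance."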
Yes, KL is a local property. A good start to understand KL is to play with functions on the line. You can try to verify the KL property of $x\mapsto e^x$, $x\mapsto x^2$ and $$x\mapsto\begin{cases}\exp(-1/x^2)\sin(1/x), & x\neq0,\\ 0, & x=0.\end{cases}$$
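Following that suggestion, here is a small numerical sanity check (a sketch; the choice of $f$, $\phi$, and the sampling are mine, not from any reference). For $f(x) = x^2$ at $x^* = 0$, the candidate $\phi(s) = \sqrt{s}$ works, since $\phi'(x^2)\,|f'(x)| = \frac{1}{2|x|}\cdot 2|x| = 1$:

```python
import numpy as np

# Check the KL inequality  phi'(f(x) - f(x*)) * |f'(x)| >= 1
# for f(x) = x^2 at x* = 0, with candidate phi(s) = sqrt(s),
# so phi'(s) = 1 / (2 sqrt(s)).  Here the product is identically 1.

def f(x):
    return x ** 2

def grad_f(x):
    return 2.0 * x

def phi_prime(s):
    return 1.0 / (2.0 * np.sqrt(s))

rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=1000)
xs = xs[xs != 0.0]  # KL is only required where f(x*) < f(x) < f(x*) + eta

kl_product = phi_prime(f(xs) - f(0.0)) * np.abs(grad_f(xs))
print(kl_product.min(), kl_product.max())  # both equal 1 up to rounding
```

The same template can be pointed at the other two examples; the oscillating one is the interesting case.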