Optimality condition under strong convexity of the Lagrangian


I have been reading the paper Convergence Analysis of ADMM for a Family of non-convex Problems, and under assumptions (A1-A3), the Lagrangian function is strongly convex. In Eq. (2.21), the authors mentioned that

$$ L(\{x_k^{t+1}\}, x_0^{t+1}, y^t) - L(\{x_k^{t}\}, x_0^{t+1}, y^t) \\ \leq \sum_{k=1}^{K}{\left(\langle\nabla_{x_k} L(\{x_k^{t+1}\}, x_0^{t+1}, y^t), x_k^{t+1} - x_k^{t} \rangle - \frac{\gamma_k}{2} \|x_k^{t+1} - x_k^{t}\|^2\right)} \\ \leq - \sum_{k=1}^{K}{\frac{\gamma_k}{2} \|x_k^{t+1} - x_k^{t}\|^2}$$

My questions are the following:

1) In the first inequality, why do the authors take the sum? I understand that the Lagrangian is strongly convex with respect to each $x_k$ but I fail to understand why they took the sum over $k=1,\dots,K$.

2) For the second inequality, the authors mention that it was obtained from the optimality of the subproblem: $$ x_k^{t+1} = \arg \underset{x_k}{\min} \; g_k(x_k) + \langle y_k^t, x_k - x_0^{t+1} \rangle + \frac{\rho_k}{2} \|x_k - x_0^{t+1}\|^2$$ Can anyone see why this is true?
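As a sanity check, here is a minimal numeric sketch of this subproblem, assuming the hypothetical toy choice $g_k(x_k) = \frac{1}{2}\|x_k\|^2$ and a scalar $\rho_k$ (both are illustrative choices, not the paper's setting). With this $g_k$, the subproblem has the closed-form minimizer $x_k^{t+1} = (\rho_k x_0^{t+1} - y_k^t)/(1+\rho_k)$, and no perturbation around it decreases the objective:

```python
import numpy as np

# Toy instance of the subproblem (hypothetical choices: g_k(x) = 0.5*||x||^2,
# scalar rho_k, random stand-ins for y_k^t and x_0^{t+1}).
rng = np.random.default_rng(0)
n = 5
rho = 2.0
y = rng.normal(size=n)   # stands in for y_k^t
x0 = rng.normal(size=n)  # stands in for x_0^{t+1}

def subproblem(x):
    """g_k(x) + <y, x - x0> + (rho/2)*||x - x0||^2 with g_k(x) = 0.5*||x||^2."""
    return 0.5 * x @ x + y @ (x - x0) + 0.5 * rho * np.sum((x - x0) ** 2)

# Closed-form minimizer: setting the gradient x + y + rho*(x - x0) to zero.
x_star = (rho * x0 - y) / (1.0 + rho)

# The objective is strongly convex, so any perturbation cannot decrease it.
for _ in range(100):
    d = rng.normal(size=n)
    assert subproblem(x_star + 0.1 * d) >= subproblem(x_star) - 1e-12
```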

Thank you!

Best answer:
  1. If $f:\mathbb{R}^n\rightarrow\mathbb{R}$ is a differentiable and $c$-strongly convex function, then for any vectors $x=(x_1,\dots,x_n)$ and $y=(y_1,\dots,y_n)$ in $\mathbb{R}^n$ we have $$ f(y) \geq f(x) + \nabla f(x)^{\top}(y-x) + \frac{c}{2}\|y-x\|^2$$ where \begin{align} \nabla f(x)^{\top} (y-x) &= \left(\frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_n}\right)^{\top}(y_1-x_1,\dots, y_n-x_n) \\ &= \sum_{i=1}^n \frac{\partial f(x)}{\partial x_i} (y_i-x_i) \end{align} This is just the dot product of the gradient vector $\nabla f(x)$ with the difference vector $y-x$. As for why the sum over $k$ appears: the blocks $x_k$ are decoupled in the Lagrangian (they interact only through $x_0$ and $y$), so the difference on the left-hand side splits into a sum of per-block differences, and applying the block-wise strong-convexity inequality (with modulus $\gamma_k$) to each term gives the sum over $k=1,\dots,K$.
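A quick numeric check of this inequality, using the toy choice $f(x) = \|x\|^2$, which is $2$-strongly convex with gradient $2x$ (for this particular $f$ the inequality in fact holds with equality):

```python
import numpy as np

# Check f(y) >= f(x) + grad_f(x)^T (y - x) + (c/2)*||y - x||^2
# for the 2-strongly convex toy function f(x) = ||x||^2.
rng = np.random.default_rng(1)
c = 2.0
f = lambda v: v @ v
grad = lambda v: 2.0 * v

for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    lower = f(x) + grad(x) @ (y - x) + 0.5 * c * np.sum((y - x) ** 2)
    assert f(y) >= lower - 1e-12
```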

  2. They explained equality (b); it looks to me that the first term restricts the sum to only the nonzero terms, and the second term is included only when the condition $\{0 \in C^{t+1}\}$ is met. I suspect that if this condition is not met, then $x_0^t=x_0^{t+1}$, so that term would just be zero. (Indeed, looking at their Alg. 2, this is the case.) Thus they have not changed the value of the sum; they have only removed terms that are zero, so equality (b) indeed holds. For example:

    $$ 8 + 0 + (-1) + 2 + 0 + 8 = \underbrace{8 + (-1) + 2 + 8}_{\text{we can remove the zeros}}$$

  3. If $f:\mathbb{R}^n\rightarrow \mathbb{R}$ is a differentiable function and $x^*$ minimizes $f$ over some convex set $X\subseteq \mathbb{R}^n$, that is, $x^* \in \arg\min_{x \in X} f(x)$, then for any other vector $y \in X$ we have $$ \nabla f(x^*)^{\top} (y-x^*) \geq 0,$$ which intuitively means that if we take a small step away from the minimizer $x^*$ in the direction of $y$, we cannot decrease the value of the function (otherwise $x^*$ would not be a minimizer). For your second question: as a function of $x_k$ alone, the Lagrangian coincides with the subproblem objective up to an additive constant, so $x_k^{t+1}$ also minimizes $L$ over $x_k$. Applying the condition above with $y = x_k^t$ gives $\langle\nabla_{x_k} L(\{x_k^{t+1}\}, x_0^{t+1}, y^t), x_k^{t} - x_k^{t+1}\rangle \geq 0$, i.e. $\langle\nabla_{x_k} L(\{x_k^{t+1}\}, x_0^{t+1}, y^t), x_k^{t+1} - x_k^{t}\rangle \leq 0$, so the inner-product terms can be dropped, which yields the second inequality.
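A one-dimensional illustration of this optimality condition, using the toy problem of minimizing $f(x) = (x-2)^2$ over the convex set $X = [0,1]$: the minimizer is $x^* = 1$, and even though $\nabla f(x^*) = -2 \neq 0$, the inequality $\nabla f(x^*)(y - x^*) \geq 0$ holds for every $y \in X$:

```python
import numpy as np

# Minimize f(x) = (x - 2)^2 over the convex set X = [0, 1].
# The minimizer is x* = 1: the point of X closest to the
# unconstrained minimum at x = 2.
f_grad = lambda x: 2.0 * (x - 2.0)
x_star = 1.0

# grad f(x*) * (y - x*) >= 0 for every y in X, even though
# grad f(x*) = -2 is nonzero (the constraint is active).
for y in np.linspace(0.0, 1.0, 101):
    assert f_grad(x_star) * (y - x_star) >= 0.0
```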