why do we use this ($\frac{\rho}{2}||Ax - b||^2_2$) particular function in augmented lagrangian? i.e $L_ρ(x,z,y) = f(x) + y^T (Ax − b)+\frac{\rho}{2}||Ax - b||^2_2$
any intuition for that?
Chapter 2 of [1] details the construction of the augmented Lagrangian. You may want to see Section 2.2.1 in [1] for the geometric interpretation.
The high-level overview is that as $\rho \to \infty$, one expects the constraints to be satisfied exactly, so the solutions of the original problem are unchanged. In practice, a sufficiently large $\rho$ suffices (how large depends on $f(x)$).
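A toy numerical illustration of this intuition (my own example, not taken from the references): minimize $f(x) = x^2$ subject to the single constraint $x = 1$, using only the quadratic penalty $x^2 + \frac{\rho}{2}(x-1)^2$. Here the penalized minimizer has the closed form $x^\star = \rho/(2+\rho)$, which approaches the feasible point $x = 1$ only as $\rho \to \infty$:

```python
# Pure quadratic-penalty method on: minimize x^2 subject to x = 1.
# Penalized objective: x^2 + (rho/2) * (x - 1)^2.
# Setting the derivative 2x + rho*(x - 1) = 0 gives x* = rho / (2 + rho).
for rho in [1, 10, 100, 1000]:
    x_star = rho / (2 + rho)            # closed-form penalized minimizer
    violation = abs(x_star - 1)         # constraint violation |x* - 1|
    print(f"rho={rho:5d}  x*={x_star:.6f}  violation={violation:.6f}")
```

The printout shows the violation shrinking like $2/\rho$: the penalty alone never enforces $x = 1$ exactly for finite $\rho$, which is precisely the gap the multiplier term $y^T(Ax-b)$ closes.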
The term $\frac{\rho}{2}\|Ax-b\|_2^2$ is the penalty term: it expresses the goal of achieving $Ax = b$, with a positive loss incurred for any violation. In principle you can use other distances, for example Bregman distances as in [5].
For theoretical convergence guarantees, it is good to have the quadratic term in the objective because it is strongly convex in $Ax - b$. This lets us work under mild assumptions on $f(x)$: for example, even when $f$ is merely convex, the augmented Lagrangian $L_{\rho}$ is strictly convex (in some cases strongly convex) with respect to $x$ for fixed $y$. For this discussion, you may want to check Section 2.3 of [4].
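To see the multiplier term at work, here is a minimal method-of-multipliers sketch on the same toy problem (again my own example, not code from the references): minimize $x^2$ subject to $x = 1$, alternating an exact $x$-minimization of $L_\rho(x, y)$ with the dual update $y \leftarrow y + \rho(x - 1)$. With the $y^T(Ax - b)$ term present, a fixed, modest $\rho$ already drives the iterates to the exact solution:

```python
# Method of multipliers on: minimize x^2 subject to x = 1.
# L_rho(x, y) = x^2 + y*(x - 1) + (rho/2)*(x - 1)^2.
# x-update: solve dL/dx = 2x + y + rho*(x - 1) = 0  =>  x = (rho - y)/(2 + rho).
# y-update: gradient ascent on the dual, y <- y + rho*(x - 1).
rho, y, x = 1.0, 0.0, 0.0
for _ in range(60):
    x = (rho - y) / (2 + rho)   # exact minimizer of L_rho(., y)
    y = y + rho * (x - 1)       # multiplier (dual) update
print(f"x = {x:.8f}, y = {y:.8f}")  # converges to x = 1, y = -2
```

Note the contrast with the pure-penalty picture: the dual variable $y$ converges to the optimal multiplier ($-2$ here, from stationarity $2x + y = 0$ at $x = 1$), so exact feasibility is reached without sending $\rho \to \infty$.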
References.
[1] http://www.mit.edu/~dimitrib/Constrained-Opt.pdf
[2] https://link.springer.com/content/pdf/10.1007/BF00934777.pdf
[3] https://web.stanford.edu/~boyd/papers/pdf/admm_slides.pdf
[4] https://web.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf
[5] https://arxiv.org/abs/1306.3203