I am reading the ADMM paper by S. Boyd et al., *Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers*.
I'm interested in implementing an L1-regularized, feature-wise distributed multinomial logistic regression solver using ADMM, so I read page 70, section 8.3.3 of Boyd's paper with great interest. However, I am a bit confused.
Boyd gives the z update as follows:
$$ \bar{z}^{k+1} := \underset{\bar{z}}{\text{argmin}}\left(l(N\bar{z}) + (\rho / 2) ||\bar{z} - \overline{Ax}^{k+1} - u^k||_{2}^{2}\right) $$
... where $\ell(\cdot)$ is the logistic loss function. I understand that this is a proximal operator (thanks to Derivation of Soft Thresholding Operator / Proximal Operator of L1 Norm), but:
1) I'm not sure how to solve this expression. Does anyone know the proximal operator of the logistic loss, or how to derive it efficiently?
2) Boyd points out that the logistic proximal operator "can be very efficiently computed by a lookup table that gives the approximate value, followed by one or two Newton steps (for a scalar problem)." Does anyone have any insight into how to do this?
Thank you!
I am assuming that in your case $\ell(x) = \ln(1 + \exp(-c^T x))$ for some vector $c$. I am not aware of any closed-form expression for the minimizer $\bar{z}^{k+1}$, but it can easily be computed via Newton's method, since the objective is smooth and strongly convex. The method is described in Boyd and Vandenberghe's book "Convex Optimization".
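To make this concrete, here is a minimal sketch (my own, not from the paper) of the scalar case Boyd alludes to: minimize $g(z) = \ln(1 + e^{-z}) + (\rho/2)(z - v)^2$, where $v$ plays the role of the shifted point (e.g. $\overline{Ax}^{k+1} + u^k$). Using $\ell'(z) = \sigma(z) - 1$ and $\ell''(z) = \sigma(z)(1 - \sigma(z))$ with $\sigma$ the logistic sigmoid, each Newton step divides the gradient by the Hessian, which is bounded below by $\rho > 0$. The function names and the plain-Newton loop are my choices; a lookup table on $v$ would just supply the starting point `z0`.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function sigma(t) = 1 / (1 + exp(-t)).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def prox_logistic(v, rho, z0=0.0, iters=20):
    """Minimize g(z) = log(1 + exp(-z)) + (rho/2) * (z - v)**2 by Newton's method.

    A sketch of the scalar prox; in practice a lookup table indexed by v
    would supply a warm start z0, after which one or two steps suffice.
    """
    z = z0
    for _ in range(iters):
        s = sigmoid(z)
        grad = (s - 1.0) + rho * (z - v)   # g'(z), since l'(z) = sigma(z) - 1
        hess = s * (1.0 - s) + rho         # g''(z) >= rho > 0, so the step is safe
        z -= grad / hess
    return z
```

At the returned point the gradient $g'(z)$ is essentially zero, which is the optimality condition for the prox.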