The LASSO problem is a well-known problem in the ML community, given by: \begin{equation} f(x) = \frac{1}{n}\|Ax-b\|^2_2 + \frac{\lambda}{n}\|x\|_1 \end{equation}
This formulation appears in paper1 on page 9 (the S2GD algorithm). There are two ways to calculate the Lipschitz constant (of the gradient of the smooth part) of the above problem.
First, $L=\frac{2}{n}\|A^TA\|_2$, which comes from the Hessian $\frac{2}{n}A^TA$ of the smooth part. Since this is costly to compute (also from a memory perspective, as it requires forming $A^TA$), it can be upper-bounded by $L=\frac{2}{n}\|A^T\|\|A\|$ via submultiplicativity of the norm.
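A minimal NumPy sketch of this first approach, on synthetic data (the matrix dimensions and random data are my own illustration, not from the papers). It computes the exact Hessian-based constant via the spectral norm of $A$ (using $\|A^TA\|_2 = \|A\|_2^2$, which avoids forming $A^TA$), plus a cheap Frobenius-norm upper bound:

```python
import numpy as np

# Synthetic data standing in for the design matrix (illustrative only).
rng = np.random.default_rng(0)
n, d = 1000, 50
A = rng.standard_normal((n, d))

# Exact smoothness constant of (1/n)||Ax - b||^2:
# L = (2/n) * ||A^T A||_2 = (2/n) * ||A||_2^2 (spectral norm),
# so A^T A never has to be formed explicitly.
L_exact = 2.0 / n * np.linalg.norm(A, 2) ** 2

# Cheaper upper bound via the Frobenius norm, which needs only
# a single pass over the entries of A.
L_frob = 2.0 / n * np.linalg.norm(A, 'fro') ** 2

# The Frobenius norm always dominates the spectral norm.
assert L_frob >= L_exact > 0
```

Note that with spectral norms $\|A^T\|\|A\| = \|A\|_2^2 = \|A^TA\|_2$ exactly; the bound is only loose when a cheaper norm such as the Frobenius norm is substituted.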
Second, $L = 2*\max(\operatorname{sum}(X\verb|.^|2,2))$ in MATLAB notation, i.e., squaring the entries of each row, summing them, and taking the maximum over rows. This is how the authors of the above paper calculate the Lipschitz constant there and in their other paper2 (though in paper2 it is for logistic regression, on page 18).
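A NumPy translation of that MATLAB one-liner, again on synthetic data of my own choosing, side by side with the Hessian-based constant from the first approach. The inequality at the end holds in general, since $\|X\|_2^2 \le \|X\|_F^2 = \sum_i \|x_i\|^2 \le n \max_i \|x_i\|^2$, so the two quantities need not coincide:

```python
import numpy as np

# Synthetic design matrix (illustrative only).
rng = np.random.default_rng(0)
n, d = 1000, 50
X = rng.standard_normal((n, d))

# MATLAB: L = 2 * max(sum(X.^2, 2))
# Twice the largest squared row norm -- the smoothness constant of the
# worst individual loss term f_i(w) = (x_i^T w - b_i)^2.
L_rows = 2.0 * np.max(np.sum(X ** 2, axis=1))

# Hessian-based constant of the *averaged* loss, for comparison.
L_hess = 2.0 / n * np.linalg.norm(X, 2) ** 2

# The per-row constant always dominates the averaged one:
# ||X||_2^2 <= sum of squared row norms <= n * (max squared row norm).
assert L_rows >= L_hess > 0
```

Running this on real data sets makes it easy to see how far apart the two constants are in practice.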
My concern is that the two ways of calculating $L$ should give the same result, but they do NOT. Specifically, the S2GD algorithm takes a long time to converge on different data sets with $L$ set as above.
Can somebody tell me the right way to calculate the Lipschitz constant for the LASSO problem as it is used in the S2GD algorithm?