Which is better to minimize in ADMM: SSE or MSE?

I am minimizing the following ERM objective function within the ADMM framework: \begin{equation} \sum_{i=1}^m \ell(w;x_i,y_i) + r(w) \end{equation} When I divide the loss term by $m$ (i.e., use the mean loss $\frac{1}{m}\sum_{i=1}^m \ell(w;x_i,y_i)$ instead of the sum), ADMM takes a long time to converge: the primal and dual residuals decay very slowly. Also, the objective function starts going up after a certain number of iterations, although the change is quite small. Can somebody tell me whether minimizing the SSE is better or worse than minimizing the MSE in ADMM, or in ERM in general?
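To make the two scalings concrete, here is a minimal sketch that runs scaled-form ADMM on both versions. It assumes a squared loss and $r(w) = \lambda\|w\|_1$ (the question does not specify $\ell$ or $r$), and `admm_lasso` / `soft_threshold` are hypothetical helper names. One fact it lets you check: ADMM on $c\,F$ with penalty $\rho$ produces the same iterates as ADMM on $F$ with penalty $\rho/c$. So running the MSE form with the same $\rho$ as the SSE form is equivalent to solving the SSE form with both $\lambda$ and $\rho$ multiplied by $m$.

```python
import numpy as np

# Assumption (not from the question): squared loss and r(w) = lam * ||w||_1.
# admm_lasso and soft_threshold are hypothetical helper names.


def soft_threshold(v, k):
    """Elementwise soft-thresholding: the prox operator of k * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)


def admm_lasso(X, y, lam, rho, scale=1.0, iters=500, tol=1e-8):
    """Scaled-form ADMM for
        min  scale * 0.5 * ||X w - y||^2 + lam * ||z||_1   s.t.  w = z.
    scale=1.0 is the SSE form; scale=1/m is the MSE form.
    Returns the final z and a list of (primal, dual) residuals.
    """
    n = X.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)  # scaled dual variable
    # The w-update solves the same linear system every iteration.
    A = scale * (X.T @ X) + rho * np.eye(n)
    Xty = scale * (X.T @ y)
    history = []
    for _ in range(iters):
        w = np.linalg.solve(A, Xty + rho * (z - u))
        z_old = z
        z = soft_threshold(w + u, lam / rho)
        u = u + w - z
        r = np.linalg.norm(w - z)            # primal residual
        s = rho * np.linalg.norm(z - z_old)  # dual residual
        history.append((r, s))
        if r < tol and s < tol:
            break
    return z, history


rng = np.random.default_rng(0)
m, n = 200, 10
X = rng.standard_normal((m, n))
w_true = np.zeros(n)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(m)

lam, rho = 10.0, 100.0
# SSE form with (lam, rho) versus the same problem divided by m with
# (lam/m, rho/m); tol=0.0 forces both runs to use all iterations.
z_sse, h_sse = admm_lasso(X, y, lam, rho, scale=1.0, iters=300, tol=0.0)
z_mse, h_mse = admm_lasso(X, y, lam / m, rho / m, scale=1.0 / m, iters=300, tol=0.0)
print(np.allclose(z_sse, z_mse), h_sse[-1], h_mse[-1])
```

The two runs produce the same iterates, but the dual residual of the MSE run is smaller by a factor of $m$ because it is measured with $\rho/m$. A fixed stopping tolerance therefore means different things in the two formulations, and $\rho$ has to be retuned whenever the loss is rescaled.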