Why should different initial points lead to different results for lasso optimization (which is convex)?


I am trying to fit a lasso model on data with 950 samples and about 5000 features. The lasso objective is $\frac{1}{2\,n_{\text{samples}}} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$. When I run the minimization with an initialization, I get a completely different $w$, which seems odd because the lasso objective is convex, so the initialization should not affect the result. Below are the results of the lasso with and without initialization. `tol` is the tolerance: convergence is declared once the change in $w$ falls below it.
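For reference, the objective being minimized can be evaluated directly (a minimal sketch; the function name `lasso_objective` is my own, not from the question):

```python
import numpy as np

def lasso_objective(X, y, w, alpha):
    """Lasso objective as stated above:
    (1 / (2 * n_samples)) * ||y - Xw||_2^2 + alpha * ||w||_1."""
    n_samples = X.shape[0]
    residual = y - X @ w
    return (residual @ residual) / (2 * n_samples) + alpha * np.abs(w).sum()
```

Comparing this value (rather than the coefficient vectors) between two fits tells you whether they reached equally good minima.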

tol=0.00000001 
#####  lasso model errors  ##### 


gene: 5478 matrix error: 0.069611732213 
with initialization: alpha: 1e-20 promotion: -3.58847815733e-13 
coef: [-0.00214732 -0.00509795  0.00272167 -0.00651548 -0.00164646 -0.00115342 
  0.00553346  0.01047653  0.00139832] 
without initialization: alpha: 1e-20  promotion: -19.0735249749 
coef: [-0.03650629  0.08992003 -0.01287155  0.03203973  0.1567577  -0.03708655 
-0.13710957 -0.01252736 -0.21710334] 


with initialization: alpha: 1e-15 promotion: 1.06179081478e-10 
coef: [-0.00214732 -0.00509795  0.00272167 -0.00651548 -0.00164646 -0.00115342 
  0.00553346  0.01047653  0.00139832] 
without initialization: alpha: 1e-15  promotion: -19.0735249463 
coef: [-0.03650629  0.08992003 -0.01287155  0.03203973  0.1567577  -0.03708655 
-0.13710957 -0.01252736 -0.21710334] 



Warning (from warnings module): 
  File "/usr/local/lib/python2.7/site-packages/sklearn/linear_model/coordinate_descent.py", line 491 
    ConvergenceWarning) 
ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems. 
with initialization: alpha: 1e-10  promotion: 0.775144987537 
coef: [-0.00185139 -0.0048819   0.00218349 -0.00622618 -0.00145647 -0.00115857 
  0.0055919   0.01072924  0.00043773] 
without initialization: alpha: 1e-10 promotion: -17.8649603301 
coef: [-0.03581581  0.0892119  -0.01232829  0.03151441  0.15606195 -0.03734093 
-0.13604286 -0.01247732 -0.21233529] 


with initialization: alpha: 1e-08 promotion: -5.87121366314 
coef: [-0.          0.         -0.         -0.01064477  0.         -0.00116167 
-0.          0.01114746  0.        ] 
without initialization: alpha: 1e-08  promotion: 4.05593555389 
coef: [ 0.          0.04505117  0.00668611  0.          0.07731668 -0.03537848 
-0.03151995  0.         -0.00310122] 


max promote: 
4.05593555389 

For the implementation, I used the `Lasso` class from the Python package `sklearn.linear_model`. I also tried different data, but the results on the new data change with the initialization as well. I find this odd, but I have not been able to analyze it and find an explanation.
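A minimal sketch of the comparison described above, assuming the initialization was supplied via scikit-learn's `warm_start` mechanism (the question does not show the original fitting code, so the data shapes and starting vector here are illustrative). With many more features than samples and a near-zero `alpha`, the problem is convex but not *strictly* convex, so different starting points can reach different minimizers with (near-)equal objective values:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 500)   # far more features than samples, as in the question
y = rng.randn(100)
alpha = 1e-10             # tiny alpha, as in the reported runs

# Fit from the default (zero) initialization.
cold = Lasso(alpha=alpha, max_iter=1000, tol=1e-8)
cold.fit(X, y)

# Fit starting from a supplied coefficient vector via warm_start.
warm = Lasso(alpha=alpha, max_iter=1000, tol=1e-8, warm_start=True)
warm.coef_ = rng.randn(500)   # hypothetical nonzero starting point
warm.fit(X, y)

def objective(w, intercept):
    """The lasso objective, for comparing the two solutions."""
    r = y - X @ w - intercept
    return r @ r / (2 * len(y)) + alpha * np.abs(w).sum()

print(objective(cold.coef_, cold.intercept_))
print(objective(warm.coef_, warm.intercept_))
print(np.abs(cold.coef_ - warm.coef_).max())
```

If the two objective values printed are essentially equal while the coefficient vectors differ, the optimizer is working correctly and the non-uniqueness comes from the underdetermined ($p \gg n$, $\alpha \approx 0$) problem itself.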