I am trying to run lasso optimization on data with 950 samples and about 5000 features. The lasso objective is $(1 / (2 \cdot n_{\text{samples}})) \, \|y - Xw\|_2^2 + \alpha \|w\|_1$. When I run the minimization with an initialization, I get a totally different $w$, which is odd because the lasso objective is convex, so the initialization should not affect the result. Below are the lasso results with and without initialization. `tol` is the tolerance: convergence is declared once the change in $w$ falls below it.
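For reference, the objective above can be written out directly. This is only a sketch for checking values by hand; the random data stands in for the real matrix, and `lasso_objective` is a hypothetical helper, not part of sklearn:

```python
import numpy as np

def lasso_objective(w, X, y, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||_2^2 + alpha * ||w||_1"""
    n_samples = X.shape[0]
    residual = y - X @ w
    return residual @ residual / (2 * n_samples) + alpha * np.abs(w).sum()

rng = np.random.RandomState(0)
X = rng.randn(950, 50)  # placeholder data, far fewer features than the real 5000
y = rng.randn(950)

# At w = 0 the penalty term vanishes, so the value is ||y||^2 / (2 * n_samples).
w0 = np.zeros(50)
print(lasso_objective(w0, X, y, alpha=1e-8))
```

Because this function is what both fits are minimizing, two converged solutions must reach (numerically) the same objective value, even if the solver paths differ.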
tol=0.00000001
##### lasso model errors #####
gene: 5478 matrix error: 0.069611732213
with initialization: alpha: 1e-20 promotion: -3.58847815733e-13
coef: [-0.00214732 -0.00509795 0.00272167 -0.00651548 -0.00164646 -0.00115342
0.00553346 0.01047653 0.00139832]
without initialization: alpha: 1e-20 promotion: -19.0735249749
coef: [-0.03650629 0.08992003 -0.01287155 0.03203973 0.1567577 -0.03708655
-0.13710957 -0.01252736 -0.21710334]
with initialization: alpha: 1e-15 promotion: 1.06179081478e-10
coef: [-0.00214732 -0.00509795 0.00272167 -0.00651548 -0.00164646 -0.00115342
0.00553346 0.01047653 0.00139832]
without initialization: alpha: 1e-15 promotion: -19.0735249463
coef: [-0.03650629 0.08992003 -0.01287155 0.03203973 0.1567577 -0.03708655
-0.13710957 -0.01252736 -0.21710334]
Warning (from warnings module):
File "/usr/local/lib/python2.7/site-packages/sklearn/linear_model/coordinate_descent.py", line 491
ConvergenceWarning)
ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems.
with initialization: alpha: 1e-10 promotion: 0.775144987537
coef: [-0.00185139 -0.0048819 0.00218349 -0.00622618 -0.00145647 -0.00115857
0.0055919 0.01072924 0.00043773]
without initialization: alpha: 1e-10 promotion: -17.8649603301
coef: [-0.03581581 0.0892119 -0.01232829 0.03151441 0.15606195 -0.03734093
-0.13604286 -0.01247732 -0.21233529]
with initialization: alpha: 1e-08 promotion: -5.87121366314
coef: [-0. 0. -0. -0.01064477 0. -0.00116167
-0. 0.01114746 0. ]
without initialization: alpha: 1e-08 promotion: 4.05593555389
coef: [ 0. 0.04505117 0.00668611 0. 0.07731668 -0.03537848
-0.03151995 0. -0.00310122]
max promote:
4.05593555389
For the implementation, I used the lasso function of the Python package sklearn.linear_model. I also tried a different dataset, but the results on the new data change with initialization as well. This seems odd, but I have not been able to analyze it and find an explanation.
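Since the original fitting code is not shown, here is a minimal sketch of how I understand the comparison, using sklearn's `Lasso` with `warm_start=True` to supply an initial `coef_` (the random data and the alpha value here are placeholders, not the actual experiment). With a reasonably sized alpha and a tight tolerance, both runs converge to the same coefficients, consistent with convexity; they only disagree when the solver stops at `max_iter` before converging, which is what the `ConvergenceWarning` for extremely small alpha indicates:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
y = rng.randn(100)

# Fit without any initialization (coordinate descent starts from zeros).
model_cold = Lasso(alpha=0.1, tol=1e-10, max_iter=100000)
model_cold.fit(X, y)

# Fit with an explicit initialization: warm_start=True makes fit() reuse
# whatever is stored in coef_ as the starting point.
model_warm = Lasso(alpha=0.1, tol=1e-10, max_iter=100000, warm_start=True)
model_warm.coef_ = rng.randn(20)  # arbitrary starting coefficients
model_warm.fit(X, y)

# Both runs converge, so the coefficients agree to numerical precision.
print(np.allclose(model_cold.coef_, model_warm.coef_, atol=1e-6))
```

If the same comparison is run with `alpha=1e-20` and a modest `max_iter`, the two fits can disagree simply because neither has actually converged, which matches the behavior in the output above.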