Solving L1 regularized Joint Least Squares and Logistic Regression


The objective function I want to minimize is:

$f = -\sum_{n=1}^{N}\log p(y_{n}^{a}|x_{n},w) + \sum_{n=1}^{N}(y_{n}^{b}-w^{T}x_{n})^{2} + \lambda\|w\|_1$

The first term models the relationship between data $x$ and labels $y^{a}$ using logistic regression, while the second term models data $x$ and labels $y^{b}$ using linear regression. The third term is to enforce sparsity/feature selection.
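For concreteness, the objective can be evaluated numerically as below (a numpy sketch; the function name, the convention that $y^a_n \in \{0,1\}$, and storing the $x_n$ as rows of X are my assumptions):

```python
import numpy as np

def joint_objective(w, X, ya, yb, lam):
    """Evaluate f(w): logistic NLL + squared error + l1 penalty.

    X  : (N, m) data matrix, rows are the x_n
    ya : (N,) binary labels in {0, 1} for the logistic term
    yb : (N,) real-valued labels for the least-squares term
    """
    z = X @ w
    # -sum_n log p(ya_n | x_n, w) = sum_n log(1 + exp(z_n)) - ya' z
    logistic_nll = np.sum(np.logaddexp(0.0, z)) - ya @ z
    least_squares = np.sum((yb - z) ** 2)
    return logistic_nll + least_squares + lam * np.sum(np.abs(w))
```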

My question: I found a number of papers that show how to solve either L1-regularized logistic regression or L1-regularized linear regression, but I could not find any method that applies to the combined problem. Are there techniques that can help me solve the objective above?

BEST ANSWER

First, a disclaimer: I'm not sure I see the statistical validity of combining both linear and logistic regression with the same measurement vectors $x_n$. I am going to assume you know what you are doing :-) and address the optimization question only.

Some quick and dirty approaches:

  • My Matlab toolbox CVX 2.1 can handle this, though with a caveat: it has to jump through some hoops to get the underlying solvers to accept the logistic regression term.
  • CVX 3.0 beta coupled with the SCS solver can solve this problem "natively", avoiding the aforementioned caveat; but it will be a bit more difficult to get up and running, and again, it's a beta!
  • YALMIP can probably handle this well, too; I believe it also connects to SCS, so it can solve the problem natively as well.
  • CVXPY coupled with SCS can do the same thing in Python.
  • And you can implement your own proximal gradient solver if you are so inclined, though that is a more advanced approach: you would have to write a function that computes the gradient of the smooth portion of the objective.

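To illustrate the last option, here is a minimal proximal-gradient (ISTA) sketch in numpy. The function names, the fixed step size, and the $\lambda_1$ weight on the least-squares term are my own choices; in practice you would derive the step from the Lipschitz constant of the smooth gradient, or use a line search:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def smooth_grad(w, X, ya, yb, lam1):
    """Gradient of the smooth part: logistic NLL + lam1 * squared error."""
    z = X @ w
    sigma = 1.0 / (1.0 + np.exp(-z))          # logistic sigmoid
    return X.T @ (sigma - ya) + 2.0 * lam1 * (X.T @ (z - yb))

def prox_gradient(X, ya, yb, lam1, lam2, step=1e-3, iters=3000):
    """ISTA: gradient step on the smooth part, prox step on the l1 part."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = soft_threshold(w - step * smooth_grad(w, X, ya, yb, lam1),
                           step * lam2)
    return w
```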
Here is a logistic regression example for CVX, so you can see how to express the logistic term in a compliant manner using the CVX function log_sum_exp. It's a simple matter to modify this example to add the additional terms.

My recommendation is that you provide weighting values for both the linear regression and $\ell_1$ terms. That is, minimize something like this: $$f = -\sum_{n=1}^{N}\log~p(y_{n}^{a}|x_{n},w) + \lambda_1\sum_{n=1}^{N}(y_{n}^{b}-w^{T}x_{n})^{2} +\lambda_2\|w\|_1$$ You won't know what the best values of $\lambda_1$ and $\lambda_2$ are until you have done some cross validation. What I do know is that the chance that $\lambda_1=1$ is your best choice is slim to none.

The model in CVX is going to look something like this. It assumes that the labels $y^a_n$ and $y^b_n$ are stored in the column vectors ya and yb, respectively, and that the data vectors $x_n$ are the rows of the $N\times m$ matrix X.

cvx_begin
    variable w(m)   % m = number of features; N = number of samples
    minimize(...
        -ya'*X*w+sum(log_sum_exp([zeros(1,N); w'*X'])) ... %logistic
        +lambda1*sum_square(yb-X*w) ... %linear
        +lambda2*norm(w,1)) %regularizer
cvx_end