Algorithms For Lp Regression


So I know that $L_2$ regression problems can be solved in closed form via the normal equations: form $X^\top X$ and $X^\top \mathbf{y}$, then solve by matrix inversion.

Similarly, $L_1$ and $L_\infty$ problems can be reformulated as linear programs.

But what about $L_p$, for $p \in (1, 2)$? How can I pose these problems in a way that I can solve numerically? What algorithms can be used?

Edit For Clarity: I am looking to develop my own software to solve problems of the form:

$$\underset{\boldsymbol{\beta}}{\text{minimize}}\ \|\mathbf{y} - X\boldsymbol{\beta}\|_{p}, \quad p\in (1,2)$$

There are 3 answers below.

---

If you are only interested in the answer, I would recommend downloading CVX (if you are using Matlab) or CVXPY (if you are using Python). They are very general modeling tools for convex problems and work great. If you need to solve these problems fast, or if your problems are very large, you should probably opt for a custom-made solver. Since you are interested in solving problems with $p\in(1,2)$, you might find this interesting (I guess your problem is similar to equations (27) and (33)?). You can also find the code on one of the authors' home pages.

---

Your problem is convex.
Moreover, note that:

$$ \arg \min_{\beta} {\left\| X \beta - y \right\|}_{p} = \arg \min_{\beta} {\left\| X \beta - y \right\|}_{p}^{p} $$

Now the above can be solved using many methods for convex problems.
The most straightforward would be the subgradient method.

With some additional tricks you can even transform this into a smooth convex problem, in a similar approach to what I did here - Convert Convex $ {\ell}_{1.5} $ Regression Problem into Semi Definite Program (SDP).
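A minimal (sub)gradient descent sketch in NumPy for the $\|X\beta - y\|_p^p$ formulation above; the step size, iteration count, and test problem are arbitrary, untuned choices for illustration. Note that for $p \in (1,2)$ the objective is actually differentiable everywhere, so plain gradient steps suffice:

```python
import numpy as np

def lp_regression_grad(X, y, p=1.5, lr=1e-3, iters=5000):
    """Minimize ||X b - y||_p^p by gradient descent (untuned sketch).

    For p in (1, 2), the derivative of |r|^p is p * sign(r) * |r|^(p-1),
    which is defined everywhere (it equals 0 at r = 0).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        r = X @ beta - y
        grad = X.T @ (p * np.sign(r) * np.abs(r) ** (p - 1))
        beta -= lr * grad
    return beta

# Hypothetical synthetic problem for illustration
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta
beta_hat = lp_regression_grad(X, y)
```

A fixed step size is used here only for brevity; the textbook subgradient method would use a diminishing step size such as $\eta_t = \eta_0/\sqrt{t}$.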

---

As one existing answer has pointed out, minimizing $\|y-X\beta\|_p$ is equivalent to minimizing $\|y-X\beta\|_p^p$, which is a convex and differentiable objective for $1<p<2$.

Such problems are well studied in the machine-learning literature. Because evaluating exact solutions is harder for fractional $p$, stochastic first-order methods, which require only (noisy) gradients of the objective, are particularly relevant. You may wish to try stochastic gradient descent (SGD), stochastic variance-reduced gradient (SVRG), or SGD with Nesterov momentum, to name a few.
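As a concrete illustration, here is a plain minibatch-SGD sketch in NumPy for $\tfrac{1}{n}\|y - X\beta\|_p^p$; the learning rate, batch size, epoch count, and synthetic data are all hypothetical, untuned choices:

```python
import numpy as np

def lp_sgd(X, y, p=1.5, lr=0.01, epochs=200, batch=10, seed=0):
    """Plain minibatch SGD on f(b) = (1/n) * ||X b - y||_p^p (untuned sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)  # reshuffle each epoch
        for start in range(0, n, batch):
            sel = idx[start:start + batch]
            r = X[sel] @ beta - y[sel]
            # d/dr |r|^p = p * sign(r) * |r|^(p-1), defined everywhere for p > 1
            grad = X[sel].T @ (p * np.sign(r) * np.abs(r) ** (p - 1)) / len(sel)
            beta -= lr * grad
    return beta

# Hypothetical synthetic problem for illustration
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta
beta_hat = lp_sgd(X, y)
```

Swapping in momentum or variance reduction changes only the update step; the per-sample gradient computation stays the same.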