How to do regularization in linear programming


For quadratic programming, one trick is to introduce a constant weighting matrix. Example:

$$H = A^T Q A$$

$$Min: \frac{1}{2}x^THx + c^T x$$

Where $Q = \alpha I$

This gives smoother optimal values; in my experience, setting $\alpha$ to a value like 0.85 works fine.

But how would I do that for linear programming?

$$\max: c^Tx$$ s.t. $$Ax \le b \\ x \ge 0$$

I know that for ordinary least squares:

$$x = (A^TA)^{-1}A^Tb$$

Adding a regularization term gives a regularized solution:

$$x = (A^TA + \alpha I)^{-1}A^Tb$$
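The closed form above can be sketched numerically with NumPy (the matrix and vector below are illustrative, not from the question); the regularized solution has a smaller norm than the ordinary least-squares one:

```python
import numpy as np

# Small overdetermined system (illustrative data)
A = np.array([[3.0, 1.0],
              [4.0, 6.0],
              [1.0, 2.0]])
b = np.array([3.0, 6.0, 2.0])
alpha = 0.1

# Ordinary least squares: x = (A^T A)^{-1} A^T b
x_ols = np.linalg.solve(A.T @ A, A.T @ b)

# Regularized (ridge) solution: x = (A^T A + alpha I)^{-1} A^T b
x_ridge = np.linalg.solve(A.T @ A + alpha * np.eye(2), A.T @ b)

print(x_ols, x_ridge)  # the regularized solution is shrunk toward zero
```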

But how should I write my objective function so that it includes this regularization as well?

There are two answers below.

BEST ANSWER

Suppose you have the system $Ax = b$ and you want to find $x$ with regularization.

Use the regularized least-squares solution:

$$x = (A^TA + \alpha I)^{-1}A^Tb$$

Suppose you want the same result, but from a linear program of the standard form:

$$\max : c^T x$$ s.t. $$Ax \le b \\ x \ge 0$$

Then replace the objective function and the constraints with:

$$\max : \left((A^TA + \alpha I)^T b\right)^T x$$ s.t. $$(A^TA + \alpha I)x \le A^Tb \\ x \ge 0$$

GNU Octave example:

>> A = [3 1; 4 6];
>> b = [3; 6];
>> a = 0.01; % Alpha is small
>> x = inv(A'*A + a*eye(2))*A'*b % Ordinary least square
x =

   0.85612
   0.42920

>> x = glpk((A'*A + a*eye(2))'*b, A'*A + a*eye(2), A'*b, [], [], repmat("U", 1, 2), repmat("C", 1, 2), -1) % LP
x =

   0.85612
   0.42920

>>
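The same check can be reproduced in Python, assuming NumPy and SciPy are available (`scipy.optimize.linprog` stands in for `glpk`; `linprog` minimizes, so the cost vector is negated):

```python
import numpy as np
from scipy.optimize import linprog

# Same data as the Octave example above
A = np.array([[3.0, 1.0],
              [4.0, 6.0]])
b = np.array([3.0, 6.0])
alpha = 0.01

M = A.T @ A + alpha * np.eye(2)   # A^T A + alpha I
rhs = A.T @ b                     # A^T b

# Regularized least-squares solution
x_ridge = np.linalg.solve(M, rhs)

# Equivalent LP: maximize ((M^T b)^T x  s.t.  M x <= A^T b, x >= 0.
# linprog minimizes, so negate the cost vector.
res = linprog(c=-(M.T @ b), A_ub=M, b_ub=rhs, bounds=[(0, None)] * 2)

print(x_ridge)  # approximately [0.85612, 0.42920]
print(res.x)    # same values, recovered by the LP
```

Note that this equivalence relies on the particular data: here the cost vector and the constraint matrix have positive entries, so the LP optimum is attained where all inequality constraints are tight, i.e. where $(A^TA + \alpha I)x = A^Tb$.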
Second answer:

A typical linear least squares problem is

$${\bf x_o} = \min_{\bf x}\left\{\|{\bf Ax-b}\|_2^2\right\}$$ with added Tikhonov regularization:

$${\bf x_o} = \min_{\bf x}\left\{\|{\bf Ax-b}\|_2^2 + \lambda \|{\bf I x}\|_2^2\right\}$$

which, after expanding, differentiating, and setting the gradient equal to the zero vector, reduces to

$${\bf x_o} = ({\bf A}^T{\bf A}+\lambda {\bf I})^{-1}({\bf A}^T{\bf b})$$

So the regularization is added to the ${\bf A}^T{\bf A}$ matrix on the left-hand side, not multiplied somewhere in the middle of it.

So the $\bf A$ here is in fact ${\bf Q}^{1/2}{\bf A}$ from the quadratic program above, and the expression expands to:

$${\bf x_o} = ({\bf A}^T{\bf Q}^{T/2}{\bf Q}^{1/2}{\bf A}+\lambda {\bf I})^{-1}({\bf A}^T{\bf b})$$

Now assume ${\bf Q}^{T/2}{\bf Q}^{1/2} = {\bf Q}$; otherwise ${\bf Q}^{1/2}$ above would not even be defined. Then:

$${\bf x_o} = ({\bf A}^T{\bf QA}+\lambda {\bf I})^{-1}({\bf A}^T{\bf b})$$
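The Tikhonov closed form can also be checked numerically, in the ${\bf Q} = {\bf I}$ case, against the equivalent augmented least-squares problem $\min_x \|[{\bf A}; \sqrt{\lambda}\,{\bf I}]x - [{\bf b}; {\bf 0}]\|_2^2$ — a sketch with illustrative data:

```python
import numpy as np

# Illustrative data (not from the answer above)
A = np.array([[3.0, 1.0],
              [4.0, 6.0],
              [1.0, 2.0]])
b = np.array([3.0, 6.0, 2.0])
lam = 0.5

# Closed form: x = (A^T A + lambda I)^{-1} A^T b
x_closed = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)

# Same problem posed as an augmented least-squares system:
# min || [A; sqrt(lam) I] x - [b; 0] ||^2
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(2)])
b_aug = np.concatenate([b, np.zeros(2)])
x_aug, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)

print(x_closed, x_aug)  # the two solutions agree
```

The two agree because the normal equations of the augmented system are exactly $({\bf A}^T{\bf A} + \lambda{\bf I})x = {\bf A}^T{\bf b}$.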