Mixed Integer Conditional Least Squares Regression

323 Views Asked by At

I'd like to minimize the conditional sum of squares

$$\sum f(\beta)^2$$

That is when

$$ f(\beta) = \begin{cases} \matrix{X} \beta - y & \text{when } \operatorname{sign}(X\beta) \neq \operatorname{sign}(y), \\ 0, & \text{otherwise}. \end{cases} $$

Basically this amounts to doing classic least squares regression except only penalizing errors in which the sign of the prediction is different from the sign of the actual value. Is this possible using mixed integer linear programming? If so how?

Edit $\beta$ is a variable, $X$ is a matrix (constant) and $y$ is a vector (constant)

1

There are 1 best solutions below

2
On

Yes, it can easily be cast as a mixed-integer quadratic program (assuming you know how to model implications with binary variables)

Introduce a binary $\delta_i$to represent if $y_i$ has same sign as $X_i \beta$, and variable $z_i$ to represent the cost for the residual.

You then add implications that $\delta_i$ implies $X_i \beta \geq 0$ if $y_i\geq 0$ or $X_i \beta \leq 0$ if $y_i\leq 0$. Similarily, you introduce the reveresed inequalities implied by $1-\delta_i$. For the cost, $\delta_i$ implies $z_i = 0$, while $1-\delta_i$ implies $z_i = X_i \beta-y_i$.

However, I think the model is extremely sensitive numerically. The sign-operator is nasty to work with in MILP/MIQP models, as an $\epsilon$-step theoretically could imply a huge change in the objective, but that small step could drown completely in the overall numerical tolerances in the solver, so instead it makes no difference. In this model, it means $\beta = 0$ is optimal in a solver with finite precision, as the obtained prediction, for the solver, has arbitrary sign.

Some experimentation below with the MATLAB Toolbox YALMIP (disclaimer, developed by me)

% Create data
N = 20;
x = linspace(-1,2,N)';
y = -1 + x + x.^2+x.^3 + 1*randn(length(x),1);
clf;
plot(x,y)

% Least squares quadratic model
X = [repmat(1,N,1) x x.^2];
beta = sdpvar(3,1);
e = y - X*beta;
optimize([],e'*e);
hold on;plot(x,value(X*beta));

% Binary indicator for same sign
d = binvar(N,1)

% Cost terms
z = sdpvar(N,1);
% What is considered to be non-zero?
tolerance = 1e-3; 
Predictions = X*beta;
neg = find(y<=0);
pos = find(y>=0);
Model = [implies(d, z == 0), implies(1-d, z == e)];
Model = [Model, implies(d(neg),   Predictions(neg) <= -tolerance)];
Model = [Model, implies(1-d(neg), Predictions(neg) >=  tolerance)];
Model = [Model, implies(d(pos),   Predictions(pos) >=  tolerance)];
Model = [Model, implies(1-d(pos), Predictions(pos) <= -tolerance)];
Model = [Model, -100 <= beta <= 100];
optimize(Model,z'*z);
plot(x,value(Predictions),'k')