Why use two slack variables in the support vector regression formulation?

Question

2.2k Views Asked by Bumbble Comm At 27 Mar 2026 - 1:13

I am learning support vector regression (SVR) but cannot fully understand the rationale behind the slack-variable trick in its formulation. The original optimization problem for SVR is as follows:

$\mathrm{min}\left\{C\sum_{i=1}^NL_\epsilon\left(y_i,w_0+\mathbf{w}^T\mathbf{x}_i\right)+\frac{1}{2}||\mathbf{w}||^2\right\}$

where $L_\epsilon\left(y_i,w_0+\mathbf{w}^T\mathbf{x}_i\right)=\mathrm{max}\left\{0,\big|y_i-\left(w_0+\mathbf{w}^T\mathbf{x}_i\right)\big|-\epsilon\right\}$ is the $\epsilon$-insensitive error function. Then all the papers and textbooks I read say to introduce two slack variables $\xi_i^+$ and $\xi_i^-$ such that the above problem transforms to:

$\mathrm{min}\left\{C\sum_{i=1}^N\left(\xi_i^++\xi_i^-\right)+\frac{1}{2}||\mathbf{w}||^2\right\}$ s.t. $\xi_i^+\geq0,\xi_i^-\geq0,\xi_i^++\epsilon\geq y_i-\left(w_0+\mathbf{w}^T\mathbf{x}_i\right)\geq-\xi_i^--\epsilon$

However, I just don't see the necessity to introduce two slack variables instead of one. In fact, if we simply let $\xi_i=L_\epsilon\left(y_i,w_0+\mathbf{w}^T\mathbf{x}_i\right)$, the original problem can be written as:

$\mathrm{min}\left\{C\sum_{i=1}^N\xi_i+\frac{1}{2}||\mathbf{w}||^2\right\}$ s.t. $\xi_i=\mathrm{max}\left\{0,\big|y_i-\left(w_0+\mathbf{w}^T\mathbf{x}_i\right)\big|-\epsilon\right\}$

The above problem is equivalent to

$\mathrm{min}\left\{C\sum_{i=1}^N\xi_i+\frac{1}{2}||\mathbf{w}||^2\right\}$ s.t. $\xi_i\geq0,\xi_i+\epsilon\geq y_i-\left(w_0+\mathbf{w}^T\mathbf{x}_i\right)\geq-\xi_i-\epsilon$

That is, a single slack variable is enough to write the problem in standard quadratic programming form. Am I wrong? If not, why take the roundabout route of introducing two slack variables? Does it offer any computational advantage, or is it just an aid to interpreting the concept?
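The step from the equality constraint to the inequality constraints can be checked numerically for a single training point. The sketch below assumes a fixed prediction `f` standing in for $w_0+\mathbf{w}^T\mathbf{x}_i$; the function names and sample values are illustrative and not part of the original post. It confirms only that, for fixed $\mathbf{w}$ and $w_0$, the smallest feasible $\xi_i$ equals the $\epsilon$-insensitive loss.

```python
def eps_insensitive_loss(y, f, eps):
    """L_eps(y, f) = max(0, |y - f| - eps)."""
    return max(0.0, abs(y - f) - eps)


def smallest_single_slack(y, f, eps):
    """Smallest xi with xi >= 0 and xi + eps >= y - f >= -xi - eps."""
    r = y - f  # residual: target minus prediction
    # The chained inequality is two linear lower bounds on xi:
    #   xi >= r - eps   and   xi >= -r - eps
    return max(0.0, r - eps, -r - eps)


eps = 0.5  # hypothetical tube half-width
f = 1.0    # hypothetical prediction for one training point
for y in [-1.0, 0.4, 0.5, 1.0, 1.5, 1.6, 3.0]:
    # Minimizing the objective pushes xi down to its lower bound,
    # which is exactly the loss value.
    assert smallest_single_slack(y, f, eps) == eps_insensitive_loss(y, f, eps)
```

Because the objective only rewards a smaller $\xi_i$, the inequality constraints are tight at the optimum, which is why they can replace the equality.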

There are 2 best solutions below

Bumbble Comm On 18 Jul 2017 - 2:17

Axelle's answer explains how the two slack variables differ. We could replace the two slack variables with one by using the absolute value of the difference between the prediction and the target variable. That would make the constraint function non-differentiable, which is bothersome when deriving the dual formulation or the KKT conditions, and this is why two separate slack variables are introduced in the regression problem.
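For reference, here is a sketch of the Lagrangian for the two-slack constraints as written in the question. The multipliers $\alpha_i^\pm\geq0$ and $\eta_i^\pm\geq0$ are notation introduced here for illustration and do not appear in the original post.

$L=\frac{1}{2}||\mathbf{w}||^2+C\sum_{i=1}^N\left(\xi_i^++\xi_i^-\right)-\sum_{i=1}^N\left(\eta_i^+\xi_i^++\eta_i^-\xi_i^-\right)-\sum_{i=1}^N\alpha_i^+\left(\epsilon+\xi_i^+-y_i+w_0+\mathbf{w}^T\mathbf{x}_i\right)-\sum_{i=1}^N\alpha_i^-\left(\epsilon+\xi_i^-+y_i-w_0-\mathbf{w}^T\mathbf{x}_i\right)$

Setting the derivatives with respect to the primal variables to zero gives:

$\frac{\partial L}{\partial\mathbf{w}}=0\Rightarrow\mathbf{w}=\sum_{i=1}^N\left(\alpha_i^+-\alpha_i^-\right)\mathbf{x}_i,\quad\frac{\partial L}{\partial w_0}=0\Rightarrow\sum_{i=1}^N\left(\alpha_i^+-\alpha_i^-\right)=0,\quad\frac{\partial L}{\partial\xi_i^\pm}=0\Rightarrow C-\alpha_i^\pm-\eta_i^\pm=0$

Each constraint is linear in its own slack variable, so every term differentiates cleanly and the last condition bounds each multiplier by $0\leq\alpha_i^\pm\leq C$.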

**Bumbble Comm** · Accepted Answer

I ran into the same question while studying SVR, and even though this post is 2 years old, an answer may still help others.

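The slack variables in SVR are defined as follows:

- $\xi_i^+$ is 0 if the training point is below the upper bound $w_0+\mathbf{w}^T\mathbf{x}_i+\epsilon$, and positive if it is above

- $\xi_i^-$ is 0 if the training point is above the lower bound $w_0+\mathbf{w}^T\mathbf{x}_i-\epsilon$, and positive if it is below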

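So you can see that the two definitions cover opposite cases, and a variable defined one way cannot play the other role. If we used only one slack variable, say $\xi_i^+$, a point far below the lower bound would still get a value of 0, so its error would not be penalized. Look at the image below to convince yourself.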

[Image: illustration of $\xi^+$ and $\xi^-$]
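The same picture can be reproduced with a few numbers. This is a minimal sketch for a single point with a fixed prediction `f`; the helper `slack_pair` and the sample values are hypothetical, chosen only to place one point above, one inside, and one below the $\epsilon$-tube.

```python
def slack_pair(y, f, eps):
    """Smallest feasible (xi_plus, xi_minus) for one training point."""
    r = y - f                      # residual: target minus prediction
    xi_plus = max(0.0, r - eps)    # positive only above the upper bound
    xi_minus = max(0.0, -r - eps)  # positive only below the lower bound
    return xi_plus, xi_minus


eps, f = 0.5, 1.0  # hypothetical tube half-width and prediction
for label, y in [("above tube", 3.0), ("inside tube", 1.25), ("below tube", -1.0)]:
    xi_plus, xi_minus = slack_pair(y, f, eps)
    print(f"{label:12s} y={y:5.2f}  xi+={xi_plus:.2f}  xi-={xi_minus:.2f}")
```

With these values the point above the tube gets $(\xi^+,\xi^-)=(1.5,0)$, the point inside gets $(0,0)$, and the point below gets $(0,1.5)$, so at most one of the two slacks is nonzero for any point.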
