I'm trying to figure out where I'm going wrong with my formulation of a certain problem, as all other instances of it were formulated slightly differently.
The problem (SVR problem, If you're familiar with it) is as follows:given $\{x^{(i)},y_i\}_{i=1}^{m}$, We would like to find a hyperplane, defined by $\langle w,x\rangle +b=0$ which minimizes the $l_2$ norm of $w$ plus the sum (over $i$) of the loss defined by: $$loss(w,b;x^{(i)})=[\left|\langle w,x^{(i)}\rangle +b -y_i\right|-\epsilon ]_+ $$
I would formulate the problem as follows: $$\min_{w,b,\xi_i} \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m}\xi_i$$ $$s.t.\cases{\langle w,x^{(i)}\rangle +b-y_i-\epsilon\leq\xi_i\cr y_i-\langle w,x^{(i)}\rangle -b-\epsilon\leq\xi_i\cr \xi_i\geq 0} i=1,...,m $$ But every formulation I've seen uses separate variables $\xi_i,\xi_i^* \geq 0$ for the first 2 constraints: $$\min_{w,b,\xi_i} \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m}(\xi_i + \xi_i^*)$$ $$s.t. \cases{\langle w,x^{(i)}\rangle +b-y_i-\epsilon\leq\xi_i\cr y_i-\langle w,x^{(i)}\rangle -b-\epsilon\leq\xi_i^*\cr \xi_i,\xi_i^*\geq 0} i=1,...,m $$ What's wrong with my way? I'm thinking it might have something to do with the KKT conditions (since we want to formulate it using the multipliers and get a problem with a global minimum). I have some basic understanding of the KKT conditions, but I don't fully understand all the constraints qualifications (Which I assume might be the problem)
The dual of your new primal problem is almost the same as the standard dual, with one difference. To see this, I am following the work on this link (but modifying it to your new primal): http://alex.smola.org/papers/2003/SmoSch03b.pdf
Under your new problem, the Lagrangian is (with multipliers $\lambda_i, \mu_i, \theta_i$):
\begin{align} &L = \frac{1}{2}||w||^2 + \sum_i \xi_i \\ &+ \sum_i \lambda_i [w^T x^i + b - y_i - \epsilon - \xi_i] \\ &+ \sum_i \mu_i [y_i - w^T x^i - b - \epsilon - \xi_i] \\ &-\sum_i \theta_i \xi_i \end{align}
where $w^T$ is the transpose of vector $w$. You now use duality concepts to compute $\sup_{\lambda \geq 0 , \mu \geq 0, \theta \geq 0} \left[\inf_{w, \xi, b} L\right]$. Since we must have an infimum over $w$ we can take $\partial/\partial w_j$ to get (similar to equation (8) in the paper):
$$ w = \sum_i (\mu_i - \lambda_i)x^i $$
Since we must take an infimum over $b$, the answer is $-\infty$ unless (similar to equation (7) in the paper):
$$ \sum_i (\lambda_i - \mu_i) = 0 $$
and so we must choose the multipliers to satisfy the above to avoid a $-\infty$ infimum. Similarly, since we take an infimum over $\xi_i$ we must have (similar to equation (9)):
$$ 1 = \lambda_i + \mu_i + \theta_i $$
Plugging these back into the Lagrangian $L$, simplifying, and taking the outer maximization gives:
\begin{align} &\mbox{Maximize:} -\frac{1}{2}\sum_i\sum_j (\mu_i-\lambda_i)(\mu_j-\lambda_j)x_i^Tx_j - \epsilon\sum_i(\lambda_i + \mu_i) + \sum_i y^i (\mu_i-\lambda_i) \\ &\mbox{Subject to:} \sum_i (\lambda_i-\mu_i) = 0 \: \: , \: \: \lambda_i \geq 0 \: \: \forall i \: \: , \: \: \mu_i \geq 0 \: \: \forall i \: \: , \: \: \lambda_i + \mu_i \leq 1 \: \: \forall i \end{align}
where the $\lambda_i + \mu_i \leq 1 $ constraint comes from $\lambda_i + \mu_i = 1 - \theta_i$ and $\theta_i\geq 0$. The only difference between this new dual problem and the standard one given in equation (10) of the paper is that, there, we would get separate constraints $\lambda_i \in [0,1]$ and $\mu_i \in [0,1]$ rather than the combined constraint $\lambda_i + \mu_i \leq 1$. I think the separate constraints are a bit easier, and so your formulation has a simpler primal (as Michael Grant also mentioned), but a slightly more complicated dual.