I have a question about the derivation of the SVM algorithm (see, for example, page 3 here). It is a question about the math, which is why I'm asking it here.
Suppose I have the following optimization problem:
$$\begin{aligned} \min_{w, \xi} \quad & \frac{1}{2} \|w\|^2 + C \sum_i \xi_i \\ \text{s.t.} \quad & y_i w x_i \geq 1 - \xi_i, \quad i = 1, \dots, m, \\ & \xi_i \geq 0 \end{aligned}$$
In order to solve the problem, we use the Lagrangian:
$$ L(w, \xi, \alpha, \beta) = \frac{1}{2} \|w\|^2 + C \sum_i \xi_i + \sum_i \alpha_i (1 - \xi_i - y_i w x_i) + \sum_i \beta_i (-\xi_i) $$
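For context, here is a sketch of the stationarity conditions as I understand them. I am assuming the multipliers satisfy $\alpha_i \geq 0$ and $\beta_i \geq 0$, which is the usual convention for inequality constraints but is not written in the Lagrangian itself:

$$\begin{aligned} \frac{\partial L}{\partial w} &= w - \sum_i \alpha_i y_i x_i = 0, \\ \frac{\partial L}{\partial \xi_i} &= C - \alpha_i - \beta_i = 0, \quad i = 1, \dots, m. \end{aligned}$$

If this is right, $C$ enters only as a fixed coefficient in these conditions. The derivatives are taken with respect to $w$ and $\xi$, never with respect to $C$.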
My question is: in this algorithm, the parameter $C$ is assumed to be non-negative. Why isn't this assumption included as a constraint in the Lagrangian?
Thanks.