I am reading *The Elements of Statistical Learning*, specifically the chapter on SVMs.
For the non-separable case, the book states the optimization problem as $\min \|\beta\|$ subject to $y_i(x_i^T \beta + \beta_0) \ge 1-\xi_i, \forall i$, with $\xi_i \ge 0, \sum \xi_i \le \text{Constant}$.
On the next page, the book states that this problem can be re-expressed as $\min \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^N \xi_i$ subject to $\xi_i \ge 0, y_i(x_i^T \beta + \beta_0) \ge 1-\xi_i, \forall i$.
I understand the shift from $\|\beta\|$ to $\frac{1}{2} \|\beta\|^2$, but I do not see how to rigorously show that the constraint $\sum \xi_i \le \text{Constant}$ in the first formulation can be replaced by the penalty term $C \sum \xi_i$ in the objective of the second formulation. Can anyone provide more details?
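For what it's worth, I did check the equivalence numerically on a toy problem (my own sketch, not from the book): I solve the penalized form with scipy's SLSQP solver, take the total slack $K = \sum \xi_i^*$ it produces, and then solve the budget-constrained form with $\sum \xi_i \le K$; the two optimal $\beta$'s agree, so it is only the rigorous argument I am missing.

```python
import numpy as np
from scipy.optimize import minimize

# Toy 2-class data; the last point overlaps the other class, so some slacks
# must be strictly positive at the optimum.
X = np.array([[2.0, 2.0], [2.0, 0.0], [3.0, 1.0],
              [0.0, 0.0], [0.0, 2.0], [1.5, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
N = len(y)
C = 1.0

def unpack(z):
    # z packs [beta_0, beta (2 entries), xi (N entries)]
    return z[0], z[1:3], z[3:]

def margin(z):
    # y_i (x_i^T beta + beta_0) - 1 + xi_i >= 0 for all i
    b0, b, xi = unpack(z)
    return y * (X @ b + b0) - 1 + xi

bounds = [(None, None)] * 3 + [(0, None)] * N  # xi_i >= 0
z0 = np.zeros(3 + N)

# Penalized form: min 0.5 ||beta||^2 + C * sum(xi)
pen = minimize(
    lambda z: 0.5 * unpack(z)[1] @ unpack(z)[1] + C * unpack(z)[2].sum(),
    z0, bounds=bounds, method="SLSQP",
    constraints=[{"type": "ineq", "fun": margin}])
_, beta_pen, xi_pen = unpack(pen.x)
K = xi_pen.sum()  # slack budget realized by the penalized solution

# Budget-constrained form: min 0.5 ||beta||^2  s.t.  sum(xi) <= K
con = minimize(
    lambda z: 0.5 * unpack(z)[1] @ unpack(z)[1],
    z0, bounds=bounds, method="SLSQP",
    constraints=[{"type": "ineq", "fun": margin},
                 {"type": "ineq", "fun": lambda z: K - unpack(z)[2].sum()}])
_, beta_con, _ = unpack(con.x)

print("penalized beta:  ", beta_pen)
print("constrained beta:", beta_con)
```

The two printed vectors match to solver tolerance, which is consistent with the claim that for each $C$ there is a corresponding value of the Constant giving the same solution (and vice versa).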