How does this optimization problem work?


$$ \begin{split} \max_{\beta_0, \beta_1, \dots, \beta_p,\; \epsilon_1, \dots, \epsilon_n,\; M} \quad & M\\ \mathrm{subject\ to} \quad & \sum_{j=1}^p \beta_j^2 = 1,\\ & y_i\left(\beta_0 + \sum_{j=1}^p \beta_j x_{ij}\right) \ge M (1- \epsilon_i) \quad \text{for all } i,\\ & \epsilon_i \ge 0, \quad \sum_{i=1}^n \epsilon_i \le C, \end{split} $$ where $C \ge 0$ is a tuning parameter. In this problem,

  • $x_i$ denotes the $i$-th training sample (input), a vector with components $x_{i1}, \dots, x_{ip}$,
  • $y_i$ denotes the corresponding training observation (output),
  • $y_i \in \{-1, +1\}$.

The above optimization problem chooses the best separating hyperplane for a support vector classifier, but I have absolutely no clue how it works. Can anyone explain it? Thanks.