Understanding the constraint of SVM optimization problem


I'm learning SVM (support vector machines) from this book. I understand the formulations of the functional and geometric margins, and it's also clear that we want to maximize the geometric margin in order to find the hyperplane that separates the data points optimally.

What I don't understand is the constraint of the optimization problem. The following problem is given in the mentioned book:

$$\max_{w, b} M$$ $$\text{s.t. } \gamma_i \geq M, \quad i = 1, 2, \dots, m$$

Where $M$ is the geometric margin of the hyperplane and $\gamma_i$ is geometric margin of a single data point $(x_i, y_i)$. Which is given by:

$$\gamma_i = y_i \left(\frac{w \cdot x_i}{||w||} + \frac{b}{||w||}\right) $$
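To make the definition concrete, here is a minimal sketch (the function name `geometric_margin` and the toy data are my own, not from the book) that evaluates $\gamma_i$ for every training point; the margin $M$ of the hyperplane is then the smallest of these values:

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Geometric margin gamma_i = y_i * ((w . x_i)/||w|| + b/||w||) for each point."""
    return y * (X @ w + b) / np.linalg.norm(w)

# Toy data: one point on each side of the hyperplane x1 = 0 (w = [1, 0], b = 0).
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])

gammas = geometric_margin(np.array([1.0, 0.0]), 0.0, X, y)
print(gammas)        # both points sit at distance 2 from the hyperplane
print(gammas.min())  # M, the geometric margin of the hyperplane
```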

Why do we need this constraint? Couldn't we solve the problem without it? I don't see an intuitive reason behind the constraint.

1 Answer


I must say that I am more familiar with a different constraint, which gives essentially the same result. First note that this version is a hard-margin SVM, meaning there is no misclassification and no uncertainty allowed: no point may lie inside the margin or on the "wrong side" of it.

The constraint is what ties $M$ to the data: it says that $M$ must be a lower bound on the geometric margin of every training point, so the largest feasible $M$ is exactly the margin of the worst-placed point, $\min_i \gamma_i$. Without the constraint, $M$ would be unrelated to $w$ and $b$, and "maximize $M$" would be unbounded and meaningless. Maximizing this bound also makes the solution as general as possible, so that the error on test data is as small as possible: the larger the margin, the higher the chance of fewer misclassifications.
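The "different constraint" I alluded to is the standard reformulation: fix the functional margin to 1 and minimize $\frac{1}{2}\lVert w\rVert^2$ subject to $y_i(w \cdot x_i + b) \geq 1$, which yields geometric margin $1/\lVert w\rVert$. As a rough sketch (the toy data and solver choice are mine, not from the book), this can be solved numerically with SciPy's SLSQP:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable data set, labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Pack variables as theta = [w1, w2, b]; minimize (1/2)||w||^2.
objective = lambda t: 0.5 * (t[0] ** 2 + t[1] ** 2)

# One inequality constraint per point: y_i (w . x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq", "fun": lambda t, xi=xi, yi=yi: yi * (xi @ t[:2] + t[2]) - 1}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.array([1.0, 1.0, 0.0]),
               method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)
print("geometric margin:", 1.0 / np.linalg.norm(w))
```

Note that every constraint is satisfied at the solution (the tight ones identify the support vectors), and the recovered margin is the largest achievable for this data.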

Hope this helped. Feel free to discuss further.