SVM - Min square norm


All Support Vector Machine literature mentions that the optimal hyperplane is found as:

$\max \frac{1}{\|x\|}$ (subject to constraints), which translates directly to:

$\min \|x\|$, or equivalently $\min \|x\|^2$.

Here (Minimizing square of a norm) it is explained why the previous statement holds.
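In brief, the equivalence holds because squaring is strictly increasing on the nonnegative reals, so both objectives have the same minimizers:

$$\|x\| \ge 0 \quad\text{and}\quad t \mapsto t^2 \text{ strictly increasing on } [0,\infty) \;\Longrightarrow\; \arg\min_{x}\, \|x\| = \arg\min_{x}\, \|x\|^2.$$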

However, none of the sources on SVM explain why in practice we want to minimize $\|x\|^2$ instead of $\|x\|$.

Given that $\|x\| \ge 0$ by definition, what is the practical reason for this transformation of the problem into a QP?

Example literature on SVM:

  1. http://cs229.stanford.edu/notes/cs229-notes3.pdf
  2. http://alex.smola.org/papers/2003/SmoSch03b.pdf
Best answer:

Short explanation: squaring the norm turns the problem into a quadratic program, for which many efficient solvers are available.

The reason quadratic programming is special is that the gradient of a quadratic function is linear. In particular, the gradient of $\|x\|^2$ is simply $2x$. In contrast, the gradient of $\|x\|$ is the nonlinear function $\frac{x}{\|x\|}$, which is not even defined at the origin. Whether or not you actually use the gradient of a function when minimizing it, chances are that the continuity of the gradient (or lack thereof) will affect the rate of convergence of your method.
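A small numerical sketch (using NumPy, just for illustration) makes the contrast concrete: the gradient of $\|x\|^2$ shrinks smoothly to zero near the origin, while the gradient of $\|x\|$ always has unit length, no matter how close to the origin you get.

```python
import numpy as np

def grad_sq_norm(x):
    # gradient of ||x||^2 is 2x: linear, defined everywhere
    return 2 * x

def grad_norm(x):
    # gradient of ||x|| is x/||x||: undefined at the origin
    return x / np.linalg.norm(x)

x = np.array([3.0, 4.0])
print(grad_sq_norm(x))  # [6. 8.]
print(grad_norm(x))     # [0.6 0.8]

# Near the origin the gradient of ||x|| keeps unit length,
# while the gradient of ||x||^2 vanishes smoothly.
tiny = np.array([1e-9, 0.0])
print(np.linalg.norm(grad_norm(tiny)))     # 1.0
print(np.linalg.norm(grad_sq_norm(tiny)))  # 2e-09
```

The non-vanishing, direction-only gradient of $\|x\|$ is exactly the kind of behavior that slows or destabilizes gradient-based solvers near the optimum, which is one practical motivation for preferring the squared objective.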