All Support Vector Machine literature mentions that the optimal hyperplane is found as:
max 1/∥x∥ (s.t. constraints), which translates directly to:
min ∥x∥, or equivalently min $ ∥x∥^2 $.
Here (Minimizing square of a norm) it is explained why the previous statement holds.
However, none of the sources on SVM explain why, in practice, we want to min $ ∥x∥^2 $ instead of $ ∥x∥ $.
Given that $ ∥x∥ \ge 0 $ by definition, what is the practical reason for this transformation of the problem into a QP?
Example literature on SVM:
Short explanation: by squaring the norm we turn the problem into a quadratic program, for which many efficient solvers are available.

The reason quadratic programming is special is that the gradient of a quadratic function is linear. In particular, the gradient of $\|x\|^2$ is simply $2x$. In contrast, the gradient of $\|x\|$ is the nonlinear function $\frac{x}{\|x\|}$, which is discontinuous at the origin. Whether or not you actually use the gradient of a function when minimizing it, chances are that the continuity of the gradient (or the lack thereof) will affect the rate of convergence of your method.
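This difference is easy to see numerically. Below is a minimal sketch in plain NumPy (the starting point and step size are arbitrary choices for illustration, not anything from the SVM literature) comparing fixed-step gradient descent on $\|x\|^2$ versus $\|x\|$:

```python
import numpy as np

def grad_sq_norm(x):
    """Gradient of f(x) = ||x||^2: the linear map 2x."""
    return 2 * x

def grad_norm(x):
    """Gradient of f(x) = ||x||: x/||x||, always unit length, undefined at 0."""
    return x / np.linalg.norm(x)

# The squared norm's gradient scales with x; the plain norm's does not.
print(grad_sq_norm(np.array([3.0, 4.0])))  # [6. 8.]   -- grows/shrinks with x
print(grad_norm(np.array([3.0, 4.0])))     # [0.6 0.8] -- unit length everywhere

# Fixed-step gradient descent on both objectives from the same point.
x0 = np.array([0.3, 0.4])   # arbitrary start, ||x0|| = 0.5
step = 0.03                 # arbitrary fixed step size

for grad, name in [(grad_sq_norm, "||x||^2"), (grad_norm, "||x||  ")]:
    x = x0.copy()
    for _ in range(100):
        x = x - step * grad(x)
    print(name, "final distance to optimum:", np.linalg.norm(x))
```

With the squared norm, the update is x ← (1 − 2·step)·x, a contraction, so the iterates converge linearly to the optimum at 0. With the plain norm, every step has the same length regardless of how close you are, so the iterates end up bouncing around the optimum at a distance comparable to the step size. This is the practical gap that motivates the QP formulation.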