Why do we minimize the squared norm instead of the norm in this optimization problem?


When reading about the optimization problem for the Support Vector Machine in Bishop's book (Pattern Recognition and Machine Learning), I came across the following:

The optimization problem then simply requires that we maximize $\| \mathbf w\|^{-1}$ which is equivalent to minimizing $\| \mathbf w\|^{2}$

Here $\| \mathbf w\|$ is the Euclidean norm of the vector $\mathbf w$, so it is always nonnegative.

I am wondering why we do not minimize $\| \mathbf w\|$ instead of $\| \mathbf w\|^{2}$.

Does using $\| \mathbf w\|^{2}$ simplify the optimization problem?

At first I thought that taking the square was what made the problem convex, but the function $$f(x,y)=\sqrt{x^2+y^2}$$ is convex as well.
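(Indeed, convexity of the norm follows directly from the triangle inequality and homogeneity, so convexity alone cannot be the reason:

$$\|\lambda \mathbf u + (1-\lambda)\mathbf v\| \le \|\lambda \mathbf u\| + \|(1-\lambda)\mathbf v\| = \lambda\|\mathbf u\| + (1-\lambda)\|\mathbf v\|, \qquad \lambda \in [0,1].)$$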


BEST ANSWER

Minimizing $\| \mathbf w\|$ is the same as minimizing $\| \mathbf w\|^{2}$: squaring is strictly increasing on nonnegative numbers, so both objectives have the same minimizer. As I understand it, the reason we choose the latter is purely for convenience. Would you rather deal with the partial derivatives of $\sqrt{x^2+y^2}$ or of $x^2+y^2$? The latter, obviously: its gradient is linear, it is differentiable everywhere (the norm is not differentiable at the origin), and the constrained problem becomes a standard quadratic program.
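You can check numerically that the two objectives agree. Below is a minimal sketch of a hard-margin SVM solved both ways; the toy data, the starting point, and the choice of scipy's SLSQP solver are my own assumptions for illustration, not anything from the thread:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set (hypothetical data for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Hard-margin constraints: y_i (w . x_i + b) - 1 >= 0 for every point.
constraints = [{"type": "ineq",
                "fun": lambda v, xi=xi, yi=yi: yi * (v[:2] @ xi + v[2]) - 1.0}
               for xi, yi in zip(X, y)]

def solve(objective):
    v0 = np.array([1.0, 1.0, 0.0])  # variables (w1, w2, b), arbitrary start
    res = minimize(objective, v0, constraints=constraints, method="SLSQP")
    return res.x

w_sq = solve(lambda v: v[0]**2 + v[1]**2)     # minimize ||w||^2: smooth quadratic
w_nm = solve(lambda v: np.hypot(v[0], v[1]))  # minimize ||w||: nonsmooth at w = 0

print(w_sq)  # both runs should return (numerically) the same (w, b),
print(w_nm)  # since squaring is monotone on nonnegative values
```

Both calls should land on essentially the same $(\mathbf w, b)$; the squared objective just gets there via a smooth quadratic that standard QP machinery handles directly.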