How do you minimize "hinge-loss"?

19.8k Views Asked by At

A lot of material on the web regarding Loss functions talk about "minimizing the Hinge Loss".

However, nobody actually explains it, or at least gives some example. The best material I found is here from Columbia, and I include some snippets from it below.

I understand the hinge loss to be an extension of the 0-1 loss. The 0-1 Loss Function gives us a value of 0 or 1 depending on if the current hypothesis being tested gave us the correct answer for a particular item in the training set. The hinge loss does the same but instead of giving us 0 or 1, it gives us a value that increases the further off the point is.

This formula goes over all the points in our training set, and calculates the Hinge Loss $w$ and $b$ causes. It sums up all the losses and divides it by the number of points we fed it.

Hinge Loss Formula

where Hinge Loss l

This much makes sense to me.

What's confusing me is as follows:

  • How do you plot a hinge loss function?
  • How do you minimize it? Isn't the minimal always zero?
  • How should I understand the typical hinge loss graph? Are they just gross oversimplifications? For example, the green line represents the hinge loss function you see in every image in a Google search for "hinge loss".

Hinge Loss Graph

**Please, **Can someone provide (for the world) a simple example of hinge loss minimization? Let's say I have four negative points (blue circles) and four positive points (red squares). What would the loss function look like? How do I minimize (mathematically, and with intuition).

Knowing this would be a huuuge help for me, and probably for many others, as the resources on this popular topic are scarce. Thanks!

Points on Plot

1

There are 1 best solutions below

3
On

The hinge function is convex and the problem of its minimization can be cast as a quadratic program:

$\min \frac{1}{m}\sum t_i + ||w||^2 $

$\quad t_i \geq 1 - y_i(wx_i + b) , \forall i=1,\ldots,m$

$\quad t_i \geq 0 , \forall i=1,\ldots,m$

or in conic form

$\min \frac{1}{m}\sum t_i + z $

$\quad t_i \geq y_i(wx_i + b) , \forall i=1,\ldots,m$

$\quad t_i \geq 0 , \forall i=1,\ldots,m$

$ (2,z,w) \in Q_r^m$

where $Q_r$ is the rotated cone.

You can solve this problem either using a QP /SOCP solver, as MOSEK, or by some specialize algorithms that you can find in literature. Note that the minimum is not necessarily in zero, because of the interplay between the norm of $w$ and the blue side of the objective function.

In the second picture, every feasible solution is a line. An optimal one will balance the separation of the two classes, given by the blue term and the norm.

As for references, searching on Scholar Google seems quite ok. But even following the references from Wikipedia it is already a first steps.

You can draw some inspiration from one of my recent blog post:

http://blog.mosek.com/2014/03/swissquant-math-challenge-on-lasso.html

it is a similar problem. Also, for more general concept on conic optimization and regression you can check the classical book of Boyd.