SVM derivation - the "arbitrary multiplier" seems to matter.


I've been reading the derivation of SVMs in Chris Bishop's book (Pattern Recognition and Machine Learning). Equation (7.7) below gives the Lagrangian. Note the $\frac{1}{2}$ in front of $||w||^2$, which was chosen arbitrarily.

Then, the derivatives with respect to $w$ and $b$ are set to zero, yielding equations (7.8) and (7.9).

\begin{align} L(w,b,a) = \frac{1}{2} ||w||^2 - \sum_{n=1}^N a_n (t_n (w^T \phi(x_n)+b)-1) \tag{7.7}\end{align}

Expanding the terms,

\begin{align}L(w,b,a) = \frac{1}{2}||w||^2 -\sum_{n=1}^N a_n t_n w^T\phi(x_n) - b\sum_{n=1}^N a_n t_n + \sum_{n=1}^N a_n\tag{7.7a}\end{align}

\begin{align} w = \sum_{n=1}^N a_n t_n \phi(x_n) \tag{7.8}\end{align}

\begin{align} 0 = \sum_{n=1}^N a_n t_n \tag{7.9}\end{align}
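
For reference, (7.8) and (7.9) are the stationarity conditions of (7.7a); writing out the intermediate step the excerpt skips (my own restatement):

\begin{align} \frac{\partial L}{\partial w} = w - \sum_{n=1}^N a_n t_n \phi(x_n) = 0, \qquad \frac{\partial L}{\partial b} = -\sum_{n=1}^N a_n t_n = 0 \end{align}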

Then, he substitutes equation (7.8) back into (7.7) to obtain the dual representation.

Note that as a direct consequence of (7.8) we get:

$$||w||^2 = w^Tw = \sum_{n=1}^N \sum_{m=1}^N a_n a_m t_n t_m \phi(x_n)^T \phi(x_m) = \sum_{n=1}^N a_nt_n w^T\phi(x_n)\tag{7.8a}$$

Substituting into (7.7a), the first two terms combine to $\frac{1}{2}||w||^2 - ||w||^2 = -\frac{1}{2}\sum_{n=1}^N \sum_{m=1}^N a_n a_m t_n t_m \phi(x_n)^T \phi(x_m)$, and the $b$ term vanishes by (7.9). This reduces the Lagrangian to:

$$L(a) = \sum_{n=1}^N a_n -\frac{1}{2}\sum_{n=1}^N \sum_{m=1}^N a_n a_m t_n t_m \phi(x_n)^T \phi(x_m)$$
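
As a numerical sanity check on this substitution (a quick sketch of my own; the random data and the names `phi`, `t`, `a` are illustrative, not from the book), the identity holds for any $a$ satisfying (7.9) once $w$ is set by (7.8):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 3                        # number of samples, feature dimension (arbitrary)
phi = rng.normal(size=(N, D))      # feature vectors phi(x_n), one per row
t = rng.choice([-1.0, 1.0], N)     # class labels t_n in {-1, +1}

# Pick multipliers and project them so that sum_n a_n t_n = 0, i.e. eq. (7.9).
# (Positivity of a_n is irrelevant for this algebraic identity.)
a = rng.uniform(0.5, 1.5, N)
a -= t * (a @ t) / N               # works because t_n^2 == 1

w = (a * t) @ phi                  # eq. (7.8): w = sum_n a_n t_n phi(x_n)
b = rng.normal()                   # any b works: its coefficient is exactly (7.9)

# Primal Lagrangian (7.7) evaluated at the stationary w
L_primal = 0.5 * w @ w - np.sum(a * (t * (phi @ w + b) - 1))

# Dual form derived above
K = phi @ phi.T                    # Gram matrix of phi(x_n)^T phi(x_m)
L_dual = a.sum() - 0.5 * (a * t) @ K @ (a * t)

print(np.isclose(L_primal, L_dual))  # True
```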

Herein lies my question. The only reason we are left with $-\frac{1}{2}$ is the arbitrary $\frac{1}{2}$ chosen to accompany $||w||^2$. If we had chosen $1$ instead, the quadratic term would cancel out completely, fundamentally changing the Lagrangian.

Best answer:

The answer occurred to me as I was writing the question, but since I had already put a lot of work into it, I decided to post it and answer it for my own reference. The error in my thinking was assuming that (7.8) would remain unchanged if the multiplier accompanying $||w||^2$ (the objective function) were changed. It would not: the stationarity condition, and hence the substituted dual, changes along with it.
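
To spell that out (my own computation, repeating the steps with a generic coefficient $c>0$ in place of $\frac{1}{2}$): the Lagrangian becomes

$$L(w,b,a) = c\,||w||^2 - \sum_{n=1}^N a_n (t_n (w^T \phi(x_n)+b)-1),$$

so setting the derivative with respect to $w$ to zero now gives $w = \frac{1}{2c}\sum_{n=1}^N a_n t_n \phi(x_n)$ instead of (7.8). Substituting back yields

$$L(a) = \sum_{n=1}^N a_n - \frac{1}{4c}\sum_{n=1}^N \sum_{m=1}^N a_n a_m t_n t_m \phi(x_n)^T \phi(x_m),$$

so the quadratic term never cancels: changing the multiplier only rescales it. Indeed, the substitution $a_n \to 2c\,a_n$ turns this into $2c$ times the dual derived above, leaving the maximizing $a$ (and hence the classifier) unchanged.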