Substitution of partial derivatives into Lagrangian primal function


I am trying to follow a proof from the book 'The Elements of Statistical Learning' and don't quite understand how the authors arrive at the solution. The proof concerns the optimization problem of the support vector classifier, starting from the following Lagrangian primal function:

$L_{P} = \frac{1}{2} ||w||^{2} + C \sum_{i}^{N} \xi_{i} - \sum_{i}^{N} \alpha_{i}[y_{i}(w^{T}x_{i}+b)-(1-\xi_{i})] - \sum_{i}^{N}\mu_{i}\xi_{i}$

The partial derivatives are now set to zero:

$\frac{\partial L_{P}}{\partial w} = w - \sum_{i}^{N} \alpha_{i} y_{i} x_{i} = 0 \Rightarrow w = \sum_{i}^{N} \alpha_{i} y_{i} x_{i}$

$\frac{\partial L_{P}}{\partial b} = - \sum_{i}^{N} \alpha_{i} y_{i} = 0 \Rightarrow \sum_{i}^{N} \alpha_{i} y_{i} = 0$

$\frac{\partial L_{P}}{\partial \xi_{i}} = C - \alpha_{i} - \mu_{i} = 0 \Rightarrow \alpha_{i} = C - \mu_{i}$

The authors then plug the results of the partial derivatives back into the primal function above and obtain the following dual objective function:

$L_{D} = \sum_{i}^{N} \alpha_{i} - \frac{1}{2} \sum_{i}^N \sum_{j}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} x_{i}^{T} x_{j}$

My problem is that I don't understand where the result of $\frac{\partial L_{P}}{\partial \xi_{i}}$ is used. The only way I can arrive at the dual form is by dropping all terms in $L_{P}$ that contain $\xi_{i}$, and even though this gives the right result, I don't understand why it is justified.

Best answer:

If you group all terms containing $\xi_i$ in $L_P$, their coefficient is $(C-\alpha_i-\mu_i)\xi_i$, and the stationarity condition $\frac{\partial L_P}{\partial \xi_i} = 0$ says exactly that $C-\alpha_i-\mu_i=0$. So every $\xi_i$ term vanishes from the objective, which is why simply "dropping" them gives the right answer. (You can also drop the condition as an explicit constraint, since for any $\alpha_i$ you can always choose $\mu_i$ to satisfy it; what survives, together with $\mu_i \geq 0$, is the box constraint $0 \leq \alpha_i \leq C$.)
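To see the whole substitution at once: with $w = \sum_i \alpha_i y_i x_i$ we get $\sum_i \alpha_i y_i w^T x_i = w^T w = ||w||^2$, the $b$ term vanishes because $\sum_i \alpha_i y_i = 0$, and the $\xi_i$ terms vanish as above, leaving $L_D = \sum_i \alpha_i - \frac{1}{2}||w||^2$. Here is a quick NumPy sanity check of that identity on synthetic data (the data and variable names are mine, not the book's); note that $L_P$ no longer depends on the arbitrary $b$ and $\xi_i$ once the stationarity conditions hold:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, C = 20, 3, 1.0

X = rng.normal(size=(N, p))
y = rng.choice([-1.0, 1.0], size=N)

# Random alphas, projected so that sum_i alpha_i y_i = 0 (uses y_i^2 = 1).
alpha = rng.uniform(0, C, size=N)
alpha -= y * (alpha @ y) / N
mu = C - alpha                 # stationarity in xi_i: alpha_i = C - mu_i
w = (alpha * y) @ X            # stationarity in w: w = sum_i alpha_i y_i x_i

# Arbitrary b and xi: L_P should not depend on them once the
# stationarity conditions above are imposed.
b = rng.normal()
xi = rng.uniform(size=N)

margins = y * (X @ w + b)
L_P = 0.5 * w @ w + C * xi.sum() - alpha @ (margins - (1 - xi)) - mu @ xi

K = X @ X.T                    # Gram matrix x_i^T x_j
L_D = alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

print(np.isclose(L_P, L_D))   # True
```

Re-running with different seeds, $b$, or $\xi$ leaves the two values equal, which is the point: the dual objective is the primal Lagrangian evaluated at the stationary point.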