How to calculate the Lagrangian of the hard-margin SVM primal problem?


I need to compute the Lagrangian of the primal problem for hard margin SVMs by hand. This is an assignment for university!

I have vectors $$x_1 = (0, 0), \quad x_2 = (1, 2), \quad x_3 = (-1, 2)$$ with labels $$y_1 = -1, \quad y_2 = 1, \quad y_3 = 1,$$ so I need to find a hyperplane that separates the two classes ($-1, 1$) with a hard margin.

Now for this I need to solve the primal optimization problem which is:

$$ \text{min}_{w \in \mathbb{R^d}, b \in \mathbb{R}} \text{ } \frac{1}{2} ||w||^2 \\ \text{s.t.: } Y_i(\langle w, X_i \rangle + b) \geq 1\\ \text{or s.t.: } Y_i(\langle w, X_i \rangle + b) - 1 \geq 0 $$ With using the Lagrangian I get the following: $$ L(\omega, b, \alpha) = \frac{1}{2}||w||^2 - \sum^n_{i = 1} \alpha_i (Y_i(\langle w, X_i \rangle + b) - 1) $$ So I need to get all the derivatives of the Lagrangian , where I get my first problems. How do I derive for $\alpha$? $$ \frac{\partial L}{\partial w} : w - \sum^n_{i = 1} \alpha_i \cdot Y_i \cdot X_i = 0\\ \frac{\partial L}{\partial b} : \sum^n_{i = 1} \alpha_i \cdot Y_i = 0\\ \frac{\partial L}{\partial \alpha} : ?\\ $$

Once I have all the derivatives, I will have a system of linear equations with $w, b, \alpha$ as the unknowns. I even have trouble setting up that system, because I don't have much background in matrices from my studies.

So my question is: what is $\frac{\partial L}{\partial \alpha}$ (a hint would be good enough), and how do I set up the equation system to solve this problem?


There are 2 answers below.

BEST ANSWER

Regarding the Lagrangian

$$ L(\omega, b, \alpha) = \frac{1}{2}||\omega||^2 - \sum^n_{i = 1} \alpha_i (\langle \omega, X_i \rangle + b - y_i) $$

we have

$$ \partial_{\omega} L = \omega-\sum_i \alpha_iX_i = 0\\ \partial_{\alpha_i}L = \langle \omega, X_i\rangle +b - y_i = 0\\ \partial_b L = \sum_i\alpha_i = 0 $$

In our case it gives the separating line $y - 1 = 0$.


NOTE

The system to solve is

$$ \left\{ \begin{array}{rcl} \alpha_2-\alpha_3+\omega_1&=&0 \\ 2 \alpha_2+2 \alpha_3+\omega_2&=&0 \\ b+1&=&0 \\ b+\omega_1+2 \omega_2-1&=&0 \\ b-\omega_1+2 \omega_2-1&=&0 \\ \alpha_1+\alpha_2+\alpha_3&=&0 \\ \end{array} \right. $$

giving

$$ \omega=(0,1),\ \ b=-1,\ \ \alpha =\left(\frac 12,-\frac 14, -\frac 14\right) $$
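Whatever sign conventions one adopts in the Lagrangian, the resulting hyperplane can be checked directly against the primal constraints $y_i(\langle \omega, X_i \rangle + b) \geq 1$. A minimal numerical sanity check with NumPy (not part of the original answer, just an illustration):

```python
import numpy as np

# Data points and labels from the question.
X = np.array([[0.0, 0.0], [1.0, 2.0], [-1.0, 2.0]])
y = np.array([-1.0, 1.0, 1.0])

# Candidate solution from this answer.
w = np.array([0.0, 1.0])
b = -1.0

# Hard-margin constraints: y_i * (<w, x_i> + b) >= 1 for every sample.
margins = y * (X @ w + b)
print(margins)  # all three points lie exactly on the margin
assert np.all(margins >= 1 - 1e-12)
```

All three margins come out equal to $1$, i.e. every point lies exactly on the margin boundary.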

ANSWER

Differentiate your Lagrangian with respect to the optimization variables $w$ and $b$ and with respect to the $\alpha_i$, of which there are as many as samples. The key problem, I guess, is making sure you got the derivatives right. The previous answer used a wrong Lagrangian and thus a wrong system of linear equations, in which not all $\alpha_i$ are non-negative (inconsistent with the KKT conditions). Still, I get the same separating hyperplane.

My solution for deriving $L$ is:

\begin{equation} L(w,b,\alpha) = \dfrac{1}{2} ||w||^2 - \sum_{i=1}^{n} \alpha_i(y_i(\langle w, X_i \rangle + b) -1) \end{equation}

\begin{align} \partial_w L(w,b,\alpha) = w - \sum_i \alpha_i y_i X_i \\ \partial_b L(w,b,\alpha) = -\sum_i \alpha_i y_i \\ \partial_{\alpha_i} L(w,b,\alpha) = -(y_i(\langle w, X_i \rangle + b) -1 ) \end{align}

Set the derivatives to $0$, insert your data points, decompose $w$ into its components, and you end up with the following system of linear equations:

\begin{align} w_1 - \alpha_2 + \alpha_3 = 0\\ w_2 - 2\alpha_2 -2 \alpha_3 = 0\\ -\alpha_1+ \alpha_2 + \alpha_3 = 0\\ -b -1 = 0\\ w_1 + 2w_2 + b -1 = 0\\ -w_1+2w_2+b-1 = 0\\ \end{align}

Solving them yields the following solution with non-negative $\alpha_i$ (consistent with the Karush-Kuhn-Tucker conditions, which require $\alpha_i \geq 0$): \begin{equation} w_1^* = 0, w_2^* = 1, b^*=-1, \alpha_1^* = \dfrac{1}{2}, \alpha_2^* = \dfrac{1}{4}, \alpha_3^* = \dfrac{1}{4}. \end{equation}
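Since the system is linear in $(w_1, w_2, b, \alpha_1, \alpha_2, \alpha_3)$, it can also be solved mechanically. A minimal sketch with NumPy (the variable ordering and matrix layout here are my own choice, not from the original answer):

```python
import numpy as np

# Unknowns ordered as [w1, w2, b, a1, a2, a3].
# Each row encodes one equation, moved into "A @ x = rhs" form.
A = np.array([
    [ 1, 0,  0,  0, -1,  1],  # w1 - a2 + a3           = 0
    [ 0, 1,  0,  0, -2, -2],  # w2 - 2 a2 - 2 a3       = 0
    [ 0, 0,  0, -1,  1,  1],  # -a1 + a2 + a3          = 0
    [ 0, 0, -1,  0,  0,  0],  # -b - 1 = 0  ->  -b     = 1
    [ 1, 2,  1,  0,  0,  0],  # w1 + 2 w2 + b - 1 = 0  -> w1 + 2 w2 + b = 1
    [-1, 2,  1,  0,  0,  0],  # -w1 + 2 w2 + b - 1 = 0 -> -w1 + 2 w2 + b = 1
], dtype=float)
rhs = np.array([0, 0, 0, 1, 1, 1], dtype=float)

w1, w2, b, a1, a2, a3 = np.linalg.solve(A, rhs)
print(w1, w2, b, a1, a2, a3)  # approximately (0, 1, -1, 0.5, 0.25, 0.25)
```

This reproduces the solution above, with all $\alpha_i \geq 0$.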

The resulting decision function is then given by: \begin{equation} f(X) = \langle w^*, X \rangle +b^* . \end{equation} The decision boundary is where the decision function equals $0$, so your decision boundary in 2d is given by: \begin{align} w_1^* x_1 + w_2^* x_2 + b^* = 0 \\ \Leftrightarrow x_2 = 1 \end{align}
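Evaluating the decision function on the training points confirms that $\operatorname{sign}(f)$ recovers the labels. A small illustration with NumPy (assuming the solution $w^* = (0, 1)$, $b^* = -1$ derived above):

```python
import numpy as np

# Data and the solution derived above.
X = np.array([[0.0, 0.0], [1.0, 2.0], [-1.0, 2.0]])
y = np.array([-1.0, 1.0, 1.0])
w_star = np.array([0.0, 1.0])
b_star = -1.0

# Decision function f(X) = <w*, X> + b*; sign(f) is the predicted class.
f = X @ w_star + b_star
print(f)           # [-1.  1.  1.]
print(np.sign(f))  # matches the labels y

# Here y_i * f(X_i) = 1 for every point: all three are support vectors.
assert np.allclose(y * f, 1.0)
```

Points with $f(X) > 0$ lie above the boundary $x_2 = 1$, points with $f(X) < 0$ below it.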