I have been learning about support vector machines from Andrew Ng's video lectures, and I understand why we try to minimize $\frac{1}{2}\|w\|^2$.
The margin (width) of the separator is $2/\|w\|$. Since we want to maximize the width, we want to minimize $\|w\|$, or equivalently $\frac{1}{2}\|w\|^2$. Now we use the Lagrange expression here, $$L = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i(y_i(w \bullet x_i+b)-1).$$ To find the minimum of $\frac{1}{2}\|w\|^2$, we set the gradient to zero: $\nabla L = 0$. From $\partial L/\partial b = 0$ we get $\sum_i \alpha_i y_i = 0$, and from $\partial L/\partial w = 0$ we get $w = \sum_i \alpha_i y_i x_i$. Substituting these two equations back into the Lagrange expression, we end up with $$L(w,b,\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} y_i y_j \alpha_i \alpha_j (x_i \bullet x_j).$$
I really understand up to here, but the professor said that we want to maximize $L$ with respect to $\alpha$ and at the same time minimize it with respect to $w$ and $b$, so the final Lagrangian primal problem becomes $$\min_{w,b}\max_\alpha L(w, b, \alpha)$$ $$\text{s.t. } \alpha_i \geq 0,\quad i=1,2,\dots,m.$$ Why? Why are we trying to maximize and then minimize the Lagrange expression $L$? Any intuitive explanation will help.
@Kenny Wong's answer is very detailed.
I will try to add more explanation of why there are both a minimization and a maximization in the formula, which I believe is what you are struggling to understand.
Given the function $$L(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_i α_i(y_i(w . x_i+b)-1)$$
The key here is to see that the problem:
$$\min_{w, b} \left( \max_{\alpha_i; \alpha_i \geq 0} L(w,b,\alpha_i)\right)$$
is equivalent to the problem: $$\min_{w, b} \tfrac 1 2 \| w\|^2$$ subject to $$y_i(w.x_i + b) \geq 1.$$
Indeed, if we only consider $$\max_{\alpha_i; \alpha_i \geq 0} L(w,b,\alpha_i)$$ and select a $w$ and a $b$ such that $$ y_i (w.x_i + b) \lt 1,$$ then we can easily make $L$ arbitrarily large just by selecting the appropriate $\alpha_i$.
For instance, imagine we select a $w$ and a $b$ such that, for every point, $$ y_i (w.x_i + b) = 0.5.$$ Then $$L(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_i α_i(0.5-1)$$
$$L(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_i -0.5α_i$$
Now imagine that there are three $α_i$ and they are all equal to 1.
Then in this case we would have $$L(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - (-1.5)$$ $$L(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 + 1.5$$
If instead the $α_i$ are all equal to 2, then we have
$$L(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 + 3$$
and so on...
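This blow-up is easy to check numerically. The weight vector and margin values below are made-up numbers chosen only to match the example above:

```python
import numpy as np

# Made-up point where every margin constraint is violated:
# y_i (w . x_i + b) = 0.5 < 1 for three training points.
w = np.array([1.0, 1.0])             # hypothetical weight vector
margins = np.array([0.5, 0.5, 0.5])  # hypothetical values of y_i (w . x_i + b)

def lagrangian(alpha):
    # L(w, b, alpha) = 1/2 ||w||^2 - sum_i alpha_i (y_i (w . x_i + b) - 1)
    return 0.5 * np.dot(w, w) - np.sum(alpha * (margins - 1))

print(lagrangian(1 * np.ones(3)))    # 1/2 ||w||^2 + 1.5 = 2.5
print(lagrangian(2 * np.ones(3)))    # 1/2 ||w||^2 + 3.0 = 4.0
print(lagrangian(100 * np.ones(3)))  # 151.0 -- grows without bound with alpha
```

Growing the $\alpha_i$ keeps increasing $L$, so the inner maximum is $+\infty$ whenever any constraint is violated.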
That is what the " $ + \infty\space\text{otherwise} $ " line means in
$$\max_{\alpha_i; \alpha_i \geq 0} L(w,b,\alpha_i) = \begin{cases} \frac 1 2 \| w \|^2 & {\rm if \ \ } y_i (w.x_i + b) \geq 1 \\ + \infty & {\rm otherwise}\end{cases}$$
So maximizing the Lagrangian alone does not solve $$\min_{w, b} \tfrac 1 2 \| w\|^2$$ subject to $$y_i(w.x_i + b) \geq 1,$$ which is our original goal; the maximization over $\alpha$ only serves to penalize constraint violations.
Let us now consider the other case, where we select a $w$ and a $b$ such that, for every point, $$ y_i (w.x_i + b) = 3.$$
Then we will have
$$L(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_i α_i(3-1)$$
$$L(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_i 2α_i$$
Because $\alpha_i \geq 0$, the term $$- \sum_i 2α_i$$ will always be zero or negative.
So the maximum of $L(w, b, \alpha_i)$ over $\alpha_i \geq 0$ is exactly $\frac{1}{2}\|w\|^2$, attained at $\alpha_i = 0$, whenever $y_i(w.x_i + b) \geq 1$.
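The same computation as before, now with the constraints satisfied (again with made-up numbers matching the example above), shows the inner maximum sitting at $\alpha_i = 0$:

```python
import numpy as np

# Made-up point where every constraint holds strictly:
# y_i (w . x_i + b) = 3 >= 1 for three training points.
w = np.array([1.0, 1.0])             # hypothetical weight vector
margins = np.array([3.0, 3.0, 3.0])  # hypothetical values of y_i (w . x_i + b)

def lagrangian(alpha):
    # L(w, b, alpha) = 1/2 ||w||^2 - sum_i alpha_i (y_i (w . x_i + b) - 1)
    return 0.5 * np.dot(w, w) - np.sum(alpha * (margins - 1))

# Any alpha_i > 0 only subtracts from L, so the max over alpha_i >= 0
# is at alpha = 0, where L equals 1/2 ||w||^2.
print(lagrangian(np.zeros(3)))  # 1.0  (= 1/2 ||w||^2)
print(lagrangian(np.ones(3)))   # 1.0 - 6.0 = -5.0
```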
Recall that we want to minimize $$\frac{1}{2}\|w\|^2.$$ Since we just saw that, under the constraint $y_i(w.x_i + b) \geq 1$, $$\frac{1}{2}\|w\|^2 = \max_{\alpha_i; \alpha_i \geq 0} L(w,b,\alpha_i),$$ what we need to do is $$\min_{w, b} \tfrac 1 2 \| w\|^2 = \min_{w, b} \left( \max_{\alpha_i; \alpha_i \geq 0} L(w,b,\alpha_i)\right).$$
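To see the equivalence end to end, here is a brute-force sketch on a hypothetical 1-D dataset (two points of my own choosing, not from the lectures): the inner maximum is replaced by its closed form from the case analysis above, and the outer minimization is done by grid search.

```python
import numpy as np

# Hypothetical toy data: one negative point at -2, one positive at +2.
X = np.array([-2.0, 2.0])
y = np.array([-1.0, 1.0])

def inner_max(w, b):
    # max over alpha_i >= 0 of L(w, b, alpha):
    # +infinity if some constraint y_i (w x_i + b) >= 1 is violated,
    # otherwise 1/2 w^2 (attained at alpha = 0).
    if np.any(y * (w * X + b) < 1):
        return np.inf
    return 0.5 * w * w

# Outer minimization over (w, b) by brute-force grid search.
grid = np.linspace(-2, 2, 17)  # step 0.25, exactly representable in binary
obj, w_star, b_star = min((inner_max(w, b), w, b) for w in grid for b in grid)
print(float(obj), float(w_star), float(b_star))  # 0.125 0.5 0.0
```

The widest-margin separator for these two points is $w = 0.5$, $b = 0$, with objective $\frac{1}{2}(0.5)^2 = 0.125$. Grid search is of course only for illustration; real SVM solvers work on the dual quadratic program instead.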