I'm studying support vector machines and in the process I've bumped into lagrange multipliers with multiple constraints and Karush–Kuhn–Tucker conditions.
I've been trying to study the subject, but still can't get a good enough grasp on the subject. In wikipedia:
(http://en.wikipedia.org/wiki/Lagrange_multiplier#Multiple_constraints)
it says that in order to find the extremum points of a function $f$, (with constraints $g_1, ..., g_m$), we must find a point $\text{x}$ such that
$$\sum_{i=1}^{m}\lambda_{i}\nabla g_i(\text{x}) = \nabla f(\text{x})$$
I understand lagrange multipliers when there is only one constraint, but this is hard to grasp for some reason... :(
Could anyone give me easy-to-understand explanation, why the equation above is true?
Thank you for any guidance :)
P.S.
If it is not a big job to do, I'd be very grateful If someone could also explain the Karush–Kuhn–Tucker conditions which generalize my question :) That would be super!
Consider a point $p$ in the common domain $\Omega\subset{\mathbb R}^n$ of $f$ and the constraints $$g_k(x)=0\qquad(1\leq k\leq r)\ .\tag{1}$$ The gradients $\nabla g_k(p)$ define a subspace $U$ of allowed directions when walking away from $p$. In fact a direction $X$ is allowed only if it belongs to the tangent planes of all level surfaces $(1)$. This means that $X$ is perpendicular to all $\nabla g_k(p)$, or is a solution of the homogeneous system of equations $$\nabla g_k(p)\cdot X=0\qquad(1\leq k\leq r)\ .\tag{2}$$
Now comes an important technical condition for the application of Lagrange's method: We have to assume that the $r$ gradients $\nabla g_k(p)$ are linearly independent, i.e. that $p$ is a regular point of the manifold defined by $(1)$. In this case the $\nabla g_k(p)$ span an $r$-dimensional subspace $V$, and the system $(2)$ has full rank. It follows that ${\rm dim}(U)= n-r$. Therefore we not only have $U\subset V^\perp$, but in fact $$U=V^\perp\ .$$
When $\nabla f(p)\cdot X\ne0$ for some allowed direction $X$ then the function $f$ is not conditionally stationary at $p$. For a constrained local extremum of $f$ at $p$ we therefore need $$\nabla f(p)\cdot X=0$$ for all directions $X\in U$, in other words: It is necessary that $$\nabla f(p)\in U^\perp=V\ .\tag{3}$$ When $\nabla f(p)\in V=\langle\nabla g_1(p),\ldots,\nabla g_r(p)\rangle$ then there are numbers $\lambda_k$ $\>(1\leq k\leq r)$ such that $$\nabla f(p)=\sum_{k=1}^r \lambda_k\>\nabla g_k(p)\ .\tag{4}$$ Solving $(4)$ (with $x$ in place of $p$) together with $(1)$ will bring all regular constrained extrema of $f$ to the fore.