Maybe it is already answered here but there are way too many suggestions popping up including lagrange..
My question is rather basic: Why does introducing a lagrange multiplicater transform the initial solution of the optimization task into a local optima of the lagrange function? Or better described: Why is the solution given by the corresponding optima?
$\nabla f$ is the direction of greatest local increase in the function $f$. Moving perpendicular to $\nabla f$ causes no local change in the function. Thus, a local optimum on the constraint surface must occur when every direction you can move on the surface is perpendicular to $\nabla f$.
If the equation of the constraint surface is given in the form $g(\mathbf{x}) = c$, then $\nabla g$ is everywhere perpendicular to the constraint surface. We see this because, as you move along the constraint surface, $g$ doesn't change, so every direction you can move on the surface must always be in a direction of no local change in $g$ (i.e., perpendicular to $\nabla g$).
So, since $\nabla g$ is perpendicular to the constraint surface everywhere and $\nabla f$ is perpendicular to the constraint surface at local optima, local optima must occur at places where $\nabla g$ is parallel to $\nabla f$. This means there is some real constant $\lambda$ such that $\nabla f = \lambda \nabla g$ at each local optimum, or equivalently, that $$ \nabla (f - \lambda g) = 0. $$
For multiple constraints of the form $g_i(\mathbf{x}) = c_i$, the constraint surface is perpendicular to each gradient $\nabla g_i$ at all of its points, and $\nabla f$ is perpendicular to the constraint surface whenever it is some linear combination of the $\nabla g_i$. Thus, in the multiple constraint case, we have $$ \nabla \left(f - \sum_i \lambda_i g_i\right) = 0. $$