Find the max and/or min of $f=x+2y$ on $x^2+y^2=1$
I'd like to solve this with Lagrange multipliers, and I'll show two approaches.
Approach 1:
We first write $g=x^2+y^2-1$
Lagrange function: $L=f-\lambda g = x + 2y - \lambda(x^2 + y^2 -1)$
We solve:
$\frac{\partial L}{\partial x}=1-2\lambda x = 0 \quad \Rightarrow \quad \lambda=\frac{1}{2x}$
$\frac{\partial L}{\partial y}=2-2\lambda y = 0 \quad \Rightarrow \quad \lambda=\frac{1}{y}$
Using the results of the above we get: $y=2x$
By using $x^2+y^2=1$ we get $x^2+4x^2-1=5x^2-1=0 \quad\Rightarrow\quad x=\pm\frac{1}{\sqrt{5}}$ and using $y=2x$ we get $y=\pm \frac{2}{\sqrt{5}}$
So we found the two candidates: $p_1=\big( \frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}} \big), p_2=\big( \frac{-1}{\sqrt{5}}, \frac{-2}{\sqrt{5}}\big)$
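The Hessian of $L$ with respect to $(x,y)$ is $\operatorname{Hess}(L)(x,y,\lambda)=\begin{pmatrix}-2\lambda & 0 \\ 0 & -2\lambda\end{pmatrix}$
Evaluating it at the two points, where $\lambda=\frac{1}{2x}=\pm\frac{\sqrt{5}}{2}$, we find that it is negative definite at $p_1$ and positive definite at $p_2$, so $p_1$ is a maximum and $p_2$ is a minimum.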
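The Hessian check above can be reproduced numerically. This is only a minimal sketch in plain Python, and the helper name `classify` is hypothetical. It relies on the fact that the Hessian here is $-2\lambda$ times the identity, so the sign of $\lambda$ alone decides definiteness. In general only the directions tangent to the constraint matter, which this sketch does not handle.

```python
import math

def classify(x, y):
    """Classify a candidate on x^2 + y^2 = 1 using the sign of Hess(L) = -2*lam*I."""
    lam = 1.0 / (2.0 * x)               # from dL/dx = 1 - 2*lam*x = 0
    assert math.isclose(lam, 1.0 / y)   # must agree with dL/dy = 2 - 2*lam*y = 0
    h = -2.0 * lam                      # both diagonal entries of the Hessian
    if h < 0:
        return "maximum"                # negative definite
    if h > 0:
        return "minimum"                # positive definite
    return "inconclusive"

s = math.sqrt(5.0)
p1 = (1.0 / s, 2.0 / s)
p2 = (-1.0 / s, -2.0 / s)
results = {"p1": classify(*p1), "p2": classify(*p2)}
print(results)
```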
Approach 2: Again we have the same $f,g$ as above.
$\nabla f = (1, 2), \quad \nabla g=( 2x, 2y)$
We get:
I) $1=\lambda 2x \quad\Rightarrow\quad x=\frac{1}{2\lambda}$
II) $2=\lambda 2y \quad\Rightarrow\quad y=\frac{1}{\lambda}$
III) $x^2+y^2 = 1$
From I and II we also get $y=2x$
We solve the system of equations for $\lambda$, getting: $\lambda=\pm\frac{\sqrt{5}}{2}$
Using the results from I and II we get $p_1=\big( \frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}} \big), p_2=\big( \frac{-1}{\sqrt{5}}, \frac{-2}{\sqrt{5}}\big)$
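As a sanity check on this step, the two values of $\lambda$ can be substituted back into I and II and tested against III. This is a small sketch in plain Python, and the variable names are chosen just for illustration.

```python
import math

candidates = []
for lam in (math.sqrt(5.0) / 2.0, -math.sqrt(5.0) / 2.0):
    x = 1.0 / (2.0 * lam)           # equation I
    y = 1.0 / lam                   # equation II
    residual = x * x + y * y - 1.0  # equation III, should be ~0
    candidates.append((lam, x, y, residual))

for lam, x, y, residual in candidates:
    print(f"lambda={lam:+.6f}  x={x:+.6f}  y={y:+.6f}  residual={residual:.2e}")
```

Both residuals vanish up to floating-point rounding, and the points agree with $p_1$ and $p_2$.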
We evaluate $f(p_1)=\sqrt{5}$ and $f(p_2)=-\sqrt{5}$ so $p_1$ is maximum and $p_2$ is minimum.
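These values can also be cross-checked without multipliers by parametrizing the circle as $(\cos t, \sin t)$ and scanning $f$ over a fine grid. This is a brute-force sketch, not part of either approach, and the grid size `n` is an arbitrary choice.

```python
import math

def f(x, y):
    return x + 2.0 * y

n = 100_000  # number of sample points on the circle (arbitrary)
values = []
for k in range(n):
    t = 2.0 * math.pi * k / n
    values.append((f(math.cos(t), math.sin(t)), t))

f_max, t_max = max(values)  # largest sampled value and its angle
f_min, t_min = min(values)  # smallest sampled value and its angle

print("max:", f_max, "at", (math.cos(t_max), math.sin(t_max)))
print("min:", f_min, "at", (math.cos(t_min), math.sin(t_min)))
print("sqrt(5):", math.sqrt(5.0))
```

The sampled extremes should agree with $\pm\sqrt{5}$ up to the grid resolution.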
Question: Is there any significant difference here? I find the second approach much more intuitive, since actually using the gradient just makes more sense to me. I think in the first one we create a new function that also depends on the Lagrange multiplier and then solve an ordinary extremum problem for it. But I'm not 100% sure about the first approach.
This really is a question about details; I know that the basic concept of both approaches is the same. But I'm wondering, for example, why the first approach doesn't simply plug the candidate points into $f$.
If going with the 2nd approach, how would I know if a point is a saddle point? Which one is the more general approach?
In the first approach, how do you handle the case where there is more than one local maximum?
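The second approach is more efficient here: the circle is compact and $f$ is continuous, so $f$ attains its maximum and minimum on it, and comparing $f$ at the candidate points is enough without computing a Hessian.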