Suppose we want to
minimize $f(x, y)$
subject to the constraint $g(x, y) = c$.
Then we treat $\lambda$ as a new variable and consider minimizing
$$h(x,y,\lambda)=f(x,y)+\lambda (g(x,y)-c)$$
Question: can we consider minimizing $u(x,y,\lambda)$?
$$u(x,y,\lambda)=f(x,y)+\lambda (g(x,y)-c)^2$$
The $\lambda$ term in $u(x,y,\lambda)$ now behaves like a potential well.
EDIT: Many thanks to Michael for spending the time to answer my question in such extraordinary detail.
What I actually intended to ask is the following:
minimize $f(x, y)$
subject to the constraint $g(x, y) = c$.
The Lagrange multiplier method requires us to consider the stationary points for $h(x,y,\lambda)$:
$$h(x,y,\lambda)=f(x,y)+\lambda (g(x,y)-c)$$
Question: can we consider the stationary points for $u(x,y,\lambda)$?
$$u(x,y,\lambda)=f(x,y)+\lambda (g(x,y)-c)^m \quad (m=1,2,3,4,\ldots)$$
Michael's example showed that when $m=2$, we may not always get reasonable solutions. What about when $m$ is odd, like $3, 5, \ldots$?
Your description of Lagrange multipliers, and of the alternative quadratic penalty method, is a bit off. Rather than jointly minimizing over $(x,y,\lambda) \in \mathbb{R}^3$, we typically want to fix $\lambda\in\mathbb{R}$ and then minimize over $(x,y) \in \mathbb{R}^2$. The solution will be parameterized by $\lambda$, so we then appropriately choose $\lambda$ to match (or approach) the desired constraint (if possible). There are conditions under which this will work (details below).
Joint minimization over $(x,y,\lambda)$ is incorrect.
1) For your penalty method, minimizing $f(x,y) + \lambda(g(x,y)-c)^2$ jointly over $(x,y,\lambda) \in \mathbb{R}^3$ fails: restricting to $\lambda \geq 0$, the best choice is $\lambda = 0$, which reduces to minimizing $f(x,y)$ over $(x,y)\in\mathbb{R}^2$ with no regard for the desired constraint (and if negative $\lambda$ are allowed, the joint infimum is $-\infty$ whenever $g(x,y) \neq c$ somewhere). So, this approach is wrong.
2) For Lagrange multipliers, minimizing $f(x,y) + \lambda (g(x,y)-c)$ over $(x,y,\lambda) \in \mathbb{R}^3$ is wrong. Consider $f(x,y) = x^2 + y^2$, $g(x,y) = x+y$, $c=1$. The problem of minimizing $x^2+y^2$ subject to $x+y=1$ has optimal solution $(x^*,y^*)=(1/2,1/2)$, which can be found from basic calculus. Let's see what happens with Lagrange multipliers:
Fix $\lambda$ and minimize $x^2 + y^2 + \lambda(x+y-1)$ over $(x,y)\in \mathbb{R}^2$. This leads to $x = y = -\lambda/2$. Minimizing $(-\lambda/2)^2 + (-\lambda/2)^2 + \lambda(-\lambda -1) = -\lambda^2/2 - \lambda$ over all $\lambda \in \mathbb{R}$ gives an incorrect result of $\lambda \rightarrow\infty$ and $x=y\rightarrow -\infty$. Maximizing over all $\lambda \in \mathbb{R}$ works in this particular example (due to convexity properties of the problem). Also, choosing $\lambda \in \mathbb{R}$ to satisfy the constraint $x+y=1$ gives $\lambda = -1$ and leads to the correct answer $(x,y)=(1/2,1/2)$.
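Here is a quick numerical sketch of that last procedure on this example (my own illustration using scipy; the starting point and bracket are arbitrary choices, not part of the argument): fix $\lambda$, minimize over $(x,y)$, then choose $\lambda$ so the constraint holds.

```python
# Sketch: fix lambda, minimize over (x, y), then pick lambda so that
# the inner minimizer satisfies g(x, y) = c. Illustrative only.
from scipy.optimize import minimize, brentq

f = lambda v: v[0]**2 + v[1]**2      # objective
g = lambda v: v[0] + v[1]            # constraint function
c = 1.0

def inner_min(lam):
    """Minimize f + lam * (g - c) over (x, y) for a fixed lambda."""
    res = minimize(lambda v: f(v) + lam * (g(v) - c), x0=[0.0, 0.0])
    return res.x

# Choose lambda so the constraint g = c is met (root of the residual).
lam_star = brentq(lambda lam: g(inner_min(lam)) - c, -10.0, 10.0)
print(lam_star, inner_min(lam_star))  # approx -1.0 and [0.5, 0.5]
```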
A correct quadratic penalty function approach.
A correct way to implement the quadratic penalty method is to fix a large (positive) value of $\lambda$, and then choose $(x,y) \in \mathbb{R}^2$ to minimize $f(x,y) + \lambda (g(x,y)-c)^2$. The intuition is that a large $\lambda$ will heavily penalize deviation from the desired constraint $g(x,y)=c$. This is related to standard "exact penalty function" methods (just websearch that phrase). Variations on this theme are known to either work exactly and/or to converge to the optimal solution when $\lambda \rightarrow \infty$ (under certain assumptions).
Example 1:
Let's revisit the previous example with $f(x,y)=x^2+y^2$, $g(x,y)=x+y$, $c=1$, and use this quadratic penalty approach.
Fix $\lambda >0$. Minimizing $x^2+y^2 + \lambda (x+y-1)^2$ leads to $x=y=\frac{\lambda}{1+2\lambda}$. So, there is no positive value of $\lambda$ for which this method works exactly. However, as $\lambda\rightarrow \infty$ we see that the solution converges to the optimal $(x,y)\rightarrow(1/2,1/2)$. So, the intuition "works" in this particular example as $\lambda \rightarrow\infty$.
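For concreteness, here is a small numerical check of this convergence (my own sketch using scipy, not part of the original argument):

```python
# Sketch: penalty minimizers of x^2 + y^2 + lam*(x + y - 1)^2 approach
# (1/2, 1/2) as the penalty weight lam grows.
from scipy.optimize import minimize

for lam in [1.0, 10.0, 100.0, 1000.0]:
    res = minimize(lambda v: v[0]**2 + v[1]**2 + lam*(v[0] + v[1] - 1.0)**2,
                   x0=[0.0, 0.0])
    print(lam, res.x)  # analytic answer: x = y = lam / (1 + 2*lam)
```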
Example 2:
This example shows the quadratic penalty method does not always work. Let $f(x,y) = -(x-1)^{10}$, $g(x,y)=x^2$, $c=1$. The minimum of $-(x-1)^{10}$ subject to $x^2=1$ is $x^*=-1$. However, minimizing $-(x-1)^{10} + \lambda (x^2-1)^2$ over $x \in \mathbb{R}$ gives $x\rightarrow\infty$ for any value of $\lambda$. This gives an incorrect value of $x$ for all $\lambda$, and is also incorrect in the limit as $\lambda \rightarrow\infty$.
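A quick way to see the unboundedness numerically (my own sketch; the sample points and penalty weight are arbitrary):

```python
# Sketch: the penalty objective -(x-1)^10 + lam*(x^2-1)^2 decreases
# without bound as x -> infinity, for any fixed lam.
lam = 1e6
for x in [1e2, 1e3, 1e4]:
    print(x, -(x - 1)**10 + lam * (x**2 - 1)**2)
```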
Proof of optimality under compactness assumptions.
Here is a proof that your quadratic penalty function will converge to an optimal solution under certain assumptions. Let $A$ be a closed and bounded subset of $\mathbb{R}^N$. Fix $c \in \mathbb{R}$. Suppose:
1) $f(x)$ and $g(x)$ are continuous functions over $x \in A$, where $x=(x_1, \ldots, x_N)$.
2) We want to minimize $f(x)$ such that $x\in A$ and $g(x)=c$. Suppose there exists a solution. Let $f^*$ be the optimal value of the objective function.
3) For each $\lambda >0$, define $x(\lambda)$ as a particular minimizer of $f(x) + \lambda (g(x)-c)^2$ over $x \in A$.
Claim:
$\lim_{\lambda \rightarrow\infty} \left[f(x(\lambda)) + \lambda (g(x(\lambda)) - c)^2\right] = f^*$.
Further, for any sequence of positive values $\lambda_1< \lambda_2 < \lambda_3 < \ldots$ that satisfy $\lambda_k\rightarrow\infty$, there exists a subsequence $\lambda_{k_n}$ that satisfies $x(\lambda_{k_n})\rightarrow z^*$ for some value $z^*\in A$ that satisfies $f(z^*)=f^*$ and $g(z^*)=c$. Thus, $x(\lambda_{k_n})$ converges to an optimal solution.
Proof:
Let $x^*$ be an optimal solution, so that $x^*\in A$ and $f(x^*)=f^*$, $g(x^*)=c$. Since $x(\lambda)$ minimizes $f(x) + \lambda(g(x) - c)^2$ over all $x \in A$, we know:
$f(x(\lambda)) + \lambda (g(x(\lambda)) -c)^2 \leq f(x^*)$
Furthermore, it is clear that $f(x(\lambda)) + \lambda (g(x(\lambda)) - c)^2$ is non-decreasing in $\lambda$. Thus, it must reach a finite limit as $\lambda \rightarrow\infty$, and the limit must be less than or equal to $f(x^*)$.
Now take any subsequence $\lambda_1 < \lambda_2 < \cdots$ that satisfies $\lambda_k\rightarrow\infty$. Then $\{x(\lambda_k)\}_{k=1}^{\infty}$ is an infinite sequence of values that bounce around the compact set $A$. So there must be a convergent subsequence that converges to a value $z^* \in A$ (by the Bolzano–Weierstrass theorem). To see that $g(z^*) =c$ and $f(z^*) = f^*$: the inequality above gives $\lambda_k (g(x(\lambda_k))-c)^2 \leq f^* - \min_{x\in A} f(x)$, a finite bound, so $(g(x(\lambda_k))-c)^2 \rightarrow 0$; continuity of $g$ then gives $g(z^*)=c$. Continuity of $f$ and the same inequality give $f(z^*) \leq f^*$, while $f(z^*)\geq f^*$ holds because $z^*$ is feasible. Hence $f(z^*)=f^*$.
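To illustrate the role of compactness (a sketch of my own; the grid resolution stands in for exact minimization over $A$): restricting Example 2 to $A=[-2,2]$ makes the penalty method recover $x^*=-1$.

```python
# Sketch: on the compact set A = [-2, 2], the minimizer of
# -(x-1)^10 + lam*(x^2-1)^2 approaches the constrained optimum x* = -1.
import numpy as np

xs = np.linspace(-2.0, 2.0, 400001)  # dense grid standing in for A
for lam in [1e2, 1e4, 1e6, 1e8]:
    vals = -(xs - 1)**10 + lam * (xs**2 - 1)**2
    print(lam, xs[np.argmin(vals)])  # tends toward -1.0
```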
Solving $\nabla f(z) + \lambda m (g(z)-c)^{m-1} \nabla g(z)=0$.
Suppose $m \geq 1$. Searching for solutions of $\nabla f(z) + \lambda m (g(z)-c)^{m-1} \nabla g(z)=0$ can sometimes help in cases when minimizing $f(z) + \lambda(g(z)-c)^m$ over $z\in\mathbb{R}^N$ fails.
Example:
Let $f(x) = -e^{x}$, $g(x) = x^2$, $c=0$. We want to minimize $-e^{x}$ subject to $x^2=0$. The answer is trivially $x^*=0$. Let's see how the two approaches handle this.
Approach 1: Minimizing $-e^{x} + \lambda x^{2m}$ over $x \in \mathbb{R}$ leads to $x\rightarrow\infty$ for any $\lambda \in \mathbb{R}$ and $m\geq 1$. So this approach does not work.
Approach 2: Searching for $x$ that satisfy $\frac{d}{dx}[ -e^{x} + \lambda x^{2m}]=0$ leads to:
$-e^{x} + \lambda (2m) x^{2m-1}=0$
Clearly the optimal value $x^*=0$ is not a solution to this, regardless of the value of $\lambda$ and $m$. However, there are indeed roots $x(\lambda)$ of this equation that converge to the optimal value $0$ as $\lambda \rightarrow \infty$ or $\lambda \rightarrow -\infty$. So this approach "works" in a sense.
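Here is a small root-finding check of that convergence (my own sketch using scipy's brentq, shown for $m=1$; the bracket endpoints are my choices):

```python
# Sketch: roots of -e^x + 2*m*lam*x^(2m-1) = 0 shrink toward the
# constrained optimum x* = 0 as lam grows (shown for m = 1).
import numpy as np
from scipy.optimize import brentq

m = 1
for lam in [10.0, 100.0, 1000.0]:
    phi = lambda x: -np.exp(x) + 2 * m * lam * x**(2 * m - 1)
    print(lam, brentq(phi, 1e-12, 1.0))  # phi(0+) < 0 < phi(1) brackets a root
```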
General cases:
Consider minimizing $f(z)$ subject to $g(z) = c$, where $z=(z_1, \ldots, z_N) \in \mathbb{R}^N$ and $c\in\mathbb{R}$. Suppose $x^*$ is an optimal solution.
Typically inexact performance: Suppose $m>1$. The vector $x^*$ satisfies $g(x^*)=c$, and hence it cannot be a root of the equation $\nabla f(z) + \lambda m(g(z)-c)^{m-1}\nabla g(z)=0$ unless $\nabla f(x^*)=0$, which is often untrue. So, for any finite $\lambda$, solving $\nabla f(z) + \lambda m(g(z)-c)^{m-1}\nabla g(z)=0$ typically will not give an exact answer to the desired constrained optimization problem.
Convergence when $\lambda\rightarrow\infty$ or $\lambda \rightarrow -\infty$: Suppose $f(z)$ and $g(z)$ are differentiable, and the following conditions hold:
1) $x^*$ minimizes $f(x)$ subject to $g(x)=c$.
2) There is a $\delta>0$ such that for all $\epsilon$ that satisfy $0<\epsilon<\delta$, there are vectors $x_{\epsilon}^*$ that minimize $f(x)$ subject to $g(x)=c+\epsilon$.
3) The $x_{\epsilon}^*$ vectors satisfy $\nabla f(x_{\epsilon}^*) = \mu_{\epsilon} \nabla g(x_{\epsilon}^*)$ for some real numbers $\mu_{\epsilon}$.
4) $\lim_{\epsilon\rightarrow 0^+} x_{\epsilon}^* = x^*$ and $\lim_{\epsilon\rightarrow0^+} \mu_{\epsilon} = \mu$, where $\mu$ is a nonzero real number.
Define $\lambda_{\epsilon} = -\mu_{\epsilon}/(m \epsilon^{m-1})$. Notice that:
$$\lim_{\epsilon\rightarrow 0^+} \lambda_{\epsilon} = \begin{cases} \infty & \text{if } \mu<0\\ -\infty & \text{if } \mu>0 \end{cases}$$ Then it is easy to check that $x_{\epsilon}^*$ solves $\nabla f(z) + \lambda_{\epsilon} m(g(z)-c)^{m-1}\nabla g(z)=0$: at $z = x_{\epsilon}^*$ we have $g(z)-c = \epsilon$, so the second term is $\lambda_{\epsilon} m \epsilon^{m-1} \nabla g(x_{\epsilon}^*) = -\mu_{\epsilon} \nabla g(x_{\epsilon}^*)$, which cancels $\nabla f(x_{\epsilon}^*)$ by condition 3.
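As a sanity check on this construction (my own sketch using sympy, on the earlier example $f=x^2+y^2$, $g=x+y$, $c=1$ with $m=2$, where $x_{\epsilon}^*=((1+\epsilon)/2,(1+\epsilon)/2)$ and $\mu_{\epsilon}=1+\epsilon$):

```python
# Sketch: symbolic check that x_eps* solves
# grad f + lam_eps * m * (g - c)^(m-1) * grad g = 0.
import sympy as sp

x, y, eps = sp.symbols('x y eps', positive=True)
m = 2
f = x**2 + y**2
g = x + y
c = 1

xe = (1 + eps) / 2                 # minimizer of f subject to g = c + eps
mu = 1 + eps                       # multiplier: grad f = mu * grad g there
lam = -mu / (m * eps**(m - 1))     # lambda_eps as defined above

for var in (x, y):
    expr = sp.diff(f, var) + lam * m * (g - c)**(m - 1) * sp.diff(g, var)
    print(sp.simplify(expr.subs({x: xe, y: xe})))  # prints 0 twice
```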