After reading about the quadratic penalty method, I still don't understand what it is. Let me take a simple problem as an example; it comes from pages 491-492 of the book "Numerical Optimization":
http://www.bioinfo.org.cn/~wangchao/maa/Numerical_Optimization.pdf
However, I still don't understand what this method is actually doing. I mean, how does the quadratic penalty method work? All I have learned is how to rewrite the formula from the normal constrained optimization problem, i.e. from (17.3) to (17.4).
In this example, is what the quadratic penalty method does something like this: we all know the solution is $x_1=-1, x_2=-1$, but suppose we don't. So we rewrite formula (17.3) as formula (17.4), pick a value of $\mu$, solve for $x_1$ and $x_2$, then increase $\mu$ and solve again; and the larger $\mu$ is, the closer $x_1$ and $x_2$ get to $x_1=-1$ and $x_2=-1$. Right?
And is the best time to use the quadratic penalty method when the number of constraints isn't large?
Is my thinking right? If not, I hope someone can tell me the answer.


The method you're describing is just a way to turn a constrained optimization problem into an unconstrained one. Your example is $$\text{minimize}\quad x_1+x_2 \quad\text{subject to}\quad x_1^2+x_2^2-2=0.$$ Writing the constrained problem as an unconstrained one, you'd have $$\text{minimize}\quad x_1 + x_2 + I_0(x_1^2+x_2^2 - 2)$$ where $I_0(x)$ is $0$ if $x=0$ and infinity otherwise (convince yourself these are identical formulations of the same problem).
The issue here is that the objective is clearly not differentiable (not even continuous), which as a general rule is not conducive to numerical optimization. Instead we can relax this problem by approximating $I_0(x_1^2+x_2^2 - 2)$ as $(x_1^2+x_2^2 - 2)^2$. Obviously this is a very crude approximation, but it captures the idea that we don't want the constraint violated: the further the constraint function is from $0$, the larger the penalty for that value of $(x_1,x_2)$. This is a common technique in optimization called relaxation: instead of solving the original constrained optimization problem, you solve the relaxed problem $$\text{minimize}\quad x_1 + x_2 + \lambda(x_1^2+x_2^2 - 2)^2$$
where $\lambda > 0$ (the book's $\mu$) is the penalty you impose for violating the constraint: the larger it is, the more heavily a violation is penalized. This problem is nice because it's unconstrained and the objective is differentiable, so you can go ahead and solve it with something like gradient descent or Newton-Raphson.
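To make this concrete, here is a minimal sketch that minimizes the penalized objective for a fixed $\lambda$ with plain gradient descent (the function names, step size, and tolerances are my own illustrative choices, not from the book):

```python
import math

def penalty_grad(x1, x2, lam):
    # Gradient of f(x) = x1 + x2 + lam*(x1^2 + x2^2 - 2)^2:
    #   df/dx1 = 1 + 4*lam*c*x1,  df/dx2 = 1 + 4*lam*c*x2,
    # where c = x1^2 + x2^2 - 2 is the constraint residual.
    c = x1 * x1 + x2 * x2 - 2.0
    return 1.0 + 4.0 * lam * c * x1, 1.0 + 4.0 * lam * c * x2

def minimize_penalty(x1, x2, lam, step=0.01, tol=1e-10, max_iter=200_000):
    # Plain gradient descent with a fixed step size; stop when the
    # gradient is (numerically) zero.
    for _ in range(max_iter):
        g1, g2 = penalty_grad(x1, x2, lam)
        if math.hypot(g1, g2) < tol:
            break
        x1, x2 = x1 - step * g1, x2 - step * g2
    return x1, x2

x1, x2 = minimize_penalty(-1.0, -1.0, lam=1.0)
print(x1, x2)  # both coordinates come out near -1.06
```

Note that for a finite $\lambda$ the minimizer sits slightly outside the circle $x_1^2+x_2^2=2$: the penalty trades a small constraint violation for a lower value of $x_1+x_2$, which is exactly why $\lambda$ has to grow for the solution to approach $(-1,-1)$.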
You could just set $\lambda$ to be some really large number, but this makes the unconstrained problem badly ill-conditioned, which causes numerical trouble. In practice, you usually start with a small value of $\lambda$, solve the problem, then increase $\lambda$ and solve again using your previous solution as a starting point (so-called warm starting). Iterate this until convergence.
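A sketch of that warm-starting loop (the schedule of $\lambda$ values and the step-size rule are illustrative choices of mine, not prescriptions from the book):

```python
import math

def minimize_penalty(x1, x2, lam, step, tol=1e-8, max_iter=500_000):
    # Gradient descent on x1 + x2 + lam*(x1^2 + x2^2 - 2)^2
    # for a fixed penalty parameter lam.
    for _ in range(max_iter):
        c = x1 * x1 + x2 * x2 - 2.0
        g1, g2 = 1.0 + 4.0 * lam * c * x1, 1.0 + 4.0 * lam * c * x2
        if math.hypot(g1, g2) < tol:
            break
        x1, x2 = x1 - step * g1, x2 - step * g2
    return x1, x2

x1, x2 = 0.0, 0.0  # arbitrary starting point
for lam in [1.0, 10.0, 100.0, 1000.0]:
    # Warm start: reuse the previous solution as the initial point,
    # and shrink the step as lam grows (the problem gets stiffer).
    x1, x2 = minimize_penalty(x1, x2, lam, step=0.01 / lam)
    print(lam, x1, x2)
# the iterates approach the true solution (-1, -1) as lam increases
```

Each subproblem converges in few iterations because its starting point is already near its minimizer; that is the whole point of warm starting.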
Regarding when to use this method: it will work for any number of constraints. It is useful when you have differentiable constraints and you want to turn the constrained problem into an unconstrained one (to run some sort of iterative solver, say).