I read @amoeba's answer in this post. The PCA optimization problem is
$$ \underset{\mathbf w}{\text{maximize}}~~ \mathbf w^\top \mathbf{Cw} \\ \text{s.t.}~~~~~~ \|\mathbf w\|_2=1 $$
where $\mathbf C$ is the covariance matrix and $\mathbf w$ is the first principal direction. As mentioned in the post, using the Lagrange multiplier, we can change the problem into a minimization problem.
$$ \underset{\mathbf w}{\text{minimize}} ~~(\underset{\lambda}{\text{maximize}}~~ \mathbf w^\top \mathbf{Cw}-\lambda(\mathbf w^\top \mathbf w-1)) $$ Differentiating, we obtain $\mathbf{Cw}-\lambda\mathbf w=0$, which is the eigenvector equation. The end.
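The eigenvector conclusion can be checked numerically: stationarity $\mathbf{Cw}=\lambda\mathbf w$ says the candidates are eigenvectors of $\mathbf C$, and since the objective value at a unit eigenvector is its eigenvalue, the maximizer is the top eigenvector. A small numpy sketch with synthetic data (shapes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
C = np.cov(X, rowvar=False)           # sample covariance matrix

# Stationarity C w = lambda w: candidates are eigenvectors of C,
# and the objective w'Cw at a unit eigenvector equals its eigenvalue,
# so the constrained maximizer is the top eigenvector.
eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
w = eigvecs[:, -1]                    # top eigenvector, already unit norm

# No random unit vector should beat it (up to floating point).
for _ in range(1000):
    v = rng.standard_normal(3)
    v /= np.linalg.norm(v)
    assert v @ C @ v <= w @ C @ w + 1e-9
```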
I think I need more examples to understand the Lagrange multiplier. Specifically, I was trying to practice it in a ridge regression problem but got stuck. The original problem is
$$ \underset{\mathbf w}{\text{minimize}}~~ \|\mathbf {Xw}-\mathbf y\|_2^2\\ \text{s.t.}~~~~ \|\mathbf w\|_2=c $$
($\mathbf X$ is the data matrix.) But using the Lagrange multiplier, do we transform it into the following problem?
$$ \underset{\mathbf w}{\text{minimize}}~~ \underset{\lambda}{\text{maximize}} ~~\|\mathbf {Xw}-\mathbf y\|_2^2 + \lambda (\mathbf w^\top \mathbf w - c^2) $$
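Whether or not one keeps the constant $c^2$ from the constraint $\|\mathbf w\|_2 = c$ inside the multiplier term (it does not change the minimizer in $\mathbf w$ for a fixed $\lambda$), setting the $\mathbf w$-gradient to zero yields the ridge normal equations. A quick numpy check on synthetic data (the shapes and the value of $\lambda$ are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))      # hypothetical data matrix
y = rng.standard_normal(50)
lam = 2.0                             # some fixed multiplier value

# d/dw [ ||Xw - y||^2 + lam * w'w ] = 2 X'(Xw - y) + 2 lam w = 0
# =>  (X'X + lam I) w = X'y  -- the ridge normal equations
w = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)

grad = 2 * X.T @ (X @ w - y) + 2 * lam * w
assert np.allclose(grad, 0)           # gradient vanishes at the solution

# A larger lam enforces a smaller-norm w (a tighter constraint radius c)
w_big = np.linalg.solve(X.T @ X + 100.0 * np.eye(4), X.T @ y)
assert np.linalg.norm(w_big) < np.linalg.norm(w)
```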
PS1. I know my math may be wrong in the problem description; please feel free to correct me.
PS2. Thanks to Nick Alger, I have revised my equations.
Thank you.
By introducing the Lagrange multiplier, you are converting it from a minimization problem to a saddle-point problem. One seeks: $$\min_w \max_\lambda w^T C w + \lambda(w^Tw - 1).$$ The following is not correct: $$\min_w \min_\lambda w^T C w + \lambda(w^Tw - 1)$$
The saddle point is still a location where the gradient is zero, just like a minimum; perhaps that is the source of the confusion.
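For a concrete check that the gradient does vanish at the saddle point, here is a sketch using the sign convention from the question's PCA Lagrangian, $L(w,\lambda) = w^\top C w - \lambda(w^\top w - 1)$, with an arbitrary synthetic matrix standing in for a covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
C = A @ A.T                            # symmetric PSD, stands in for a covariance

eigvals, eigvecs = np.linalg.eigh(C)
w, lam = eigvecs[:, -1], eigvals[-1]   # top eigenpair; ||w|| = 1

# Gradients of L(w, lam) = w'Cw - lam*(w'w - 1) at the saddle point:
grad_w = 2 * C @ w - 2 * lam * w       # zero because C w = lam w
grad_lam = -(w @ w - 1)                # zero because the constraint holds
assert np.allclose(grad_w, 0) and np.isclose(grad_lam, 0)
```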