Lagrange multiplier method to show that $\|A\| $ equals the largest singular value of $A$.

1.1k Views Asked by At

This is a part of a lecture note (http://www.cs.cornell.edu/%7Ebindel/class/cs6210-f09/lec03.pdf) on matrix norm. It explains why 2-norm of a matrix $A$ equals the square root of the largest eigenvalue of $A^T A$ (i.e. the largest singular value).

This is a constrained optimization problem, to which we will apply the method of Lagrange multipliers: that is, we seek critical points for the functional $$ L(v, \mu) = v^T A^T A v - \mu (v^Tv - 1) $$ Differentiate in an arbitrary direction $(\delta v, \delta \mu)$ to find $$\tag {1} 2\delta v^T (A^T A v - \mu v) = 0, $$ $$\tag {2} \delta \mu(v^T v - 1) = 0. $$ Therefore, the stationary points satisfy the eigenvalue problem $$ A^T A v = \mu v $$

I know how to use Lagrange multiplier method, but (1) and (2) are strange to me. To my knowledge, the calculations for Langrange multiplier method needs gradients and thus needs partial derivatives of $L$. However, here strange symbols $\delta v$ and $\delta \mu$ do somthing instead of the partial derivatives. What are the meanings of the symbols $\delta v$ and $\delta \mu$? How can I get the equations (1) and (2)?

2

There are 2 best solutions below

0
On BEST ANSWER

Taylor expansion of $L$ gives $$ L(v + \delta v,\mu + \delta \mu) - L(v, \mu) = \nabla_v L(v, \mu) \cdot \delta v + \nabla_\mu L(v, \mu) \cdot \delta \mu + o(\sqrt{|\delta v|^2+|\delta \mu|^2}). $$ Here $\delta v$ is just an arbitrary vector, $\delta \mu$ is an arbitrary real number, $\nabla_v = (\partial/\partial v_1, \cdots, \partial/\partial v_n)$, and $\nabla_\mu = \partial/\partial\mu$. Let's denote the LHS as $\delta L(v, \mu, \delta v, \delta \mu)$. We need to find $v$ and $\mu$ such that $\delta L(v, \mu, \delta v, \delta \mu) = 0 + o(\sqrt{|\delta v|^2+|\delta \mu|^2})$ for arbitrary $\delta v$ and $\delta \mu$, i.e. the first order term should vanish for arbitrary $\delta v$ and $\delta \mu$. We call such points critical points. They are candidates at which $L$ has a minimum or a maximum.

You can calculate $\delta L$ by calculating the RHS and ignoring higher order terms as you say. However you can also calculate it by substituting $v + \delta v$ and $\mu + \delta \mu$ into the definition of $L(v, \mu)$ and ignoring higher order terms such as $\delta v^T \delta v, \delta \mu \delta v, \delta \mu^2$ etc. Sometimes the later is easier or more straightforward. $$ \begin{aligned} \delta L(v, \mu, \delta v, \delta \mu) &= (v + \delta v)^T A^T A (v + \delta v) - (\mu + \delta \mu) [(v + \delta v)^T (v + \delta v) - 1] \\ &\sim \delta v^T A^T A v + v^T A^T A \delta V - \mu \delta v^T v - \mu v \delta v^T - \delta\mu (v^T v - 1)\\ &= 2 \delta v^T A^T A v - 2\mu \delta v^T v - \delta\mu (v^T v - 1)\\ &= 2 \delta v^T( A^T A v - \mu v) - \delta\mu (v^T v - 1) \end{aligned} $$ The $\sim$ symbol means LHS and RHS are the same in the limit of $\delta v, \delta \mu \to 0$. In other words, I ignored higher order terms in the second line. Note that, in the second line, the first two terms equal each other. It is because they are transpose of each other, but they are scalas. Transpose of a scala is just itself. In the same reason the third and fourth terms also equal each other. To be zero for arbitrary $\delta v$ and $\delta \mu$, the expression enclosed by parenthesises in each term of the last line should be zero. This gives equations (1) and (2) in your post.

0
On

Let $A^TA=:G$. Then your "Lagrangian", expressed in coordinates, is $$L(v,\mu)=\sum_{i,k}v_iG_{ik}v_k-\mu \sum_i v_i^2\ .$$ It follows that we want $${\partial L\over\partial v_j}=\sum_k G_{jk}v_k+\sum_i v_iG_{ij}-2\mu v_j=0\qquad(1\leq j\leq n)\ .$$ Since $G$ is symmetric this amounts to $$\sum_k G_{jk}v_k=\mu v_j\qquad(1\leq j\leq n)\ ,$$ which is saying that $Gv=\mu v$.