What is the Hessian of the spectral norm?


The spectral norm of a symmetric matrix $A$ is the largest absolute value of its eigenvalues. When the top eigenvalue $\lambda$ is positive, the gradient of this norm is $uu^T$, where $u$ is the unit eigenvector associated with $\lambda$.
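One can sanity-check the gradient claim numerically. Below is a minimal 2×2 sketch (a hypothetical example, not part of the original question) that compares a finite-difference directional derivative of the norm along a symmetric direction $E$ against $\langle uu^T, E\rangle$, using the closed-form eigenpair of $\begin{pmatrix}a & b\\ b & c\end{pmatrix}$:

```python
import math

def spectral_norm(a, b, c):
    # spectral norm of the symmetric 2x2 matrix [[a, b], [b, c]]
    mean = (a + c) / 2
    rad = math.hypot((a - c) / 2, b)
    return max(abs(mean + rad), abs(mean - rad))

a, b, c = 3.0, 1.0, 1.0          # eigenvalues 2 +/- sqrt(2): top one isolated, positive
lam = (a + c) / 2 + math.hypot((a - c) / 2, b)
u = (b, lam - a)                 # eigenvector direction: (A - lam I) u = 0
s = math.hypot(*u)
u = (u[0] / s, u[1] / s)         # normalize to a unit vector

# central finite-difference derivative of the norm along a symmetric direction E
E = ((0.7, -0.3), (-0.3, 0.2))
h = 1e-6
fd = (spectral_norm(a + h*E[0][0], b + h*E[0][1], c + h*E[1][1])
      - spectral_norm(a - h*E[0][0], b - h*E[0][1], c - h*E[1][1])) / (2 * h)

# claim: gradient is u u^T, so the directional derivative is <u u^T, E>
pred = sum(u[i] * u[j] * E[i][j] for i in range(2) for j in range(2))
print(fd, pred)
```

The two printed values should agree to many digits as long as the top eigenvalue stays isolated under the perturbation.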

Assume that $A$ has an isolated top eigenvalue. Then the norm is twice differentiable at $A$. What is its Hessian?

Any comments about how one generally computes Hessians of matrix norms are also appreciated!

BEST ANSWER

Let us arrange all eigenvalues (and corresponding eigenvectors) in ascending order of magnitude, $|\lambda_1| \leq \cdots \leq |\lambda_{n-1}| < |\lambda_n|$, and assume $\lambda_n > 0$, so that $\|A\|_2 = \lambda_n$ (otherwise work with $-A$). Then the elements of the desired Hessian are: $$\frac{\partial^2 \|A\|_2}{\partial A_{kl}\partial A_{ij}} = \frac{\partial^2 \lambda_{n}}{\partial A_{kl}\partial A_{ij}} = \left[\frac{\partial u^T_{n}}{\partial A_{kl}}\right]_i [u_{n}]_j + [u_{n}]_i \left[\frac{\partial u_{n}}{\partial A_{kl}}\right]_j$$

Here we use the fact that $A$ is diagonalizable, i.e., the eigenvectors $u_i, \ i=1,\cdots, n$ form an orthonormal basis. This basis is used to decompose $\frac{\partial u_n}{\partial A_{kl}} = \sum_{m=1}^n c_m u_m$. To find the coefficients $c_m \in \mathbb{R}$, one can follow the standard eigenvalue perturbation analysis (https://en.wikipedia.org/wiki/Eigenvalue_perturbation).

We just give the results: $$\frac{\partial u_{n}}{\partial A_{kl}} = \sum_{m=1}^{n-1} \frac{[u_m]_k[u_n]_l}{\lambda_n - \lambda_m} u_m$$ $$\frac{\partial u^T_{n}}{\partial A_{kl}} = \sum_{m=1}^{n-1} \frac{[u_m]_l[u_n]_k}{\lambda_n - \lambda_m} u_m^T$$ And the Hessian elements are: $$\frac{\partial^2 \|A\|_2}{\partial A_{kl}\partial A_{ij}} = \sum_{m=1}^{n-1} \frac{[u_n]_k[u_n]_j[u_m]_l[u_m]_i + [u_n]_l[u_n]_i[u_m]_k[u_m]_j}{ \lambda_n - \lambda_m} $$
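This elementwise formula can be sanity-checked numerically (a hypothetical 2×2 example, not from the original answer): contracting it with two symmetric directions $E, F$ gives the bilinear form $\sum_{ijkl} H_{ijkl} E_{ij} F_{kl}$, which should match a central finite-difference second derivative of $\lambda_n$ along $E$ and $F$:

```python
import math

a, b, c = 3.0, 1.0, 1.0          # eigenvalues 2 +/- sqrt(2): both positive, top isolated

def lam_max(a, b, c):
    # top eigenvalue of [[a, b], [b, c]] (= spectral norm here, since
    # both eigenvalues are positive)
    return (a + c) / 2 + math.hypot((a - c) / 2, b)

rad = math.hypot((a - c) / 2, b)
lam_n = (a + c) / 2 + rad        # top eigenvalue
lam_m = (a + c) / 2 - rad        # the single other eigenvalue
un = (b, lam_n - a)
s = math.hypot(*un)
un = (un[0] / s, un[1] / s)      # unit top eigenvector
um = (-un[1], un[0])             # the other unit eigenvector (orthogonal)

def hess(i, j, k, l):
    # elementwise Hessian from the answer; the sum over m has one term in 2x2
    return (un[k]*un[j]*um[l]*um[i] + un[l]*un[i]*um[k]*um[j]) / (lam_n - lam_m)

# finite-difference second derivative along symmetric directions E and F
E = ((0.7, -0.3), (-0.3, 0.2))
F = ((0.1, 0.5), (0.5, -0.4))
t = 1e-4
def f(p, q):  # lam_max(A + p E + q F)
    return lam_max(a + p*E[0][0] + q*F[0][0],
                   b + p*E[0][1] + q*F[0][1],
                   c + p*E[1][1] + q*F[1][1])
fd = (f(t, t) - f(t, -t) - f(-t, t) + f(-t, -t)) / (4 * t * t)

pred = sum(hess(i, j, k, l) * E[i][j] * F[k][l]
           for i in range(2) for j in range(2)
           for k in range(2) for l in range(2))
print(fd, pred)
```

The agreement is limited only by the finite-difference truncation error, on the order of $t^2$.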


@Andrey's solution is very nice, but I found the Wikipedia part a bit hard to read, so for completeness, I am adding a simplified proof here to determine $\frac{\partial [u_i]_l}{\partial A_{st}}$, where $u_i$ is the $i$th eigenvector of $A$.

Consider the $i$th eigenvalue/eigenvector pair of $A$, where $Au_i = \lambda_i u_i$.

We use perturbation analysis, taking the derivative of $Au_i=\lambda_iu_i$ via the product rule: $$ \partial A\, u_i + A\,\partial u_i = \partial \lambda_i\, u_i + \lambda_i\, \partial u_i. $$ Now $\partial u_i$ is a vector in $\mathbb R^n$, and we assume the eigenvectors $u_1,\dots,u_n$ are normalized; since $A$ is symmetric, they form an orthonormal basis for $\mathbb R^n$. Therefore there exist $c_{ik}$ for $k = 1,\dots,n$ such that $$ \partial u_i = \sum_{k=1}^n c_{ik} u_k. $$

We substitute, left-multiply by $u_j^T$, and simplify using $u_j^TA = \lambda_ju_j^T$ (symmetry of $A$) to get $$ u_j^T\partial A\, u_i + c_{ij}\lambda_j = \partial \lambda_i\, u_j^Tu_i + \lambda_i c_{ij}. $$ When $i = j$, this reduces to $$ u_i^T\partial A\, u_i = \partial \lambda_i, $$ which you can then use to find the gradient of the spectral norm. When $i\neq j$, it reduces to $$ c_{ij} = \frac{ u_j^T\partial A\, u_i }{\lambda_i-\lambda_j }. $$ Moreover, differentiating the normalization $u_i^Tu_i=1$ gives $u_i^T\partial u_i = 0$, so $c_{ii} = 0$.

Expanding out gives $$ \partial u_i = \sum_{k\neq i} \frac{ u_k^T\partial A\, u_i }{\lambda_i-\lambda_k } u_k = \sum_{k\neq i} \sum_{s,t=1}^n \frac{ \partial A_{s,t}\,[u_k]_s[u_i]_t }{\lambda_i-\lambda_k } u_k. $$ Thus, we get $$ \frac{\partial [u_i]_l}{\partial A_{s,t}} =\sum_{k\neq i} \frac{ [u_k]_s[u_i]_t[u_k]_l}{\lambda_i-\lambda_k }. $$
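As a sanity check (a hypothetical 2×2 sketch, not part of the original answer), contracting the eigenvector derivative $\partial u_i = \sum_{k\neq i} \frac{u_k^T\partial A\, u_i}{\lambda_i-\lambda_k} u_k$ with a symmetric direction $E$ predicts the derivative of the unit top eigenvector along $E$, which can be compared with a finite difference:

```python
import math

def top_unit_vec(a, b, c):
    # unit eigenvector for the larger eigenvalue of [[a, b], [b, c]],
    # with a sign convention that is continuous for b > 0
    lam = (a + c) / 2 + math.hypot((a - c) / 2, b)
    v = (b, lam - a)
    s = math.hypot(*v)
    return (v[0] / s, v[1] / s)

a, b, c = 3.0, 1.0, 1.0          # eigenvalues 2 +/- sqrt(2)
rad = math.hypot((a - c) / 2, b)
lam_i = (a + c) / 2 + rad        # top eigenvalue (index i)
lam_k = (a + c) / 2 - rad        # the only other eigenvalue (index k)
ui = top_unit_vec(a, b, c)
uk = (-ui[1], ui[0])             # orthogonal unit eigenvector

# finite-difference derivative of u_i along a symmetric direction E
E = ((0.7, -0.3), (-0.3, 0.2))
h = 1e-6
up = top_unit_vec(a + h*E[0][0], b + h*E[0][1], c + h*E[1][1])
dn = top_unit_vec(a - h*E[0][0], b - h*E[0][1], c - h*E[1][1])
fd = [(up[l] - dn[l]) / (2 * h) for l in range(2)]

# formula: (u_k^T E u_i) / (lam_i - lam_k) * u_k -- a single term in 2x2
coef = sum(uk[s] * E[s][t] * ui[t] for s in range(2) for t in range(2))
pred = [coef / (lam_i - lam_k) * uk[l] for l in range(2)]
print(fd, pred)
```

Note that `fd` is automatically orthogonal to $u_i$ (since $u_i$ stays unit-norm), matching the $c_{ii}=0$ observation above.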

The rest follows from @Andrey's answer.