What is the regularity of the greatest eigenvalue of the Hessian matrix?


Let $f:\mathbb{R}^n\to\mathbb{R}$ be twice continuously differentiable, i.e. $f\in\mathcal{C}^2\left(\mathbb{R}^n\right)$. Define $\lambda_f:\mathbb{R}^n\to\mathbb{R}$ as the function which associates $\vec x$ with the greatest eigenvalue of the Hessian matrix of $f$ at $\vec x$. What can we say about the regularity of $\lambda_f$? Is it continuous? Differentiable?

The question came to my mind at today's analysis exam. I have absolutely no idea how one might argue; the definition of $\lambda_f$ doesn't seem to be easily usable.

Edit:

Or, even more generally: is the function $\Lambda_f:\mathbb{R}^n\to\mathbb{R}^n$ which assigns to $\vec x$ the ordered $n$-tuple of eigenvalues (with multiplicities) of the Hessian of $f$ at $\vec x$ (i.e. $\Lambda_f\left(\vec x\right)=\left(\lambda_1,\dots,\lambda_n\right)$ where $\lambda_1\geq\dots\geq\lambda_n$ are the eigenvalues of $H_f\left(\vec x\right)$) continuous/differentiable? If so, can we calculate its Jacobian matrix?

Answer 1 (by Bill Cook):

Both $\lambda_f$ and $\Lambda_f$ are continuous functions, but not necessarily differentiable.

First, a very silly example: $f(x)=x^2|x|$ is twice continuously differentiable ($f'(x)=3x|x|$ and $f''(x)=6|x|$). The Hessian of $f$ is $H_f(x)=[6|x|]$, so $\lambda_f(x)=\Lambda_f(x)=6|x|$ (continuous but not differentiable).
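
(A quick numerical sanity check, not part of the argument: the following NumPy sketch estimates $f''$ by central differences and compares it to $6|x|$; the grid and step size are arbitrary choices of mine.)

```python
# Sanity check of f''(x) = 6|x| for f(x) = x^2 |x|, via central differences.
import numpy as np

def f(x):
    return x**2 * np.abs(x)

h = 1e-5
xs = np.linspace(-1.0, 1.0, 9)
second_diff = (f(xs + h) - 2 * f(xs) + f(xs - h)) / h**2
print(np.max(np.abs(second_diff - 6 * np.abs(xs))))  # tiny (about 2e-5)
```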

[In one variable $\lambda_f=f''$, so we have the rather trivial observation that $\lambda_f$ is continuous exactly when the second derivative is continuous, and differentiable exactly when the function has a third derivative.]

So how about a less trivial example? Consider $f(x,y)=\dfrac{1}{6}x^3-\dfrac{1}{2}xy^2$.

So $f_x(x,y)=\dfrac{1}{2}x^2-\dfrac{1}{2}y^2$, $f_y(x,y)=-xy$ and thus $f_{xx}(x,y)=x$, $f_{xy}(x,y)=f_{yx}(x,y)=-y$, and $f_{yy}(x,y)=-x$. This means that $$H_f(x,y) = \begin{bmatrix} x & -y \\ -y & -x \end{bmatrix}$$

To compute the eigenvalues: $\det(\lambda I-H_f)=(\lambda-x)(\lambda+x)-y^2=\lambda^2-(x^2+y^2)=0$ so the eigenvalues are $\pm\sqrt{x^2+y^2}$.

This means that $\Lambda_f(x,y)=(\sqrt{x^2+y^2},-\sqrt{x^2+y^2})$ and $\lambda_f(x,y)=\sqrt{x^2+y^2}$.

And we're done: these functions are continuous but not differentiable (they have serious problems at the origin).
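
(A minimal NumPy check of this computation at a few random points; note that `np.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix in ascending order, so the tuple comes out reversed relative to $\Lambda_f$.)

```python
# The ordered eigenvalues of the Hessian of f(x, y) = x^3/6 - x y^2/2
# should be -sqrt(x^2 + y^2) and +sqrt(x^2 + y^2).
import numpy as np

def hessian(x, y):
    return np.array([[x, -y],
                     [-y, -x]])

rng = np.random.default_rng(0)
for x, y in rng.normal(size=(5, 2)):
    evals = np.linalg.eigvalsh(hessian(x, y))   # ascending order
    r = np.hypot(x, y)                          # sqrt(x^2 + y^2)
    assert np.allclose(evals, [-r, r])
print("eigenvalues are ±sqrt(x^2 + y^2), as computed above")
```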

Given that we found a counterexample where $f(x,y)$ is a polynomial, I am not hopeful that we could find enough hypotheses to force differentiability for a large class of functions.

But on the bright side, these functions are continuous. There is a paper by Alexander Shapiro ("On differentiability of symmetric matrix valued functions") which I believe has a proof of this result (Hessian matrices are symmetric so they fall out as a special case of his discussion).

Answer 2:

Bill Cook gave a good answer (as usual). Still, more can be said about the question.

First, if you consider a $C^2$ function, then its Hessian is merely continuous, and you cannot expect better than continuity for its eigen-elements. On the other hand, treating the spectrum of a symmetric matrix $A$ as an ordered $n$-tuple ($\lambda_1\geq \cdots \geq \lambda_n$) is, in general, a bad idea: with that convention you cannot, in general, construct a differentiable parametrization of the spectrum, only a continuous one (even if $f\in C^{\infty}$). In fact one has a little more: the map sending a symmetric matrix to its ordered spectrum is locally Lipschitz continuous (by Weyl's inequality it is even globally $1$-Lipschitz with respect to the operator norm).
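
(A small NumPy illustration of that Lipschitz bound, checking Weyl's inequality $\max_i|\lambda_i(A)-\lambda_i(B)|\le\|A-B\|_2$ on random symmetric matrices; the sizes, scales, and tolerance are my own arbitrary choices.)

```python
# Weyl's inequality: the ordered-spectrum map is 1-Lipschitz in operator norm.
import numpy as np

rng = np.random.default_rng(1)
n = 6
for _ in range(100):
    A = rng.normal(size=(n, n)); A = (A + A.T) / 2   # random symmetric matrix
    E = rng.normal(size=(n, n)); E = (E + E.T) / 2   # symmetric perturbation
    B = A + 1e-3 * E
    gap = np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)))
    assert gap <= np.linalg.norm(A - B, 2) + 1e-12   # ||A - B||_2 = operator norm
```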

Let $S_n$ be the set of real symmetric $n\times n$ matrices. There is a precise result when the symmetric matrix depends analytically on a single parameter.

Proposition. Assume that $t\in\mathbb{R}\mapsto M_t\in S_n$ is analytic. Then the eigenvalues, together with a basis of unit-length eigenvectors of $M_t$, can be parametrized globally analytically (even if the eigenvalues have multiplicities; however, as said above, the natural ordering of the eigenvalues need not be preserved).
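
(A concrete two-line instance of the parenthetical remark, my own illustration rather than one from the references: for $M_t=\operatorname{diag}(t,-t)$, the analytic eigenvalue branches are $t$ and $-t$, while the ordered eigenvalues are $-|t|$ and $|t|$, which are not differentiable at the crossing $t=0$.)

```python
# Ordered vs. analytic eigenvalue branches of M_t = diag(t, -t).
import numpy as np

for t in np.linspace(-1.0, 1.0, 5):
    ordered = np.linalg.eigvalsh(np.diag([t, -t]))   # ascending: (-|t|, |t|)
    assert np.allclose(ordered, [-abs(t), abs(t)])   # kinked at t = 0
    # whereas the branches t -> t and t -> -t are analytic but unordered
```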

Remark 1. This also works when $t\in\mathbb{R}\mapsto M_t\in S_n$ is merely smooth, provided we add the condition that any two continuous eigenvalue curves $t\mapsto \lambda_i(t)$ and $t\mapsto \lambda_j(t)$ of $M_t$ either coincide or intersect only finitely many times.

By Bill's counterexample, the above result is no longer valid when the entries of our symmetric matrix depend on more than one parameter (here $A(x,y)$); more precisely, Bill shows that the eigenvalues of $\operatorname{Hess}f(x,y)$ may fail to be differentiable at a point where an eigenvalue is multiple.

Conclusion. Let $f\in C^{\infty}$. If the largest eigenvalue of $\operatorname{Hess}f(x)$ is simple for every $x$, then your function $\lambda_f$ is $C^{\infty}$ and there is a $C^{\infty}$ parametrization of "the" associated eigenvector; otherwise, $\lambda_f$ is locally Lipschitz continuous but may fail to be $C^1$, and, moreover, "the" associated eigenvector may fail to be even $C^0$.
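
(A NumPy sketch of that last failure mode, using Bill's Hessian: approaching the double eigenvalue at the origin along different directions gives different limiting top eigenvectors, so no continuous choice exists there. The helper name `top_eigvec` is mine, and `eigh` fixes eigenvectors only up to sign.)

```python
# The eigenvector of the largest eigenvalue of H(x, y) = [[x, -y], [-y, -x]]
# has no continuous extension to the origin.
import numpy as np

def top_eigvec(x, y):
    H = np.array([[x, -y], [-y, -x]])
    _, V = np.linalg.eigh(H)
    return V[:, -1]               # unit eigenvector of the largest eigenvalue

print(top_eigvec(1e-8, 0.0))      # ±(1, 0)
print(top_eigvec(-1e-8, 0.0))     # ±(0, 1): a different direction
```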

Remark 2. The Proposition and Remark 1 are also valid when $M_t$ is normal (cf. Reference 2, Theorem (A); note that Theorem (B) gives (complicated) results when there are several parameters).

References.

1. http://www.mat.univie.ac.at/~michor/roots.pdf
2. https://arxiv.org/pdf/1111.4475v2.pdf

EDIT. About your last question: assume that $f\in C^{\infty}$ and that the eigenvalues of $\operatorname{Hess}f(x)$ are simple for every $x$. Then the Jacobian of the function $x\in \mathbb{R}^n\mapsto (\lambda_1,\dots,\lambda_n)\in\mathbb{R}^n$ (with $\lambda_1>\cdots>\lambda_n$) exists; more precisely, $\dfrac{\partial \lambda_i}{\partial x_j}$ can be written using "the" unit eigenvector $u_i$ associated to $\lambda_i$, and not its derivative.

Proof. Let $A_x=[a_{rs}]=\operatorname{Hess}f(x)$. It is known (Hadamard) that $\dfrac{\partial \lambda_i}{\partial a_{rs}}=u_i^T\dfrac{\partial A}{\partial a_{rs}}u_i=(u_i)_r(u_i)_s$.

Thus $\dfrac{\partial \lambda_i}{\partial x_j}=\sum_{rs}\dfrac{\partial \lambda_i}{\partial a_{rs}}\dfrac{\partial a_{rs}}{\partial x_j}=\sum_{rs}(u_i)_r(u_i)_s\dfrac{\partial^3 f}{\partial x_r\partial x_s\partial x_j}$.
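
(A numerical sanity check of this chain-rule formula, assuming simple eigenvalues. Rather than picking a specific $f$, the sketch below takes a generic symmetric matrix $A(x)$ depending linearly on two parameters, so that $\partial A/\partial x_j$ plays the role of the third-derivative matrix $\left[\partial^3 f/\partial x_r\partial x_s\partial x_j\right]_{rs}$; all names and constants are mine.)

```python
# Check dλ_i/dx_j = u_i^T (∂A/∂x_j) u_i against central finite differences.
import numpy as np

rng = np.random.default_rng(2)
n = 4
sym = lambda M: (M + M.T) / 2
A0, B1, B2 = (sym(rng.normal(size=(n, n))) for _ in range(3))

def A(x):
    return A0 + x[0] * B1 + x[1] * B2   # ∂A/∂x_1 = B1, ∂A/∂x_2 = B2

x = np.array([0.3, -0.7])
lam, U = np.linalg.eigh(A(x))           # eigenvalues are simple for generic data

h = 1e-6
for j, Bj in enumerate((B1, B2)):
    e = np.zeros(2); e[j] = h
    fd = (np.linalg.eigvalsh(A(x + e)) - np.linalg.eigvalsh(A(x - e))) / (2 * h)
    formula = np.array([U[:, i] @ Bj @ U[:, i] for i in range(n)])
    assert np.allclose(fd, formula, atol=1e-5)
```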