Differentiability of the largest eigenvalue of a $C^1$ function


I encountered the following interesting statement:

Let $f: [a, b] \to M_n(\mathbb C)$ be a $C^1$ function on $[a, b]$ (i.e. $C^1$ in the interior, with one-sided derivatives at the endpoints) such that $f(t)$ is Hermitian for every $t$. Then there exists an almost everywhere differentiable function $\lambda(t)$, equal to the largest eigenvalue of $f(t)$ for every $t \in (a, b)$, such that the following equality holds almost everywhere: \begin{align*} \frac {d\lambda(t)}{dt} = v(t)^* \left(\frac{d}{dt} f \right) v(t), \end{align*} where $v(t)$ is a unit vector such that $\lambda(t) = v(t)^* f(t) v(t)$ for every $t$. No regularity conditions are assumed for $v$.

How can one prove this statement?

EDIT 1: After some thought: we know the ordered eigenvalues of Hermitian matrices are Lipschitz with constant $1$ with respect to the operator norm (Weyl's inequality). Note that $f$ is Lipschitz, since $f$ is $C^1$ on a compact interval; denote its Lipschitz constant by $c$. So for $s, t \in [a, b]$, $$|\lambda( s) - \lambda(t)| = |\lambda_{\max}(f(s)) - \lambda_{\max}(f(t)) | \le \|f(s) - f(t)\| \le c |s -t|.$$ Hence $\lambda(t)$ is Lipschitz and thus differentiable almost everywhere on $[a, b]$. But I still don't know how to arrive at the formula.
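As a quick sanity check of the Lipschitz bound used above, $|\lambda_{\max}(A)-\lambda_{\max}(B)|\le\|A-B\|$ in the operator norm, the following Python sketch samples random real symmetric $2\times 2$ matrices. These are a special case of Hermitian matrices in which $\lambda_{\max}$ and the operator norm have closed forms. The helper names, the sampling range and the number of trials are illustrative assumptions, not part of the question.

```python
import math
import random


def lam_max(a, b, d):
    """Largest eigenvalue of the real symmetric matrix [[a, b], [b, d]]."""
    return (a + d) / 2 + math.hypot((a - d) / 2, b)


def op_norm(a, b, d):
    """Spectral norm of [[a, b], [b, d]], i.e. its largest |eigenvalue|."""
    # The eigenvalues are centre +/- radius with radius >= 0.
    centre = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return abs(centre) + radius


def worst_ratio(trials=10_000, seed=0):
    """Largest |lam_max(A) - lam_max(B)| / ||A - B|| seen over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        A = [rng.uniform(-5.0, 5.0) for _ in range(3)]  # entries (a, b, d)
        B = [rng.uniform(-5.0, 5.0) for _ in range(3)]
        dist = op_norm(*[x - y for x, y in zip(A, B)])
        if dist > 0.0:
            worst = max(worst, abs(lam_max(*A) - lam_max(*B)) / dist)
    return worst


print(worst_ratio())  # expected to stay <= 1, up to rounding
```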


EDIT 2: I found a source for a proof, which I have posted as a separate answer below. I have a minor question about that proof; if someone knows the answer, please post a comment under it.

Best answer:

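Formally, if $fv=\lambda v$ and $v^*fv=\lambda$, where $\|v\|=1$, then differentiating $v^*v=1$ gives $v'^*v+v^*v'=0$, so (using $fv=\lambda v$ and $f^*=f$)

$\lambda'=v'^*fv+v^*f'v+v^*fv'=\lambda(v'^*v+v^*v')+v^*f'v$, that is, $\lambda'=v^*f'v$.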
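The formal identity $\lambda'=v^*f'v$ can be checked numerically at a point where the largest eigenvalue is simple. The Python sketch below assumes a hypothetical real symmetric family $f(t)=\begin{pmatrix}\cos t & \sin t\\ \sin t & t^2\end{pmatrix}$, whose off-diagonal entry is nonzero near $t_0=0.7$ (so $\lambda_{\max}$ is simple there), and compares a central finite difference of $\lambda_{\max}$ with $v^Tf'(t_0)v$. The helper names and the step size are illustrative choices.

```python
import math


def lam_max(a, b, d):
    """Largest eigenvalue of the real symmetric matrix [[a, b], [b, d]]."""
    return (a + d) / 2 + math.hypot((a - d) / 2, b)


def top_eigvec(a, b, d):
    """A unit eigenvector of [[a, b], [b, d]] for its largest eigenvalue."""
    lam = lam_max(a, b, d)
    v = (b, lam - a)
    if math.hypot(*v) < 1e-14:
        v = (lam - d, b)
    if math.hypot(*v) < 1e-14:  # scalar matrix: any unit vector works
        v = (1.0, 0.0)
    n = math.hypot(*v)
    return (v[0] / n, v[1] / n)


def quad(M, v):
    """v^T M v for a symmetric 2x2 matrix M stored as (p, q, r)."""
    p, q, r = M
    return p * v[0] ** 2 + 2 * q * v[0] * v[1] + r * v[1] ** 2


def f(t):
    """Hypothetical family [[cos t, sin t], [sin t, t^2]] as (a, b, d)."""
    return (math.cos(t), math.sin(t), t * t)


def df(t):
    """Entrywise derivative of f."""
    return (-math.sin(t), math.cos(t), 2 * t)


t0, h = 0.7, 1e-5
finite_diff = (lam_max(*f(t0 + h)) - lam_max(*f(t0 - h))) / (2 * h)
formula = quad(df(t0), top_eigvec(*f(t0)))
print(finite_diff, formula)  # the two numbers should agree closely
```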

When $f$ is analytic, we can find an analytic parametrization of its eigenvalues and eigenvectors. Yet the eigenvalues may cross, and consequently the largest eigenvalue, being the maximum of smooth functions, is in general only Lipschitz. Thus, even by strengthening the assumptions on $f$, we cannot do better than Lipschitz for $\lambda_{\max}$.

On the other hand, the above formula can hold at $t_0$ only if $\lambda'(t_0)$ exists and $v(t_0)$ is unique (up to a phase factor).

EDIT 2. Below, $A(t)$ denotes $f(t)$. In general, when $\lambda(t_0)$ is a multiple eigenvalue, $\lambda$ is not differentiable at $t_0$ (cf. the example below); of course, it may be otherwise: for example, choose $A(t)=I_n$. In the sequel, we consider only the $t_0$ such that $\lambda(t_0)$ is a simple eigenvalue of $A(t_0)$. Moreover, $v'(t_0)$ must exist.

$\textbf{Lemma}$. The set $M=\{t\in[a,b] : \lambda(t) \text{ is a multiple eigenvalue of } A(t)\}$ is a closed subset of $[a,b]$.

$\textbf{Proof}$. $t\in M$ iff $\chi_{A(t)}(\lambda(t))=\dfrac{\partial{\chi_{A(t)}}}{\partial {x}}(\lambda(t))=0$, where $\chi_{A(t)}(x)$ is the characteristic polynomial of $A(t)$. The conclusion follows from the fact that the functions $t\mapsto \chi_{A(t)}(\lambda(t))$ and $t\mapsto \dfrac{\partial{\chi_{A(t)}}}{\partial {x}}(\lambda(t))$ are continuous, so $M$ is the intersection of their zero sets. $\square$

When $\lambda(t_0)$ is a simple eigenvalue, $\lambda(t)$ is a simple eigenvalue of $A(t)$ for all $t$ in a neighborhood $U$ of $t_0$ (because $M$ is closed); then $\lambda$ and a suitable choice of $v$ are $C^1$ functions on $U$, and the above formula holds on $U$.

Moreover, for $\lambda$ to be a.e. differentiable (with the above formula), it suffices to assume that $\lambda(t)$ is a simple eigenvalue of $A(t)$ for a.e. $t$, that is, that $M$ has measure $0$.

Example. Let $f(t)=\operatorname{diag}(2t,t^3+1)$. Then $\lambda(t)=t^3+1$ when $t\geq 1$, and $\lambda(t)=2t$ when $t\in (1-\epsilon,1)$ for small $\epsilon>0$. Hence $\lambda'_+(1)=3$, $\lambda'_-(1)=2$, and the derivative does not exist at $t=1$.
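The one-sided derivatives in this example can be reproduced with difference quotients. The short Python sketch below (the step size is an illustrative choice) also evaluates $x^T f'(1)x$ with $f'(1)=\operatorname{diag}(2,3)$ over unit vectors $x$, all of which are top eigenvectors at the double eigenvalue $t=1$. The values range over $[2,3]$, so no single choice of $v(1)$ is singled out, in line with the remark that the formula needs $v(t_0)$ to be unique.

```python
import math


def lam(t):
    """Largest eigenvalue of f(t) = diag(2t, t^3 + 1)."""
    return max(2 * t, t ** 3 + 1)


h = 1e-6
right = (lam(1 + h) - lam(1)) / h  # follows t^3 + 1, slope 3 at t = 1
left = (lam(1) - lam(1 - h)) / h   # follows 2t, slope 2 at t = 1
print(right, left)  # approximately 3 and 2

# At t = 1 the eigenvalue is double, so every unit vector x = (cos s, sin s)
# is a top eigenvector; x^T f'(1) x = 2 cos^2 s + 3 sin^2 s.
angles = [k * math.pi / 100 for k in range(101)]
values = [2 * math.cos(s) ** 2 + 3 * math.sin(s) ** 2 for s in angles]
print(min(values), max(values))  # the sweep spans [2, 3]
```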

Second answer:

I found a proof in Matrix Riccati Equations in Control and Systems Theory, on page $164$. Unfortunately, Google does not allow a preview of that page. The following is taken almost verbatim from that proof.

For each $t$, let $\lambda(t)$ be the maximal eigenvalue of $f(t)$ and let $x(t)$ be a unit eigenvector corresponding to it. Let us introduce the function \begin{align*} \Lambda(s, t) = x^*(t) f(s) x(t). \end{align*} For each fixed $t$, the function $s \mapsto x^*(t) f(s) x(t)$ is $C^1$, and $\lambda(t) = \Lambda(t, t)$. Since the maximal eigenvalue is given by \begin{align*} \lambda(s) = \max_{\|x\|=1} x^* f(s) x, \end{align*} we infer that $\Lambda(s, t) \le \lambda(s) = \Lambda(s,s)$ for all $s,t$. Hence, for all $t', t'' \in [a,b]$ with $t' < t''$: \begin{align*} \Lambda(t'', t'') - \Lambda(t', t'') &= \lambda(t'') - \Lambda(t', t'') \\ &\ge \lambda(t'') - \lambda(t') \\ &\ge \Lambda(t'', t') - \Lambda(t', t'), \end{align*} which, by the fundamental theorem of calculus applied to $s \mapsto \Lambda(s, t'')$ and $s \mapsto \Lambda(s, t')$, yields \begin{align} \tag{$\star$} \label{eq:1} \int_{t'}^{t''} \frac{\partial \Lambda}{\partial s} (\sigma, t'') d \sigma \ge \lambda(t'') - \lambda(t') \ge \int_{t'}^{t''} \frac{\partial \Lambda}{\partial s} (\sigma, t') d \sigma. \end{align} Since $\frac{\partial \Lambda}{\partial s} (\sigma, t) = x^*(t) \frac{df}{dt}(\sigma) x(t)$ is bounded in absolute value by $\max_{[a,b]} \left\|\frac{df}{dt}\right\|$, uniformly in $t$, we conclude that $\lambda(t)$ is Lipschitz. Consequently, it is differentiable almost everywhere.
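The inequality chain preceding $(\star)$ (whose outer differences equal the integrals in $(\star)$ by the fundamental theorem of calculus) can be spot-checked numerically. The Python sketch below reuses the example $f(t)=\operatorname{diag}(2t,t^3+1)$ from the other answer, whose largest eigenvalue has kinks where the diagonal entries cross, takes $x(t)$ to be a top unit eigenvector computed in closed form, and tests $\Lambda(t'',t'')-\Lambda(t',t'')\ge\lambda(t'')-\lambda(t')\ge\Lambda(t'',t')-\Lambda(t',t')$ on random pairs $t'<t''$ in $[0,2]$. The interval, the tolerance and the helper names are assumptions made for the illustration.

```python
import math
import random


def lam_max(a, b, d):
    """Largest eigenvalue of the real symmetric matrix [[a, b], [b, d]]."""
    return (a + d) / 2 + math.hypot((a - d) / 2, b)


def top_eigvec(a, b, d):
    """A unit eigenvector of [[a, b], [b, d]] for its largest eigenvalue."""
    lam = lam_max(a, b, d)
    v = (b, lam - a)
    if math.hypot(*v) < 1e-14:
        v = (lam - d, b)
    if math.hypot(*v) < 1e-14:  # scalar matrix: any unit vector works
        v = (1.0, 0.0)
    n = math.hypot(*v)
    return (v[0] / n, v[1] / n)


def f(s):
    """The example family f(s) = diag(2s, s^3 + 1), stored as (a, b, d)."""
    return (2 * s, 0.0, s ** 3 + 1)


def Lam(s, t):
    """Lambda(s, t) = x(t)^T f(s) x(t), with x(t) a top unit eigenvector of f(t)."""
    x = top_eigvec(*f(t))
    a, b, d = f(s)
    return a * x[0] ** 2 + 2 * b * x[0] * x[1] + d * x[1] ** 2


def sandwich_holds(trials=5_000, seed=1, lo=0.0, hi=2.0, tol=1e-12):
    """Check the two inequalities before (star) on random pairs t1 < t2."""
    rng = random.Random(seed)
    for _ in range(trials):
        t1, t2 = sorted(rng.uniform(lo, hi) for _ in range(2))
        upper = Lam(t2, t2) - Lam(t1, t2)
        middle = lam_max(*f(t2)) - lam_max(*f(t1))
        lower = Lam(t2, t1) - Lam(t1, t1)
        if not (upper + tol >= middle >= lower - tol):
            return False
    return True


print(sandwich_holds())  # True if both inequalities hold on every sample
```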

At any point $t_0$ of differentiability of $\lambda(t)$, choose $h>0$ and estimate the difference \begin{align*} \lambda(t_0+h)-\lambda(t_0) - x^*(t_0) \frac{df}{dt} (t_0) x(t_0) h = \lambda(t_0+h) -\lambda(t_0) - \frac{\partial \Lambda}{\partial s} (t_0, t_0)h. \end{align*} Utilizing the lower estimate in \eqref{eq:1} with $t'=t_0$ and $t'' =t_0+h$, we obtain \begin{align*} \lambda(t_0+h) -\lambda(t_0) - \frac{\partial \Lambda}{\partial s} (t_0, t_0)h \ge \int_{t_0}^{t_0+h} \left( \frac{\partial \Lambda}{\partial s} (\sigma, t_0) - \frac{\partial \Lambda}{\partial s} (t_0, t_0) \right) d \sigma. \end{align*} Since $\sigma \mapsto \frac{\partial \Lambda}{\partial s}(\sigma, t_0)$ is continuous, the latter integral is $o(h)$ as $h \to 0^+$; dividing by $h$ and letting $h \to 0^+$, we conclude \begin{align*} \frac{d\lambda}{dt}(t_0) \ge \frac{\partial \Lambda}{\partial s}(t_0, t_0). \end{align*} Similarly, taking $t' = t_0 -h$ and $t''=t_0$ and utilizing the upper estimate in \eqref{eq:1}, we get \begin{align*} \frac{d\lambda}{dt}(t_0) \le \frac{\partial \Lambda}{\partial s}(t_0, t_0). \end{align*} Hence $\frac{d\lambda}{dt}(t_0) = \frac{\partial \Lambda}{\partial s}(t_0, t_0) = x^*(t_0) \frac{df}{dt}(t_0) x(t_0)$, which is the claimed formula.


There is one minor point I hope someone could comment on: in arriving at the formula for the derivative of $\lambda(t)$, the proof assumes that $t_0$ is a point of differentiability, but it seems to me that the estimates hold globally. Of course, we know $\lambda(t)$ need not be differentiable everywhere on $[a,b]$. My question is: why is the derivation not applicable at the points where $\lambda$ is not differentiable?