Product rule for matrices, think of them as vectors or not?


I am having difficulty interpreting an equality in a set of lecture notes (http://www.matematik.lu.se/matematiklu/personal/sigma/Riemann.pdf).

It can be found on page $52$ and reads as follows:

$\left.\frac{d}{dt}\Big(df_{e^{tY_{I}}}\big(e^{tY_{I}}\cdot X_{I}\big)\Big)\right|_{t=0}=d^{2}f_{I}(Y_{I},X_{I})+df_{I}(Y_{I}\cdot X_{I})$.

As for notation

$f:GL_{n}(\mathbb{R}) \rightarrow \mathbb{R}$ is a smooth function defined locally around $I$

$Y_{I}$ and $X_{I}$ are tangent vectors at $I$

$e^{tY_{I}}$ represents a curve through $I$ with derivative $Y_{I}$ at $I$.

Clearly there is some sort of product rule at work, but I cannot make sense of it. I don't know whether it is preferable to think of the objects as matrices or as vectors to make things add up nicely, and I also don't know how to think of the Hessian here as a function of two tangent vectors.

I have seen some explanations that involve a connection, but this has not been introduced yet.

Best Answer

This question has nothing to do with Riemannian geometry; all that is needed is a systematic treatment of multivariable calculus. You are right that there is some sort of generalized product rule at play (which is a consequence of the multidimensional chain rule). For geometric insight it may help to think of the objects as tangent vectors, but for the differential calculus it doesn't matter; what is more important is a firm understanding of linear algebra, so that the following constructions are easy to digest. I highly recommend reading Loomis and Sternberg's Advanced Calculus, chapter $3$ in particular, for more details about everything I say (that's where most of my understanding came from).

First, we need to establish some preliminary notions. Let $V,W$ be real Banach spaces, and let $A$ be an open subset of $V$. For what follows, finite-dimensionality doesn't simplify any of the claims; but if you want to assume it, go ahead. Let $f:A \to W$ be a function. I assume you know what it means for $f$ to be (Fréchet) differentiable at a point $\alpha \in A$ (otherwise, see Section 3.6). Suppose that $f$ is differentiable at every point of $A$. Then we get a new function \begin{equation} df: A \to L(V,W), \qquad \alpha \mapsto df_{\alpha}. \end{equation} This new function is nothing too complicated; it is just a map from an open subset of a Banach space into another Banach space, so we can ask whether it is differentiable at a point $\alpha \in A$. If it is, we denote its derivative at $\alpha$ by $d(df)_{\alpha} \equiv d^2f_{\alpha}$ ($\equiv$ just means they are different notations for the same thing). Note which spaces the second differential maps between: $d^2f_{\alpha}: V \to L(V,W)$, which means it is an element of $L\left(V, L(V,W) \right)$.
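If it helps, here is a minimal numerical sketch (in Python, with a toy function $f:\mathbb{R}^2 \to \mathbb{R}$ of my own choosing, not from the notes) in which $df_\alpha$ is a linear functional and $d(df)_\alpha$ eats one vector and returns another linear functional, i.e. behaves like an element of $L(V, L(V,W))$:

```python
import numpy as np

# Toy example: f maps R^2 -> R (an arbitrary choice for illustration).
def f(x):
    return np.sin(x[0]) * x[1]

def df(a, eps=1e-6):
    """Finite-difference approximation of df_a, returned as a linear map v |-> df_a(v)."""
    n = a.size
    return lambda v: sum(
        (f(a + eps * e) - f(a - eps * e)) / (2 * eps) * v[i]
        for i, e in enumerate(np.eye(n))
    )

def d2f(a, eps=1e-4):
    """d(df)_a(xi) is itself a linear map, which can then 'eat' a second vector eta."""
    return lambda xi: lambda eta: (df(a + eps * xi)(eta) - df(a - eps * xi)(eta)) / (2 * eps)

a = np.array([0.3, 2.0])
xi, eta = np.array([1.0, 0.0]), np.array([0.0, 1.0])
# d^2 f_a [xi][eta] should approximate the mixed partial of f, which is cos(x0) here:
print(d2f(a)(xi)(eta), np.cos(0.3))
```

Note how `d2f(a)` must be fed its two vectors one at a time, exactly as an element of $L(V, L(V,W))$ would be.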

You asked how one can interpret "the hessian as a function of two tangent vectors". To do this, note that there is a natural isomorphism between $L(V, L(V,W))$, which is the space of linear maps from $V$ into $L(V,W)$, and $L^2(V;W)$, the space of bilinear maps from $V \times V$ into $W$. The isomorphism is given as follows: $\Phi: L(V,L(V,W)) \to L^2(V;W)$, \begin{equation} \Phi(T)[\xi, \eta] = \left(T(\xi) \right)(\eta) \end{equation} So the meaning of this is that $d^2f_{\alpha}$ is currently a linear function of one variable, and its output is a linear transformation, and hence it can "eat" another vector. This is "equivalent" to a new object, $\Phi(d^2f_{\alpha})$, which "eats" two vectors simultaneously and is bilinear. Since the isomorphism $\Phi$ is so natural, we shall suppress it and abuse notation slightly, and switch back and forth between $d^2f_{\alpha}$, and $\Phi(d^2f_{\alpha})$, and denote them both as $d^2f_{\alpha}$ $^1$.
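In finite dimensions this currying isomorphism is easy to make concrete. The sketch below (with a made-up matrix $H$ playing the role of the Hessian, and $W = \mathbb{R}$) shows the two equivalent viewpoints side by side:

```python
import numpy as np

# A sketch of the isomorphism Phi: L(V, L(V, R)) -> L^2(V; R) in finite dimensions.
H = np.array([[2.0, 1.0], [1.0, 3.0]])   # a sample "Hessian" matrix (assumption)

def T(xi):
    """T in L(V, L(V, R)): eats one vector, returns a linear functional."""
    return lambda eta: (H @ xi) @ eta

def Phi_T(xi, eta):
    """Phi(T) in L^2(V; R): the equivalent bilinear map, eating both vectors at once."""
    return T(xi)(eta)

xi, eta = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(T(xi)(eta), Phi_T(xi, eta))   # the two viewpoints agree
```

This is exactly the suppression of $\Phi$ described above: the same data, accessed either one vector at a time or as a bilinear map.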

Now that we have established what the second differential means and how to interpret it, we can proceed to the "generalised product rule" (which is really a special case of the chain rule); see Chapter $3$, Theorem $8.4$ for the proof. I'll state the theorem here (with slightly different notation).

Generalised Product Rule. (Loomis and Sternberg Theorem $8.4$)

Let $U,V,W, X$ be normed vector spaces. Let $g: U \to V$, and $h: U \to W $ be functions which are differentiable at a point $\beta \in U$. Let $\omega: V \times W \to X$ be a bounded bilinear map. With these assumptions, the composite function $F : U \to X$ defined by \begin{equation} F(\xi) = \omega(g(\xi), h(\xi)) \end{equation} is differentiable at $\beta$ and its derivative at $\beta$ (which is a linear map from $U$ into $X$) is given by the formula \begin{align} dF_{\beta}(\cdot) = \omega(dg_{\beta}(\cdot), h(\beta)) + \omega(g(\beta), dh_{\beta}(\cdot)), \tag{*} \end{align} i.e for all $x \in U$, we have \begin{equation} dF_{\beta}(x) = \omega(dg_{\beta}(x), h(\beta)) + \omega(g(\beta), dh_{\beta}(x)). \end{equation}

In this theorem, we like to think of the bilinear map $\omega$ as a sort of "multiplication", so that $F(\xi)$ is the "product" of $g(\xi)$ and $h(\xi)$. Notice how the derivative is very similar to single variable: "differentiate the first, keep the second, $+$ keep the first, differentiate the second".
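To see the theorem in action, here is a quick numerical sanity check (a sketch with arbitrary choices of $\omega$, $g$, $h$, not taken from the lecture notes):

```python
import numpy as np

# Generalised product rule, checked numerically for an arbitrary example:
omega = lambda v, w: v @ w                       # bounded bilinear map: the dot product
g = lambda t: np.array([np.cos(t), np.sin(t)])   # g: R -> R^2
h = lambda t: np.array([t, t ** 2])              # h: R -> R^2
F = lambda t: omega(g(t), h(t))                  # F(t) = omega(g(t), h(t))

def deriv(func, t, eps=1e-6):
    """Central-difference derivative (works componentwise for vector-valued func)."""
    return (func(t + eps) - func(t - eps)) / (2 * eps)

beta = 0.7
lhs = deriv(F, beta)                                               # direct F'(beta)
rhs = omega(deriv(g, beta), h(beta)) + omega(g(beta), deriv(h, beta))  # formula (*)
print(lhs, rhs)   # the two sides agree up to finite-difference error
```

The pattern in the `rhs` line is precisely "differentiate the first, keep the second, $+$ keep the first, differentiate the second".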


Finally, we can get to the actual derivative you're interested in. In conformity with the notation of the theorem, we define the following maps:

  • $\omega: L\left( M_{n \times n}(\mathbb{R}), \mathbb{R} \right) \times M_{n \times n}(\mathbb{R}) \to \mathbb{R}$, defined by $\omega(T, \xi) = T(\xi)$. So $\omega$ is the "evaluation" map. You can easily check that it is bilinear (boundedness follows from the fact that the spaces are finite-dimensional).
  • $g: \mathbb{R} \to M_{n \times n}(\mathbb{R})$, defined by $g(t) = e^{t Y_I}$.
  • $h: \mathbb{R} \to M_{n \times n}(\mathbb{R})$, defined by $h(t) = e^{t Y_I} \cdot X_I$.

With these "auxiliary" maps defined, you can define the function you're really interested in: \begin{align} F(t) &= df_{g(t)}(h(t)) \\ &= \omega(df_{g(t)}, h(t)) \end{align} (I hope you realise that $df_{g(t)}$ is convenient notation for $\left[(df) \circ g \right](t)$, and that the subscript just avoids more brackets.) You now wish to compute $F'(0)$. Here is the general formula for $F'(t)$: \begin{align} F'(t) &= dF_t(1) \tag{Theorem $7.1$} \\ &= \omega \left( d\left( (df) \circ g\right)_t(1), h(t) \right) + \omega \left( df_{g(t)}, dh_t(1)\right) \tag{product rule} \\ &= \omega \left( [d^2f_{g(t)} \circ dg_t](1), h(t) \right) + \omega \left( df_{g(t)}, dh_t(1)\right) \tag{chain rule to $1^{st}$ term} \\ &= \omega \left( d^2f_{g(t)}[g'(t)], h(t) \right) + \omega \left( df_{g(t)}, h'(t)\right) \tag{Theorem $7.1$} \\ &= \left( d^2f_{g(t)} \left[g'(t) \right] \right)[h(t)] + df_{g(t)}[h'(t)] \tag{defn of $\omega$} \\ & \equiv d^2f_{g(t)} \left[ g'(t), h(t) \right] + df_{g(t)}[h'(t)]. \end{align} In the last line, I made the abuse of notation of suppressing the isomorphism $\Phi$. Substituting $t=0$ and using your knowledge of the derivative of the matrix exponential will give you the desired answer.
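The final identity can itself be sanity-checked numerically. The sketch below uses the concrete choice $f(A) = \operatorname{tr}(A^2)$ (my assumption for illustration, for which a hand computation gives $df_A(H) = 2\operatorname{tr}(AH)$ and $d^2f_A[K, H] = 2\operatorname{tr}(KH)$) and arbitrary sample matrices $Y_I, X_I$, and verifies $F'(0) = d^2f_I[Y_I, X_I] + df_I[Y_I X_I]$:

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential via truncated power series (fine for small matrices)."""
    out, P = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        P = P @ M / k
        out = out + P
    return out

f = lambda A: np.trace(A @ A)              # concrete smooth f (assumption)
df = lambda A, H: 2 * np.trace(A @ H)      # df_A(H), computed by hand
d2f = lambda K, H: 2 * np.trace(K @ H)     # d^2 f_A[K, H]: independent of A here

Y = np.array([[0.0, 1.0], [-1.0, 0.5]])    # sample tangent vectors (assumptions)
X = np.array([[1.0, 2.0], [0.0, 1.0]])

F = lambda t: df(expm(t * Y), expm(t * Y) @ X)   # F(t) = df_{e^{tY}}(e^{tY} X)
eps = 1e-5
lhs = (F(eps) - F(-eps)) / (2 * eps)             # F'(0) by central differences
rhs = d2f(Y, X) + df(np.eye(2), Y @ X)           # d^2 f_I[Y, X] + df_I[Y X]
print(lhs, rhs)   # the two sides agree up to finite-difference error
```

Here $g'(0) = Y_I$ and $h'(0) = Y_I \cdot X_I$ have been substituted, which is exactly the "knowledge of the derivative of the matrix exponential" mentioned above.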

With a bit of practice (such as the questions in Chapter 3 of the book), you'll become so comfortable with these manipulations that it becomes unnecessary to explicitly write out the "auxiliary" maps $\omega, g, h$, and you'll be able to apply the product rule directly to compute $F'(t)$.


[1.] For more about the second differential, see Section $3.16$ of the book.