How is the derivative of a function $V : \mathbb{R}^{n\times n} \to \mathbb{R}$ defined?


Specifically, I'm considering the function $$ V(\mathbf{X}) = -\ln \det \mathbf{X}\,. $$ Following an exercise, I've been able to show that $$ V(\mathbf{X}+\varepsilon\mathbf{Y}) = V(\mathbf{X})-\varepsilon\,\text{tr}\,\mathbf{X}^{-1}\mathbf{Y}+O(\varepsilon^2)\,. $$ (tr is the trace.) The text then says that this is equivalent to saying $$ -\nabla V = (\mathbf{X}^{-1})^T\,. $$

Apparently, one would get there by taking the previous equation and transforming it into $$ \frac{V(\mathbf{X}+\varepsilon\mathbf{Y})-V(\mathbf{X})}{\varepsilon} = -\text{tr}\,\mathbf{X}^{-1}\mathbf{Y}+O(\varepsilon)\,. $$ The derivative in the "direction" $\mathbf{Y}$ is then presumably given by letting $\varepsilon \rightarrow 0$, yielding $-\text{tr}\,\mathbf{X}^{-1}\mathbf{Y}$. But I don't see how that recovers the equation $$ -\nabla V = (\mathbf{X}^{-1})^T\,. $$



BEST ANSWER

What confuses matters here is that you ask for the "derivative" (which could refer to any of several related concepts when doing calculus in higher dimensions) but write $\nabla V$, the usual notation for the gradient.

We can start by computing the differential $dV$, whose definition is less controversial: at each point $X$ it is the linear map from tangent vectors on $\mathbb{R}^{n\times n}$ to $\mathbb{R}$ that computes the directional derivative of $V$ in the direction $\delta X$:

$$[dV(X)](\delta X) = \left.\frac{d}{d\epsilon} V(X+\epsilon\,\delta X)\right|_{\epsilon=0} = \lim_{\epsilon\to 0} \frac{V(X+\epsilon\,\delta X)-V(X)}{\epsilon}.$$

You've already computed $[dV(X)](\delta X) = -\operatorname{tr}(X^{-1}\delta X).$ This must be a linear function of $\delta X$, which may not be immediately obvious from the formula, but becomes more apparent if we rewrite the trace as a Frobenius product, $$[dV(X)](\delta X) = -X^{-T} : \delta X.$$
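To see why the trace can be rewritten this way, it may help to expand it in components. This is only a short check, and it assumes the usual definition of the Frobenius product, $A : B = \sum_{i,j} A_{ij}B_{ij}$, which the text above does not spell out:

$$\operatorname{tr}(X^{-1}\delta X) = \sum_{i,j} (X^{-1})_{ij}\,(\delta X)_{ji} = \sum_{i,j} (X^{-T})_{ji}\,(\delta X)_{ji} = X^{-T} : \delta X.$$

Each entry of $\delta X$ appears exactly once, multiplied by a fixed coefficient, which is what makes the linearity in $\delta X$ visible.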

Now, to get from the differential to a gradient, we need an inner product $\langle M, N\rangle$ on the tangent space of $\mathbb{R}^{n\times n}$. Given an inner product, the gradient of $V$ at $X$ is defined to be the vector $\nabla V$ that satisfies $$\langle \nabla V(X), \delta X\rangle = [dV(X)](\delta X)$$ for all $\delta X$.

Notice that, unlike the differential, the gradient requires, and depends on, a choice of inner product on tangent vectors. When doing calculus on $\mathbb{R}^n$ we usually (but not always; sometimes in physics the inner product involves masses, etc.) pick the dot product as the inner product. Along similar lines, if we pick the Frobenius inner product $\langle M, N\rangle = \operatorname{tr}(M^TN)$ as the inner product on $\mathbb{R}^{n\times n}$, we have that $$\nabla V = -X^{-T}.$$
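A quick numerical sanity check of this formula is sketched below with NumPy. It is only an illustration under stated assumptions: the test matrix `X`, the step size `h`, and the helper names `V` and `frobenius_gradient_fd` are all made up for this example, and the gradient is approximated entry by entry with central differences, which corresponds to the Frobenius inner product.

```python
import numpy as np

def V(X):
    """V(X) = -ln det X, defined here only for det X > 0."""
    sign, logabsdet = np.linalg.slogdet(X)
    if sign <= 0:
        raise ValueError("V is only defined for matrices with positive determinant")
    return -logabsdet

def frobenius_gradient_fd(f, X, h=1e-6):
    """Central-difference estimate of the gradient w.r.t. the Frobenius inner product.

    Entry (i, j) is the directional derivative of f along the basis matrix E_ij.
    """
    G = np.zeros_like(X)
    n_rows, n_cols = X.shape
    for i in range(n_rows):
        for j in range(n_cols):
            E = np.zeros_like(X)
            E[i, j] = 1.0
            G[i, j] = (f(X + h * E) - f(X - h * E)) / (2 * h)
    return G

# Hypothetical non-symmetric test matrix with det X = 25 > 0,
# chosen so that X^{-T} and X^{-1} are genuinely different.
X = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])

G_fd = frobenius_gradient_fd(V, X)
G_exact = -np.linalg.inv(X).T          # claimed gradient: -X^{-T}
grad_err = np.max(np.abs(G_fd - G_exact))

# Directional derivative along an arbitrary direction Y, two ways:
rng = np.random.default_rng(0)
Y = rng.standard_normal(X.shape)
h = 1e-6
dir_fd = (V(X + h * Y) - V(X - h * Y)) / (2 * h)
dir_exact = np.sum(G_exact * Y)        # Frobenius product <grad V, Y> = -tr(X^{-1} Y)
dir_err = abs(dir_fd - dir_exact)

print("max gradient error:", grad_err)
print("directional derivative error:", dir_err)
```

Because `X` is deliberately non-symmetric, comparing against `-np.linalg.inv(X)` without the transpose would not agree, so the check also confirms where the transpose belongs.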

ANSWER

Note that, taken literally, your last equality cannot be right: the derivative of $V$ at a point is a linear transformation $\mathbb R^{n\times n}\to\mathbb R$, not a matrix.

What happens is that one can show that any linear functional $\mathbb R^{n\times n}\to\mathbb R$ is of the form $$A\longmapsto\operatorname{Tr}(B^TA)$$ for some $B\in \mathbb R^{n\times n}$. It is natural and common to identify the above map with $B$, and thus we have an isomorphism between $\mathbb R^{n\times n}$ and its dual.
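The identification can be made concrete with a small NumPy sketch. It rests on the observation that $\operatorname{Tr}(B^TE_{ij}) = B_{ij}$ for the basis matrix $E_{ij}$ (a single 1 in position $(i,j)$), so $B$ can be read off by evaluating the functional on each $E_{ij}$. The helper name `representer`, the example matrix `X`, and the functional `phi` are hypothetical choices made for this illustration.

```python
import numpy as np

def representer(phi, n):
    """Return the matrix B with phi(A) = Tr(B^T A), assuming phi is linear.

    B[i, j] is obtained by evaluating phi on the basis matrix E_ij.
    """
    B = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            B[i, j] = phi(E)
    return B

# Example functional: the derivative of V(X) = -ln det X at a fixed X,
# i.e. phi(A) = -tr(X^{-1} A). X is an arbitrary invertible test matrix.
X = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])
X_inv = np.linalg.inv(X)

def phi(A):
    return -np.trace(X_inv @ A)

B = representer(phi, 3)

# B should coincide with -X^{-T}, the matrix identified with the derivative.
rep_err = np.max(np.abs(B - (-X_inv.T)))

# And phi(A) should equal Tr(B^T A) for any A, not just basis matrices.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
pair_err = abs(phi(A) - np.trace(B.T @ A))

print("representer error:", rep_err)
print("pairing error:", pair_err)
```

Here the matrix recovered from the functional is $-X^{-T}$, which is exactly the matrix the question's text calls $\nabla V$.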

It is with this point of view that your last equality makes sense.