If the total differential is a scalar, how can it also be a covector used to calculate the directional derivative at a point?


In multivariable calculus, the gradient of a function $f(x,y)$ at a given point is the covariant vector:

$$\vec\nabla f=\begin{bmatrix}\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\end{bmatrix}$$

while the corresponding contravariant vector of differentials is

$$\mathrm d\vec x=[\mathrm dx, \mathrm dy]$$

Their dot product results in the total differential of the function at that given point:

$$\mathrm df=\frac{\partial f}{\partial x}\mathrm dx+\frac{\partial f}{\partial y}\mathrm dy.$$

So far, so good. Except that this last calculation is supposed to yield a scalar value according to Reflections on Relativity by Kevin Brown (*):

"dy [read df for notational consistency] equals the scalar (dot) product of these two vectors... The scalar quantity dy is called the total differential of y."

At the same time, this last expression is also referred to as a 1-form, a (0,1)-tensor, or a covector (actually an element of the cotangent space), thus enabling the calculation of the directional derivative at the point under consideration, in the direction of a vector $\mathbf{\vec v}$ in the tangent space, as the dot product:

$$\langle \mathrm df , \mathbf{\vec v}\rangle\in\mathbb R.$$
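The two readings can be made concrete in a short numerical sketch (the function $f(x,y)=x^2y$, the point, and the vectors are hypothetical choices for illustration, not taken from the question): the same row of partial derivatives can be paired with a small displacement to estimate the change in $f$, or applied as a linear map to any tangent vector; the *output* of the pairing is a scalar in both cases.

```python
# Hypothetical function for illustration: f(x, y) = x**2 * y.
def f(x, y):
    return x**2 * y

def df(p):
    # The covector df|_p: a linear map on tangent vectors,
    # represented by the row of partial derivatives at p.
    x, y = p
    row = (2 * x * y, x**2)            # (df/dx, df/dy)
    return lambda v: row[0] * v[0] + row[1] * v[1]

p = (1.0, 2.0)

# Reading 1: pair with a small displacement -> the total differential,
# a scalar approximating the actual change in f.
d = (0.01, 0.02)
total_diff = df(p)(d)                  # 4*0.01 + 1*0.02 = 0.06
actual = f(p[0] + d[0], p[1] + d[1]) - f(*p)

# Reading 2: apply to an arbitrary tangent vector -> a scalar
# directional derivative.
v = (3.0, -1.0)
deriv = df(p)(v)                       # 4*3 + 1*(-1) = 11.0
```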

The question:

Since $\mathrm df$ cannot be a scalar and a covector at the same time, which one is it, and what's wrong in the above definitions that explains this apparent contradiction?

(*) Please note that this seems to be a sleight of hand (or a possible inaccuracy in the quoted source), in light of the accepted answer and its associated comment: "Mathematically, total differentials are covectors (or covector fields). You may apply them to vectors (or vector fields) in order to obtain scalars (or scalar fields). But they themselves should not be thought of as scalars." In the accepted answer the total differential is referred to as the "differential operator at $p$ of $f$."

Best answer

Short answer:

Technically, $\mathrm{d}f$ is a covector, not a scalar.


Long, detailed answer:

First, some linear algebra. Take any real $n$-dimensional vector space $V$ and pick an ordered basis $(e_i)$. Then you may pair every vector $v \in V$ with an $n$-tuple of real numbers so that $v^1 e_1 + \cdots + v^n e_n \leftrightarrow (v^1, \dots, v^n)$. For vectors, the convention is adopted that these $n$-tuples be written in a vertical fashion – i.e. a second isomorphism is established between $\mathbb R^n$ and the space of column vectors ($n\times 1$ matrices).

Now define the dual of $V$ as the vector space $V^*$ of linear functionals on $V$, i.e., linear maps $\varphi : V \to \mathbb R$. Our choice of basis $(e_i)$ on $V$ induces a natural choice of basis on $V^*$, the dual basis $(\varepsilon^j)$ such that $\varepsilon^j(e_i) = \delta^j_i$ (the Kronecker delta: $1$ when $i=j$, $0$ otherwise). This way, we may also pair every linear functional $\varphi \in V^*$ with an $n$-tuple of real numbers $(\varphi_1,\dots, \varphi_n)$, and the convention now is that they are written in a horizontal fashion, i.e. as $1 \times n$ matrices.

Why the row-column vector convention? Because now the action of $\varphi \in V^*$ on $v \in V$ can be written $$\langle \varphi, v \rangle := \varphi(v) = \begin{pmatrix} \varphi_1 & \cdots & \varphi_n\end{pmatrix} \begin{pmatrix} v^1 \\ \vdots \\ v^n \end{pmatrix} = \varphi_1 v^1 + \cdots + \varphi_n v^n $$ so that applying a linear functional to a vector is the same as matrix-multiplying the row vector on the left by the column vector on the right, obtaining a scalar, as you would expect. (The map $\langle\cdot,\cdot\rangle : V^* \times V \to \mathbb R$ is called the duality product and it is very close to the notion of dot product.) It is worth noting that linear functionals are sometimes called $1$-covariant tensors, $(0,1)$-tensors over $V$, or covectors, and vectors in $V$ are called $1$-contravariant tensors or $(1,0)$-tensors over $V$.
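The row-times-column rule and the dual basis property can be sketched in a few lines of code (the component values here are arbitrary, for illustration only):

```python
# Components of a covector (row) and a vector (column) in R^3;
# the numbers are arbitrary, chosen only for illustration.
phi = [1.0, -2.0, 3.0]
v = [4.0, 0.5, 1.0]

def pair(phi, v):
    # Duality product <phi, v>: row times column, giving a scalar.
    return sum(p * c for p, c in zip(phi, v))

scalar = pair(phi, v)   # 1*4 + (-2)*0.5 + 3*1 = 6.0

def e(i, n=3):
    # i-th standard basis vector (its component list also represents
    # the i-th dual basis covector eps^i).
    return [1.0 if k == i else 0.0 for k in range(n)]

# Dual basis property: eps^j(e_i) = Kronecker delta.
deltas = [[pair(e(j), e(i)) for i in range(3)] for j in range(3)]
```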

Back to geometry/analysis. Call $\Omega$ an open subset of $n$-dimensional Euclidean space. Remember that the "partial derivative in the $k$-th direction evaluated at a point $p=(p^1, \dots, p^n) \in \Omega$" operation, $\frac{\partial}{\partial x^k}\Big|_{p=(p^1,\dots,p^n)}$, is linear, that is, if $a,b \in \mathbb R$ and $f,g : \Omega \to \mathbb R$ are smooth functions, $$\begin{split} \frac{\partial}{\partial x^k}\Bigg|_{p} (af + bg) &= \frac{\partial (af + bg)}{\partial x^k}(p) \\ &=a \frac{\partial f}{\partial x^k}(p) + b \frac{\partial g}{\partial x^k}(p) \\ &= a \frac{\partial}{\partial x^k}\Bigg|_{p} f + b \frac{\partial}{\partial x^k}\Bigg|_{p} g \end{split}$$ so it makes sense to consider the vector space of derivations at $p$, that is, the vector space $T_p\Omega$ (also called "tangent space at $p$") of all linear combinations of partial differential operators $$\sum_{k=1}^n a^k \frac{\partial}{\partial x^k}\Bigg|_{p} = a^1 \frac{\partial}{\partial x^1}\Bigg|_{p} + \cdots + a^n\frac{\partial}{\partial x^n}\Bigg|_{p} $$ It is clear that $\left(\frac{\partial}{\partial x^k}\Big|_{p}\right)$ constitutes an ordered basis of $T_p\Omega$, which is then $n$-dimensional. Of course, you may apply the column-vector convention and identify the above linear combination with the vertical list $\begin{pmatrix} a^1 \\ \vdots \\ a^n \end{pmatrix}$.

What is then its dual space? It's the "cotangent space at $p$", the space $T^*_p\Omega$ of all covectors on $T_p\Omega$. The natural choice of basis on $T^*_p\Omega$ is the dual basis of covectors that yield the Kronecker delta when applied to the pure partial differential operators at $p$. These are the maps $$\mathrm{d}x^m|_p : T_p\Omega \to \mathbb R \qquad \text{s.t.} \qquad \mathrm{d} x^m|_p \left(\frac{\partial}{\partial x^k}\Bigg|_{p}\right) = \delta^m_k $$ so that any covector $\alpha|_p \in T_p^*\Omega$ can be written as $$\alpha|_p = \alpha_1 \mathrm{d}x^1|_p + \cdots + \alpha_n \mathrm{d}x^n|_p. $$

Now, given a smooth function $f : \Omega \to \mathbb R$, the differential operator at $p$ of $f$ is the covector in $T_p^*\Omega$ such that the coefficients $\alpha_m$ are exactly $ \alpha_m = \frac{\partial f}{\partial x^m}(p)$, so that $$\mathrm{d}f|_p = \frac{\partial f}{\partial x^1}(p)\mathop{} \mathrm{d}x^1|_p + \cdots + \frac{\partial f}{\partial x^n}(p) \mathop{}\mathrm{d}x^n|_p$$ This also makes sense out of the notation $\mathrm{d}x^m|_p$: the basis elements are exactly the differentials of the coordinate functions $ x^m : p \mapsto p^m$. Of course, you may now apply the row-vector convention to get the row-gradient of $f$ $$\begin{pmatrix} \dfrac{\partial f}{\partial x^1}(p) & \cdots & \dfrac{\partial f}{\partial x^n}(p) \end{pmatrix} $$ The gradient of $f$ is the transpose of this vector, i.e. the column vector $\nabla f(p)$ you get by turning this row vector $90$ degrees clockwise. Thus, applying the row-gradient to a vector is the same thing as dotting the gradient with that same vector.

The directional derivative. Now you can guess how this plays out. Let $p \in \Omega$, $v|_p \in T_p\Omega$ and let $f$ be a smooth function over $\Omega$. Then the directional derivative of $f$ at $p$ in the direction $v|_p$ is just the scalar $$\begin{split} \langle \mathrm d f|_p, v|_p \rangle = \mathrm{d}f|_p (v|_p) &= \begin{pmatrix} \dfrac{\partial f}{\partial x^1}(p) & \cdots & \dfrac{\partial f}{\partial x^n}(p) \end{pmatrix} \begin{pmatrix} v^1|_p \\ \vdots \\ v^n|_p \end{pmatrix} = \nabla f(p) \cdot \begin{pmatrix} v^1|_p \\ \vdots \\ v^n|_p \end{pmatrix} \\ &= v^1|_p \dfrac{\partial f}{\partial x^1}(p) + \cdots + v^n|_p \dfrac{\partial f}{\partial x^n}(p) \end{split}$$ and we are done.
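This formula can be checked numerically (the function, point, and direction below are hypothetical choices for illustration, not taken from the answer): the pairing of $\mathrm df|_p$ with $v|_p$ agrees with the first-order rate of change of $f$ along $v|_p$.

```python
# Hypothetical choices: f(x, y) = x*y + y**2 at p = (2, 1),
# in the direction v = (3, -1).
def f(x, y):
    return x * y + y**2

p, v = (2.0, 1.0), (3.0, -1.0)

# <df|_p, v|_p>: row-gradient times column vector.
grad = (p[1], p[0] + 2 * p[1])              # (df/dx, df/dy) = (1, 4)
exact = grad[0] * v[0] + grad[1] * v[1]     # 1*3 + 4*(-1) = -1

# First-order rate of change of f along v, as a sanity check.
h = 1e-6
approx = (f(p[0] + h * v[0], p[1] + h * v[1]) - f(*p)) / h
```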


Addendum. So far we've been working in the tangent space at a single point $p \in \Omega$. But nothing prevents us from defining maps that assign to each point $p \in \Omega$ a vector or a covector. These are called fields (although these should be, in a more sophisticated picture, sections of the tangent or cotangent bundle of $\Omega$). Covector fields are also called $1$-forms; for example, $\mathrm{d}f$ is a $1$-form (to each $p$ we assign $\mathrm{d} f|_p$). One may extend the operation of applying a covector to a vector to fields: if $v$ is the vector field such that to each $p$ is associated a vector $v|_p \in T_p\Omega$, and $\alpha$ is a covector field defined similarly, then $\langle\alpha,v\rangle$ is the scalar field (i.e., real function) that associates to each $p$ the real number $\langle \alpha|_p,v|_p\rangle = \alpha|_p(v|_p)$. If $\alpha = \mathrm d f$, you get an extension of the definition of directional derivative we gave above: just picture a field of vectors distributed over $\Omega$ and a function $f$ with its gradient, and imagine dotting the gradient at a point with the specific vector assigned at that point, and doing so for all points in $\Omega$. In some way, this gives an idea of "how well" the gradient field conforms to the given vector field.
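A minimal sketch of the pairing as a scalar field, with a hypothetical function and vector field chosen for illustration: for $f(x,y)=xy$ and $v=(y,-x)$, the pairing at each point works out to $y^2 - x^2$.

```python
# Hypothetical example: f(x, y) = x*y with the vector field
# v = (y, -x); the pairing <df, v> at (x, y) is y*y + x*(-x).
def f(x, y):
    return x * y

def v(x, y):
    return (y, -x)

def grad_f(x, y):
    return (y, x)          # (df/dx, df/dy), computed by hand

def pairing(x, y):
    # The scalar field <df, v> evaluated pointwise.
    g, w = grad_f(x, y), v(x, y)
    return g[0] * w[0] + g[1] * w[1]

# Evaluate the scalar field at a few sample points.
values = [pairing(x, y) for x, y in [(0, 1), (1, 1), (2, 0)]]
```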

Other answer

I think that, in reality, what lets us call something a vector or a covector is just how it transforms under the Jacobian matrix. What we call a covariant basis (of basis vectors) and a contravariant basis (of basis covectors) are, in the first case, a row "vector" or a $1 \times n$ matrix of "something", and in the second case, a column "vector" or an $n \times 1$ matrix of "something", where in both cases that "something" is not a scalar.

For the case of the partial derivative ${\partial \vec R\over \partial x_i}$ as the basis vector $e_i$ along the curve $\vec R$, it goes like this:

First, the derivative of $\vec R(t)$ is a tangent vector $d\vec R(t)/dt$. By the chain rule:

$$ {d\vec R\over dt} = {\partial\vec R\over\partial x_i}{dx_i\over dt}$$

and a vector $\vec v$ can be expanded as a linear combination of the $e_i$, where the $v^i$ are scalars:

$$ \vec v = v^ie_i$$

Now, since ${d\vec R\over dt}$ is a vector, it can be expanded as:

$$ {d\vec R\over dt} = v^ie_i$$

$$ v^ie_i = {\partial\vec R\over\partial x_i}{dx_i\over dt}$$

Since the ${\partial\vec R\over\partial x_i}$ are vectors, and it can be proven that they are tangent to the "grid" of the coordinate system, they can be re-interpreted as the "basis vectors" of ${d\vec R\over dt}$, so:

$$ {\partial\vec R\over\partial x_i} = e_i$$

$$ {dx_i\over dt} = v^i$$

Now that we know that ${\partial\vec R\over\partial x_i}$ is the basis vector $e_i$, we ask what happens if we change the basis. To express $\vec v$ in a new basis $\hat e_j$, we expand $\hat e_j$ using the chain rule for partial derivatives:

$$ \hat e_j = {\partial \vec R\over\partial \hat x_j} = {\partial \vec R\over\partial x_i} {\partial x_i\over\partial \hat x_j}$$

$$\hat e_j = e_i{\partial x_i\over\partial \hat x_j}$$

Using matrix notation, the change of basis occurs as:

$${\begin{bmatrix}e_1 & \cdots & e_n \end{bmatrix}} {\begin{bmatrix} {\partial x_1\over\partial \hat x_1} & \cdots & {\partial x_1\over\partial \hat x_n} \\ \vdots & \ddots & \vdots \\ {\partial x_n\over\partial \hat x_1} & \cdots & {\partial x_n\over\partial \hat x_n} \\ \end{bmatrix}} = {\begin{bmatrix} \hat e_1 & \cdots & \hat e_n \end{bmatrix}}$$
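This basis change can be checked concretely (polar coordinates are used here as a hypothetical new coordinate system $\hat x = (r, t)$, with $x = r\cos t$, $y = r\sin t$): starting from the Cartesian basis, the columns of the Jacobian are exactly the new basis vectors tangent to the coordinate grid.

```python
import math

# Hypothetical concrete case: polar coordinates (r, t) as the new
# coordinates, with x = r*cos(t), y = r*sin(t).
r, t = 2.0, math.pi / 3

# Jacobian entries dx_i/dxhat_j evaluated at this point.
J = [[math.cos(t), -r * math.sin(t)],   # dx/dr, dx/dt
     [math.sin(t),  r * math.cos(t)]]   # dy/dr, dy/dt

# The Cartesian basis (e_1, e_2) is the identity matrix, so the new
# basis vectors hat_e_j = e_i * (dx_i/dxhat_j) are the columns of J.
e_r = (J[0][0], J[1][0])   # radial basis vector, unit length
e_t = (J[0][1], J[1][1])   # angular basis vector, length r
```

Note that $\hat e_r$ has unit length while $\hat e_t$ has length $r$, and the two are orthogonal, as expected for the polar grid.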

and ${\partial x_i\over\partial \hat x_j}$ are the entries of the Jacobian matrix. Now, for the second case, using the chain rule for differentials:

$$ df = {\partial f\over\partial x_i}{dx_i}$$

we can expand the term ${\partial f\over\partial x_i}$ using the chain rule for partial derivatives:

$$ {\partial f\over\partial x_i} = {\partial f\over\partial \hat x_j}{{\partial \hat x_j\over\partial x_i}}$$

and using it in the chain rule for differentials:

$$ df = {\partial f\over\partial x_i}{dx_i} = {\partial f\over\partial \hat x_j}{\partial \hat x_j\over\partial x_i} dx_i$$

Note that ${\partial f\over\partial \hat x_j}$ is a partial derivative with respect to the coordinate $\hat x_j$, not $x_i$; but since we are talking about vectors and covectors, which are invariant under changes of the coordinate system, $df$ in the $\hat x_j$ coordinates is:

$$ df = {\partial f\over \partial \hat x_j} d\hat x_j = {\partial f\over\partial \hat x_j}{\partial \hat x_j\over\partial x_i} dx_i$$

$${\partial f\over \partial \hat x_j} d\hat x_j = {\partial f\over\partial \hat x_j}{\partial \hat x_j\over\partial x_i} dx_i$$

and comparing the coefficients of ${\partial f\over\partial \hat x_j}$ on both sides gives

$$ d\hat x_j = {\partial \hat x_j\over\partial x_i} dx_i$$

and written in matrix form this is:

$$ {\begin{bmatrix} {\partial \hat x_1\over\partial x_1} & \cdots & {\partial \hat x_1\over\partial x_n} \\ \vdots & \ddots & \vdots \\ {\partial \hat x_n\over\partial x_1} & \cdots & {\partial \hat x_n\over\partial x_n} \\ \end{bmatrix}}{\begin{bmatrix}dx_1 \\ \vdots \\ dx_n \end{bmatrix}} = {\begin{bmatrix} d\hat x_1 \\ \vdots \\ d\hat x_n \end{bmatrix}}$$

which indicates two things:

  1. Since the term ${\partial x_i \over \partial \hat x_j}$ corresponds to the entries of the Jacobian matrix, the term ${\partial \hat x_j \over \partial x_i}$ corresponds to the entries of the inverse Jacobian matrix, because expanding ${\partial x_i \over \partial x_k}$ using the chain rule for partial derivatives gives:

$$ {\partial x_i \over \partial x_k} = {\partial x_i \over \partial \hat x_j}{\partial \hat x_j \over \partial x_k}$$

and these are the entries of the matrix product $J\times J^{-1}$. When $i = k$, the expansion above gives ${\partial x_i \over \partial x_k} = 1$; and when $i \neq k$, ${\partial x_i \over \partial x_k} = 0$, because the coordinates are independent of each other, so:

$$ {\partial x_i \over \partial \hat x_j}{\partial \hat x_j \over \partial x_k} = \delta^i_k$$

where $\delta^i_k $ (Kronecker's delta) are the entries of the identity matrix.

  2. Differentials are contravariant because they transform in the opposite way to the basis vectors, so they can form a contravariant basis, that is to say, a covector basis.
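The first point can be verified numerically for a concrete change of coordinates (polar coordinates, chosen here purely as an example): multiplying the forward Jacobian by the inverse Jacobian yields the identity matrix, i.e. the Kronecker delta.

```python
import math

# Hypothetical concrete case: polar coordinates. Forward Jacobian
# J = d(x,y)/d(r,t); inverse Jacobian computed from the inverse map
# r = sqrt(x**2 + y**2), t = atan2(y, x).
r, t = 1.5, 0.7
x, y = r * math.cos(t), r * math.sin(t)

J = [[math.cos(t), -r * math.sin(t)],   # dx/dr, dx/dt
     [math.sin(t),  r * math.cos(t)]]   # dy/dr, dy/dt

Jinv = [[x / r,     y / r],       # dr/dx, dr/dy
        [-y / r**2, x / r**2]]    # dt/dx, dt/dy

# (J * Jinv)_ik = sum_j (dx_i/dxhat_j)(dxhat_j/dx_k) = delta_ik
prod = [[sum(J[i][j] * Jinv[j][k] for j in range(2))
         for k in range(2)] for i in range(2)]
```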