Transpose of a linear map with orthonormal basis in tensor and matrix notation


This is a follow-up to this question, specifically about how the statement

... in index notation a matrix is written as $A^\mu_{\;\;\nu}$ and its transpose as $A_\nu^{\;\;\mu}$

is reconciled with the definition of the transpose of a linear map:

Let $V$ and $W$ be vector spaces over the same field. If $f: V\to W$ is a linear map, then the transpose (or "dual", or "adjoint") is defined to be $$\small\begin{align}{}^tf: W^* &\to V^*\\\varphi &\mapsto \varphi\circ f\end{align}$$ The resulting functional $^tf(\varphi)$ is called the pullback of $\varphi$ along $f$.

The following identity, which characterizes the transpose, holds for all $\varphi \in W^*$ and $v\in V:$

$$\small[^tf(\varphi),v]_V=[\varphi,f(v)]_W\tag 1$$

where the bracket $[\cdot,\cdot]_V$ is the natural pairing of $V$'s dual space with $V,$ and $[\cdot,\cdot]_W$ is the same with $W.$

If I look for an example in orthonormal coordinates, I see how, if the vector space $V = \mathbb R^3$ contains vectors indexed by $\small \nu = 1,2,3,$ as for example $\small v=\begin{bmatrix}1&2&3\end{bmatrix}^\top$; $W = \mathbb R^2$ contains vectors indexed by $\small \mu=1,2$; a covector $\small \varphi \in W^*$ is represented by a row vector, $\small \varphi=\begin{bmatrix}\pi&\sqrt 2\end{bmatrix}$; and $\small f=\begin{bmatrix}a&b&c\\d&f&g\end{bmatrix},$ then equation (1) will be

$$\Bigg[\tiny{ \color{red}{\begin{bmatrix}a&d\\b&f\\c&g\end{bmatrix}}\small{\color{red}{\begin{bmatrix}\pi&\sqrt 2\end{bmatrix}}}},\tiny{\begin{bmatrix}1\\2\\3\end{bmatrix}}\large\Bigg]_V\small=\Bigg[\begin{bmatrix}\pi&\sqrt 2\end{bmatrix},\begin{bmatrix}a&b&c\\d&f&g\end{bmatrix}\tiny{\begin{bmatrix}1\\2\\3\end{bmatrix}}\large\Bigg]_W\tag 2$$

The operation $\small ^tf(\varphi)$ is evidently incongruent in matrix algebra, which I see as the reason why the assertion "in index notation a matrix is written as $A^\mu_{\;\;\nu}$ and its transpose as $A_\nu^{\;\;\mu}$" cannot be shown with a quick example in an orthonormal basis, even though it is true. Considering here the linear transformation $f$ as $A:$

${A^\mu}_{\nu}$ is really ${A^\mu}_{\nu}\; \mathrm e_\mu\otimes \mathrm e^\nu$ taking a covector in $W^*,$ such as $\varphi,$ and a vector in $V,$ such as $v$ in this order:

$$\small\begin{align} &\begin{bmatrix}\pi&\sqrt 2\end{bmatrix}\begin{bmatrix}a&b&c\\d&f&g\end{bmatrix}\tiny{\begin{bmatrix}1\\2\\3\end{bmatrix}} \\[2ex] &=\begin{bmatrix}\pi&\sqrt 2\end{bmatrix}\begin{bmatrix}a\;\mathrm e_1\otimes \mathrm e^1&b\;\mathrm e_1\otimes \mathrm e^2&c\;\mathrm e_1\otimes \mathrm e^3\\d\;\mathrm e_2\otimes \mathrm e^1&f\;\mathrm e_2\otimes \mathrm e^2&g\;\mathrm e_2\otimes \mathrm e^3\end{bmatrix}\tiny{\begin{bmatrix}1\\2\\3\end{bmatrix}}\\[2ex] &=\begin{bmatrix}a\;\mathrm e_1\otimes \mathrm e^1 (\pi\cdot 1)&b\;\mathrm e_1\otimes \mathrm e^2(\pi\cdot 2)&c\;\mathrm e_1\otimes \mathrm e^3(\pi\cdot 3)\\d\;\mathrm e_2\otimes \mathrm e^1(\sqrt 2 \cdot 1)&f\;\mathrm e_2\otimes \mathrm e^2(\sqrt 2 \cdot 2)&g\;\mathrm e_2\otimes \mathrm e^3(\sqrt 2 \cdot 3)\end{bmatrix}\\[2ex] &= a(\pi\cdot 1)+b(\pi\cdot 2)+c(\pi\cdot 3)+d(\sqrt 2 \cdot 1) +f(\sqrt 2 \cdot 2) + g(\sqrt 2 \cdot 3) \end{align}$$

while the transpose of $f$, ${A_\nu}^\mu,$ really corresponds to ${A_\nu}^\mu\; \mathrm e^\nu\otimes \mathrm e_\mu,$ eating a vector in $V,$ such as $v,$ and a covector in $W^*,$ such as $\varphi,$ in that order. Performing this computation, we would get the identical result:

$$\small\begin{align} &\begin{bmatrix}1&2&3\end{bmatrix}\begin{bmatrix}a&d\\b&f\\c&g\end{bmatrix}\tiny{\begin{bmatrix}\pi\\\sqrt 2\end{bmatrix}} \\[2ex] &=\begin{bmatrix}1&2&3\end{bmatrix}\begin{bmatrix}a\;\mathrm e^1\otimes \mathrm e_1&d\;\mathrm e^1\otimes \mathrm e_2\\b\;\mathrm e^2\otimes \mathrm e_1&f\;\mathrm e^2\otimes \mathrm e_2\\c\;\mathrm e^3\otimes \mathrm e_1&g\;\mathrm e^3\otimes \mathrm e_2\end{bmatrix}\tiny{\begin{bmatrix}\pi\\\sqrt 2\end{bmatrix}}\\[2ex] &=\begin{bmatrix}a\;\mathrm e^1\otimes \mathrm e_1(\pi\cdot 1)&d\;\mathrm e^1\otimes \mathrm e_2(\sqrt 2\cdot 1)\\b\;\mathrm e^2\otimes \mathrm e_1(\pi\cdot 2)&f\;\mathrm e^2\otimes \mathrm e_2(\sqrt 2\cdot 2)\\c\;\mathrm e^3\otimes \mathrm e_1(\pi\cdot 3)&g\;\mathrm e^3\otimes \mathrm e_2(\sqrt 2\cdot 3)\end{bmatrix}\\[2ex] &= a(\pi\cdot 1)+b(\pi\cdot 2)+c(\pi\cdot 3)+d(\sqrt 2\cdot 1) +f(\sqrt 2\cdot 2) + g(\sqrt 2 \cdot 3) \end{align}$$

which would lead us to re-express eq. (2) as

$$\Bigg[\begin{bmatrix}\pi&\sqrt 2\end{bmatrix},\begin{bmatrix}a&b&c\\d&f&g\end{bmatrix}\tiny{\begin{bmatrix}1\\2\\3\end{bmatrix}}\large\Bigg]_W=\Bigg[\small{\color{red}{\begin{bmatrix}1&2&3\end{bmatrix}}}\tiny{ \color{red}{\begin{bmatrix}a&d\\b&f\\c&g\end{bmatrix}}},\tiny{\begin{bmatrix}\pi\\\sqrt 2\end{bmatrix}}\large\Bigg]_W\small$$

and eq. (1), $\small[^tf(\varphi),v]_V=[\varphi,f(v)]_W,$ as

$$\small[\varphi,f(v)]_W=[^tf(v^\top),\varphi^\top]_W$$

which would be tautological, and wouldn't make sense as a proof that it fulfills the conditions for the transpose of a linear map.
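As a numerical sanity check that the two computations agree, here is a minimal numpy sketch, with hypothetical values substituted for the entries $a$ through $g$:

```python
import numpy as np

# Hypothetical values for the entries a, b, c, d, f, g of the map f
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # [f], a 2x3 matrix: R^3 -> R^2
v = np.array([[1.0], [2.0], [3.0]])    # [v], a column vector in V
phi = np.array([[np.pi, np.sqrt(2)]])  # [varphi], a row covector in W*

rhs = phi @ (A @ v)        # [varphi, f(v)]_W
lhs = (v.T @ A.T) @ phi.T  # the "transposed" computation above

print(np.allclose(lhs, rhs))  # True: both give the same scalar
```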

How can then the index notation of a transposed matrix be reconciled with the definition of the transpose of a linear map?

Best answer

Let's define clear and unambiguous terminology (at least for the scope of this answer).

  1. Dual map of a linear map (with respect to the natural pairing). This definition only needs two bare vector spaces. It is the definition you provided.

    Let $V$ and $W$ be vector spaces over the same field. If $f:V\to W$ is a linear map, then the dual of $f$ (denoted $f^{\text{d}}$ here) is the map $W^{*}\to V^*$ such that for any $v\in V$ and any $\varphi\in W^{*}$ $$\left(f^{\text{d}}(\varphi)\right)(v) = \varphi(f(v))\tag 1$$ Hence, $f^{\text{d}}$ can also be characterized by its action on any $\varphi\in W^{*}$: $$ \varphi\mapsto \varphi\circ f \tag{1a}$$ The resulting functional $f^{\text{d}}(\varphi)$ is called the pullback of $\varphi$ along $f$.

  2. Adjoint of a linear map (with respect to the metrics). To make this definition, we need our vector spaces to be equipped with a metric each.

    Let $V$ and $W$ be vector spaces over the same field, with metrics $g$ and $h$, respectively. If $f: V\to W$ is a linear map, then the adjoint of $f$ with respect to $g$ and $h$ (denoted $f^{\text{Ad}_{gh}}$ here) is the map $W\to V$ such that for any $v\in V$ and for any $w\in W$ $$g\left(v,f^{\text{Ad}_{gh}}(w)\right) = h(f(v),w)\tag{2}$$

  3. Transpose of a matrix. It is an operation that acts on a rectangular array of numbers, not on linear maps: by it we mean the matrix obtained by interchanging the rows and columns of the original matrix.

    Let $\mathcal{M}_{m,n}$ be the space of matrices with $m$ rows and $n$ columns. We denote the matrix $A\in\mathcal{M}_{m,n}$ by its entries $A_{ij}$ (where the first index specifies the row, and the second the column). The transpose of $A$ is the matrix $A^{\text{T}}$ defined by $$(A^{\text{T}})_{ij} = A_{ji}\tag{3}$$

Okay. Done. Now, why so much confusion? To answer this, we will choose bases on our vector spaces, and we will take the components of our linear maps.

But first, what are components?


Components

Say $\text{dim}(V) = m$ and $\text{dim}(W) = n$.

Given a map $\phi:V\to W$ and bases $\{v_i\}_{i\in\{1,\dots,m\}}$ in $V$ and $\{w_\mu\}_{\mu\in\{1,\dots,n\}}$ in $W$ (we choose the dual bases $\{\alpha^i\}_{i\in\{1,\dots,m\}}$ in $V^{*}$ and $\{\beta^\mu\}_{\mu\in\{1,\dots,n\}}$ in $W^{*}$), we define the components of $\phi$ (w.r.t. the bases) by $${\phi^{\mu}}_{i} := \beta^{\mu}(\phi(v_i))$$

Similarly, for a map $\psi:W^{*}\to V^{*}$, we define its components (w.r.t. the bases) by $${\psi_{i}}^{\mu} := [\psi(\beta^{\mu})](v_i)$$

From these definitions, and using the identity $(1)$ that defines the dual of a map, it is clear that the components of a map $f:V\to W$ are the same as the components of its dual $f^{\text{d}}:W^{*}\to V^{*}$: $${(f^{\text{d}})_{i}}^{\mu} := [f^{\text{d}}(\beta^{\mu})](v_{i}) = \beta^{\mu}(f(v_{i})) =: {f^{\mu}}_{i}$$

Hence, the problem here is not the definition of the numbers. The issue pops up when you want to represent these sets of numbers as a matrix.
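Here is a minimal numerical sketch of these definitions, assuming $V=\mathbb R^3$ and $W=\mathbb R^2$ with their standard bases (so the dual basis covectors are the standard coordinate rows):

```python
import numpy as np

m, n = 3, 2
F = np.random.default_rng(0).standard_normal((n, m))  # matrix of f : V -> W

v_basis = np.eye(m)  # columns are the basis vectors v_i of V
beta = np.eye(n)     # rows are the dual basis covectors beta^mu of W*

# Components of f:  f^mu_i = beta^mu( f(v_i) )
f_comp = np.array([[beta[mu] @ (F @ v_basis[:, i]) for i in range(m)]
                   for mu in range(n)])

# Components of its dual:  (f^d)_i^mu = [f^d(beta^mu)](v_i) = (beta^mu o f)(v_i)
fd_comp = np.array([[(beta[mu] @ F) @ v_basis[:, i] for i in range(m)]
                    for mu in range(n)])

print(np.allclose(f_comp, fd_comp))  # True: the same set of numbers
```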


The confusion

It all starts when we want to represent our maps as matrices.

When we wish to represent a set of components as if it were a matrix, we normally want the application of a map to be mimicked by matrix multiplication.

We all know how to perform matrix multiplication with the "row-times-column mantra".

We would like the representing matrix of our map to act on the left of the representing matrix of our vector (because we like it: it's familiar to us, notationally speaking, since we write $f(v)$). But unfortunately this is not always possible. You have already discovered this yourself, but let's do it one more time:

Say $\text{dim}(V) = m$ and $\text{dim}(W) = n$. Consider the following situations.

Note: In the following we denote the matrix representative of the set of components of an object (with respect to a particular choice of basis) by enclosing the object inside brackets.

  1. We have two vectors $v\in V$, $w\in W$, and a map $\phi:V\to W$. We want to write the matrix analog of the mapping $\boxed{ \phi(v) = w }$. If we want to keep the order of the objects inside the equation, we are forced to represent $\phi$ as an $n\times m$ matrix, $v$ as an $m\times1$ matrix, and $w$ as an $n\times 1$ matrix, so that we can write $$[\phi][v] = [w]$$
  2. We have $v\in V$, $\alpha\in V^{*}$ (although we could do the same with $W$ and $W^{*}$) and $k\in K$ (where $K$ is the scalar field). We want to write the matrix analog of the natural pairing $\boxed{ \alpha(v) = k}$ between $V$ and $V^{*}$. In order to keep the order of the objects, now we need to represent $\alpha$ as a $1\times m$ matrix, $v$ as an $m\times 1$ matrix, and $k$ as a $1\times1$ matrix, so that the equation becomes $$[\alpha][v] = [k]$$
  3. We have $\beta\in W^{*}$, $\alpha\in V^{*}$ and a map $\psi:W^{*}\to V^{*}$. (Remember that in finite dimensions, $\text{dim}(V) = \text{dim}(V^{*})$, and the same for $W$.) We want to write the matrix analog of the mapping $\boxed{ \psi(\beta) = \alpha }$. Once again we want to keep the order of the objects, so this time we need to represent $\psi$ with an $m\times n$ matrix, $\beta$ with an $n\times1$ matrix, and $\alpha$ as an $m\times1$ matrix. Then we can write $$[\psi][\beta] = [\alpha]$$ (A shape-level sketch of all three cases follows this list.)
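At the level of array shapes, here is a minimal numpy sketch of the three cases, with hypothetical dimensions $m = 3$, $n = 2$:

```python
import numpy as np

m, n = 3, 2                      # hypothetical dimensions of V and W
rng = np.random.default_rng(0)

# Case 1: a map phi: V -> W as an n x m matrix acting on m x 1 columns
phi_map = rng.standard_normal((n, m))
v = rng.standard_normal((m, 1))
w = phi_map @ v                  # [phi][v] = [w], shape (n, 1)

# Case 2: the natural pairing alpha(v) = k, with alpha a 1 x m row
alpha = rng.standard_normal((1, m))
k = alpha @ v                    # [alpha][v] = [k], shape (1, 1)

# Case 3: a map psi: W* -> V* as an m x n matrix -- note this forces the
# covectors beta and alpha to be represented as columns, not rows
psi = rng.standard_normal((m, n))
beta_col = rng.standard_normal((n, 1))
alpha_col = psi @ beta_col       # [psi][beta] = [alpha], shape (m, 1)

print(w.shape, k.shape, alpha_col.shape)  # (2, 1) (1, 1) (3, 1)
```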

Now consider the case in which $\phi = f:V\to W$ and $\psi = f^{\text{d}}:W^{*}\to V^{*}$. It is in this situation, and no other, that the matrix representative of a map and the matrix representative of its dual are each the transpose of the other.

However, this obviously brings problems, since the representation of $\alpha$ in the second case is incompatible with its representation in the third case. That is why virtually no one does what we did in case 3; instead, when we want to write the matrix representative of $f^{\text{d}}$, we use the characterization given by eq. $(1a)$. That is: we say the matrix representative of $f^{\text{d}}:W^{*}\to V^{*}$ is exactly the same as the matrix representative of $f:V\to W$, but instead of acting on the left of the matrix representative of $v\in V$, it acts on the right of the matrix representative of $\beta\in W^{*}$.

In this fashion, both sides of eq. $(1)$ are represented by the same matrix operation, but with a different interpretation of the order of the matrix multiplication: $$\left(f^{\text{d}}(\varphi)\right)(v) = \varphi(f(v)) \\ \equiv \\ \left([\varphi][f^{\text{d}}]\right)[v] = [\varphi]\left([f][v]\right)$$ where $[f^{\text{d}}] = [f]$.
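In numpy terms (a minimal sketch with arbitrary data), the two sides differ only in where the parentheses go, so associativity of matrix multiplication makes them agree:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((2, 3))    # [f] = [f^d]: one matrix for both maps
v = rng.standard_normal((3, 1))    # [v], a column
phi = rng.standard_normal((1, 2))  # [varphi], a row

lhs = (phi @ F) @ v  # ([varphi][f^d])[v]: pull varphi back, then pair with v
rhs = phi @ (F @ v)  # [varphi]([f][v]): apply f, then pair with varphi

print(np.allclose(lhs, rhs))  # True, by associativity
```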


Conclusion:

It doesn't matter whether you choose an orthonormal basis or an arbitrary one: the matrix representative of a map $f$ and the matrix representative of its dual $f^{\text{d}}$ are the transpose of each other only if you are willing to use different matrix representations of the same object when you are doing different operations on it (and we never want that, or at least I can't think of a case where I would).

Instead, if you always want to write elements in $V$ and $W$ as column vectors, and elements in $W^{*}$ and $V^{*}$ always as row vectors, you need to use the same matrix to represent both a map and its dual.


Then, what?

If the dual map is not what gives rise to the transpose, then what is it?

The answer is: the adjoint of a map (with respect to the metrics). And it gives rise to the transpose only when the bases you choose are orthonormal (with respect to the metrics).

Suppose $V$ and $W$ are equipped with metrics $g$ and $h$. If we have a map $f:V\to W$, then its adjoint $f^{\text{Ad}_{gh}}$ is a map $W\to V$.

If we take the defining property of the adjoint (eq. $(2)$) $$g\left(v,f^{\text{Ad}_{gh}}(w)\right) = h(f(v),w)$$ and see what happens when we take components with respect to arbitrary bases on $V$ and $W$, $$g_{ij}v^{i}{(f^{\text{Ad}_{gh}})^{j}}_{\nu}w^{\nu} = h_{\mu\nu}{f^{\mu}}_{i}v^{i}w^{\nu}$$ then, since this is supposed to hold for any $v\in V$ and any $w\in W$, we get $$g_{ij}{(f^{\text{Ad}_{gh}})^{j}}_{\nu} = h_{\mu\nu}{f^{\mu}}_{i}$$ Multiplying both sides by the inverse metric $g^{ik}$, and renaming some indices, we get $${(f^{\text{Ad}_{gh}})^{i}}_{\mu} = h_{\mu\nu}{f^{\nu}}_{j}g^{ij}\tag{4}$$ If you allow yourself to "lower the Greek index" with the metric $h$ and to "raise the Latin index" with the metric $g$, you will get ${f_{\mu}}^{i}$ on the right-hand side, and you will understand why some people say that

... in index notation a matrix is written as $A^\mu_{\;\;\nu}$ and its ~~transpose~~ adjoint as $A_\nu^{\;\;\mu}$

however, this hides the fact that there are two metrics involved.
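As a numerical check of eq. $(4)$, here is a minimal numpy sketch, assuming random symmetric positive-definite matrices stand in for the metrics $g$ and $h$:

```python
import numpy as np

m, n = 3, 2
rng = np.random.default_rng(0)

# Random symmetric positive-definite metrics on V and W
A = rng.standard_normal((m, m)); G = A @ A.T + m * np.eye(m)  # g_ij
B = rng.standard_normal((n, n)); H = B @ B.T + n * np.eye(n)  # h_mu.nu
F = rng.standard_normal((n, m))                               # f^mu_i

# Eq. (4): (f^Ad)^i_mu = h_mu.nu f^nu_j g^ij, i.e. [f^Ad] = G^{-1} F^T H
F_ad = np.linalg.inv(G) @ F.T @ H

# Defining property (2): g(v, f^Ad(w)) = h(f(v), w) for all v, w
v = rng.standard_normal((m, 1))
w = rng.standard_normal((n, 1))
print(np.allclose(v.T @ G @ (F_ad @ w), (F @ v).T @ H @ w))  # True
```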

Now, if you take our result $${(f^{\text{Ad}_{gh}})^{i}}_{\mu} = h_{\mu\nu}{f^{\nu}}_{j}g^{ij}$$ and you assume the bases are orthonormal, then $h_{\mu\nu} = \delta_{\mu\nu}$ and $g^{ij} = \delta^{ij}$, giving us $${(f^{\text{Ad}_{gh}})^{i}}_{\mu} = \delta_{\mu\nu}{f^{\nu}}_{j}\delta^{ij} = {f^{\mu}}_{i}$$ which coincides with the definition $(3)$ of the transpose of a matrix by its components.

You see, in this case we didn't need to change the way we represent our vectors. If $[f]$ is an $n\times m$ matrix that acts on the left of any $m\times1$ column matrix $[v]$ (where $v\in V$) to produce an $n\times1$ column matrix representing an element of $W$, then, for any basis, $[f^{\text{Ad}_{gh}}]$ is an $m\times n$ matrix that acts on the left of any $n\times 1$ column matrix $[w]$ (where $w\in W$) to produce an $m\times 1$ column matrix that represents an element of $V$.

But only if you choose an orthonormal basis do you have the relation $$[f^{\text{Ad}_{gh}}] = [f]^{\text{T}}$$
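Continuing the sketch above, setting both metrics to the identity shows eq. $(4)$ collapsing to the plain matrix transpose:

```python
import numpy as np

m, n = 3, 2
F = np.random.default_rng(0).standard_normal((n, m))  # [f] in orthonormal bases

# Orthonormal bases: g = h = identity, so eq. (4) collapses to the transpose
G, H = np.eye(m), np.eye(n)
F_ad = np.linalg.inv(G) @ F.T @ H

print(np.allclose(F_ad, F.T))  # True: [f^Ad] = [f]^T
```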


Conclusion:

Say you represent elements of $V$ and $W$ as column vectors.

If you have a matrix that represents a map $f:V\to W$ (which acts on the left of matrices $[v]$), and you then transpose it and act with it on matrices $[w]$, you are implicitly assuming that the bases you chose are orthonormal, and you are using the adjoint of $f$ to map from $W$ to $V$.

I really hope this is clear. If not, you can ask. Thank you for reading.