Background
Suppose we have a linear transformation $T : V \to W$ where $V$ and $W$ are finite dimensional vector spaces with bases $(e_i)$ and $(f_i)$ respectively. Using index notation, it is tempting to define the components of $T$ with respect to the bases $(e_i)$ and $(f_i)$ as $$ T(e_i) = {T_i}^j f_j $$ If we carry out the application of $T$ to $v = v^i e_i$, we obtain $$ T(v) = T(v^i e_i) = v^i T(e_i) = v^i {T_i}^j f_j $$ So far so good. All the indices are matching up nicely. But consider the composition with $S : W \to U$ where $U$ has basis $(g_i)$. We have $$ (S \circ T)(v) = S(v^i {T_i}^j f_j) = v^i {T_i}^j S(f_j) = v^i {T_i}^j {S_j}^k g_k $$ from which we conclude $$ {(S \circ T)_i}^k = {T_i}^j {S_j}^k $$ Oh no! It looks like passing to components is a contravariant functor. Not a disaster, but certainly unexpected.
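To make the bookkeeping concrete, here is a quick NumPy sanity check of the contravariance (the dimensions and random values are arbitrary; storing ${T_i}^j$ at array position `[i, j]` is my rendering of the convention above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Store the components defined by T(e_i) = T_i^j f_j with T_i^j at
# position [i, j]: the FIRST index labels the input basis vector.
T = rng.integers(-5, 6, size=(3, 4)).astype(float)  # T : V (dim 3) -> W (dim 4)
S = rng.integers(-5, 6, size=(4, 2)).astype(float)  # S : W (dim 4) -> U (dim 2)

# Applying T to v = v^i e_i yields the components v^i T_i^j, i.e. v @ T
# with v treated as a row of components.
v = rng.integers(-5, 6, size=3).astype(float)
lhs = (v @ T) @ S   # apply T, then S
rhs = v @ (T @ S)   # apply the composite via the single array T @ S
assert np.allclose(lhs, rhs)

# So the array representing S∘T is T @ S: the factors appear in the
# opposite order -- passing to components this way is contravariant.
```

The assertion is just associativity of matrix multiplication, but it exhibits exactly why the composite's array is $T$'s array times $S$'s array in that order.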
This is "unexpected" because this is not the usual way to define the components of a linear transformation with respect to a basis. Usually, one defines $$ T(e_i) = {T^j}_i f_j $$ Defined in this usual way, passing to components is a covariant functor, but we had to write down the somewhat ugly and less obvious expression ${T^j}_i f_j$. I call this ugly because the $i$ is between the two $j$'s with this convention.
We can make this look a little less ugly if we agree to write basis elements to the left, i.e. $T(e_i) = f_j {T^j}_i$. That looks a bit better, but it certainly isn't the usual convention (I've seen it in one textbook: "The Geometry of Physics" by Frankel). This convention also runs into problems when we want to think of vectors as differential operators, since $e_i v^i f$ looks like it should mean $e_i (v^i f)$, but it actually means $v^i (e_i f)$.
It seems like no matter what we do, we have to deal with some ugliness in how we define passing to components with index notation.
Question
Is there significance to the fact that the most "obvious" (I suppose a matter of opinion) way to write down the indices for a linear transformation makes passing to components a contravariant functor? Or maybe more to the point: is there significance to the fact that no matter what we do we have to deal with some ugliness / non-obviousness in how we pass to index notation, especially when we want to start thinking of vectors as differential operators?
I've run into this sort of thing before and concluded that the problem comes about because we apply functions to the left and not the right (this is, for example, why we have to read commutative diagrams "backward" when writing down the identities they imply). But it seems like there's a deeper problem here.
Apologies if this is an extremely pedantic question, but it's something that's been nagging me for some time now. Hopefully the question is clear enough.
Let us first agree to write matrices in the form $A = (a_{ij})$ and not as $A = ({a_i}^j)$. It is not essential to use upper or lower indices; we must only be able to say which of the indices $i,j$ is the first and which the second. The first index $i$ is the row index and the second index $j$ the column index. This is just the standard convention; the agreement could equally well have been the other way around.
For more precise notation, let us fix a field $F$ (if you want, you can take $F = \mathbb R$). By $M(m,n)$ we denote the set of all $(m \times n)$-matrices $A = (a_{ij})$ with entries in $F$.
Let us next make precise in what sense the assignment $T \mapsto (T_{ij})$ can be regarded as a functor.
A category $\mathfrak M$ is defined as follows:
The objects are the nonnegative integers $m \in \mathbb Z_{\ge 0}$.
The set of morphisms $\mathfrak M(n,m)$ from $n$ to $m$ is $M(m,n)$.
The composition $B \circ A : n \to p$ of morphisms $A : n \to m$ and $B : m \to p$ is defined to be the usual product matrix $B \cdot A \in M(p,n)$ of $B \in M(p,m)$ and $A \in M(m,n)$.
In the definition of morphisms it sticks out that we reversed the order of $n, m$. This was done to allow the definition $B \circ A = B \cdot A$. The alternative definition $\mathfrak M(n,m) = M(n,m)$ would also be possible, but then composition would have to be defined by $B \circ A = A \cdot B$, which is somewhat ugly.
A category $\mathfrak V$ is defined as follows:
The objects are all pairs $(V,\mathbf b^n)$, where $V$ is a finite-dimensional vector space over the ground field $F$ and $\mathbf b^n = (b_1, \ldots, b_n)$ is an ordered basis of $V$.
The set of morphisms $\mathfrak V((V,\mathbf b^n), (W, \mathbf c^m))$ from $(V,\mathbf b^n)$ to $(W,\mathbf c^m)$ is the set of linear maps $T : V \to W$.
Composition of morphisms is defined as the usual composition of functions.
The standard approach (which you introduced with the words "Usually, one defines") to represent $T : (V,\mathbf b^n) \to (W, \mathbf c^m)$ by a matrix is to write $$T(b_j) = \sum_{i=1}^m T_{ij}c_i \tag{1}$$ with unique $T_{ij} \in F$. Then $(T_{ij}) \in M(m,n)$. It is easy to check that $T \mapsto (T_{ij})$ produces a covariant functor $$\mu : \mathfrak V \to \mathfrak M , \mu(V,\mathbf b^n) = n = \dim V, \mu(T) =(T_{ij}) .$$
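The covariance of $\mu$ can be checked numerically. In the sketch below, the helper `mu` (my own illustration, not notation from the text) extracts the matrix of a linear map $F^n \to F^m$ with respect to the standard bases, following convention $(1)$: column $j$ holds the coordinates of the image of the $j$-th basis vector.

```python
import numpy as np

def mu(f, n):
    """Matrix of a linear map f : F^n -> F^m w.r.t. the standard bases,
    following convention (1): column j holds the coordinates of f(e_j)."""
    return np.column_stack([f(np.eye(n)[:, j]) for j in range(n)])

rng = np.random.default_rng(1)
A = rng.integers(-5, 6, size=(4, 3)).astype(float)
B = rng.integers(-5, 6, size=(2, 4)).astype(float)
T = lambda x: A @ x          # T : F^3 -> F^4
S = lambda x: B @ x          # S : F^4 -> F^2

# Functoriality in the covariant form: mu(S∘T) = mu(S) · mu(T).
assert np.allclose(mu(lambda x: S(T(x)), 3), mu(S, 4) @ mu(T, 3))
```

The order of the factors on the right matches the order in $S \circ T$, which is the whole point of convention $(1)$.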
Why is it the standard approach? The vector space $F^k$ has the standard ordered basis $\mathbf e^k = (e^k_1,\ldots, e^k_k)$, where the $i$-th coordinate of $e^k_i$ is $1$ and all other coordinates are $0$. The ordered bases $\mathbf b^n$ of $V$ and $\mathbf c^m$ of $W$ induce unique isomorphisms $\phi : V \to F^n$ such that $\phi(b_i) = e^n_i$ and $\psi : W \to F^m$ such that $\psi(c_i) = e^m_i$. This gives a commutative diagram
$\require{AMScd}$ \begin{CD} V @>{T}>> W \\ @V{\phi}VV @VV{\psi}V \\ F^n @>>{\tilde T}> F^m \end{CD} with a linear $\tilde T : F^n \to F^m$. Adopting the convention that elements $x = (x_1, \ldots,x_k) \in F^k$ are written as column vectors (i.e. as $(k \times 1)$-matrices having the entry $x_i$ in row $i$), we get $$\tilde T(x) = (T_{ij})\cdot x \tag{2} $$
This has the benefit that the input $x$ occurs to the right of the operator on both sides of the equation. However, the "niceness" of this formula depends on a whole bunch of conventions which we shall make explicit now.
Convention 1. "We apply functions to the left and not the right."
Given a function $f : A \to B$ and an element $a \in A$, we write $f(a) \in B$ for its image. This is the traditional way to do it, and we do not have a real chance to change this notation. Given another function $g : B \to C$, the composition $g \circ f : A \to C$ is defined by $(g \circ f)(a) = g(f(a))$. One could of course write $f \circ g$ instead of $g \circ f$, but this would be very ugly because then $(f \circ g)(a) = g(f(a))$, which would reverse the order of $f$ and $g$ on the two sides of the equation.
Unfortunately, the notation $g \circ f$ is itself ugly, though only in a "cultural" sense: our (Western) writing system is sinistrodextral, i.e. we write from left to right. Intuitively one tends to understand $g \circ f$ as the function obtained by first applying $g$ and then applying $f$, but it is just the other way around. I am sure that the standard notation $g \circ f$ has caused a lot of confusion for beginners.
Applying functions to the right, i.e. writing for example $x \cdot f$ instead of $f(x)$, would eliminate this "ugliness", but there is no chance to turn back mathematical history ...
Convention 2. In a matrix $A = (a_{ij})$ the first index $i$ is the row index and the second index $j$ the column index.
This is again an arbitrary convention; the agreement could have been exactly the other way.
Formally a matrix $A \in M(m,n)$ can be regarded as a function $A : [m] \times [n] \to F$, where $[k] = \{1,\ldots,k\}$. Usually we imagine $A = (a_{ij})$ as a rectangular array in the plane having $m$ rows $i =1, \ldots, m$ and $n$ columns $j =1, \ldots, n$. This is the reason why $i$ is called the row index and $j$ the column index. However, we must be aware that this is just a convenient graphical representation $G(A)$ of $A$ which is not inherent in $A : [m] \times [n] \to F$. Actually it is even a bit strange that we depict matrices in that way. Wouldn't it be more natural to fill in the entries $a_{ij}$ at the integer lattice points of the classic plane $x$-$y$-coordinate system? That is, shouldn't $a_{ij}$ be attached at the point $(i,j) \in \mathbb N^2 \subset \mathbb R^2$? This would produce a rectangular array $G'(A)$ with $m$ columns $i =1, \ldots, m$ and $n$ rows $j =1, \ldots, n$. Anyway, $G(A)$ is the standard graphical representation of $A$ and $G'(A)$ is the non-standard one. With the transposed matrix $A^t = (a_{ji}) \in M(n,m)$ we have $G'(A) = G(A^t)$ and $G(A) = G'(A^t)$.
Convention 3. Definition of the matrix product.
The standard matrix product $B \cdot A = (c_{ik}) \in M(p,n)$ is defined for $B = (b_{ij}) \in M(p,m)$ and $A = (a_{jk}) \in M(m,n)$. The idea is to define $c_{ik}$ to be the "product" of the $i$-th row $r(B,i) = \begin{pmatrix} b_{i1} & \ldots & b_{im} \end{pmatrix}$ of $B$ and the $k$-th column $c(A,k) = \begin{pmatrix} a_{1k} & \ldots & a_{mk} \end{pmatrix}^t$ of $A$ via $r(B,i) \cdot c(A,k) = \sum_{j=1}^m b_{ij}a_{jk}$.
However, we could also define a product matrix $B * A = (d_{ik}) \in M(n,p)$ for matrices $B \in M(m,p)$ and $A \in M(n,m)$ by taking $d_{ik}$ to be the product of the $k$-th column $c(B,k) = \begin{pmatrix} b_{1k} & \ldots & b_{mk} \end{pmatrix}^t$ of $B$ and the $i$-th row $r(A,i) = \begin{pmatrix} a_{i1} & \ldots & a_{im} \end{pmatrix}$ of $A$, which results in $$B * A = A \cdot B .$$
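The identity $B * A = A \cdot B$ is easy to verify numerically; the helper `star` below is my own sketch of the $*$-product, implemented directly from the row/column description:

```python
import numpy as np

def star(B, A):
    """B * A for B in M(m, p) and A in M(n, m): entry (i, k) is the
    dot product of the i-th row of A with the k-th column of B."""
    m, p = B.shape
    n, m2 = A.shape
    assert m == m2, "inner dimensions must agree"
    return np.array([[A[i, :] @ B[:, k] for k in range(p)] for i in range(n)])

rng = np.random.default_rng(2)
B = rng.integers(-5, 6, size=(4, 5)).astype(float)  # B in M(4, 5)
A = rng.integers(-5, 6, size=(3, 4)).astype(float)  # A in M(3, 4)

# The alternative product is the standard product with the factors swapped.
assert np.allclose(star(B, A), A @ B)
```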
With the $*$-multiplication we could redefine the above category $\mathfrak M$ by taking $\mathfrak M(n,m) = M(n,m)$ and composing morphisms by $B \circ A = B * A$.
Convention 4. Applying a matrix $A \in M(m,n)$ to a vector $x \in F^n$ is done by regarding $x$ as a column vector and forming the matrix product $A \cdot x$.
At least convention 4 is not so strictly carved in stone as the first three. Implicitly you "violate" it by defining $$T(b_i) = \sum_{j=1}^m T'_{ij}c_j .\tag{3} $$ This produces a matrix $(T'_{ij}) \in M(n,m)$ and gives a contravariant functor $$\mu' : \mathfrak V \to \mathfrak M . $$
The matrices $(T_{ij})$ and $(T'_{ij})$ are transposed. If we regard the elements $x \in F^n$ as row vectors, we get $$\tilde T(x) = x \cdot (T'_{ij}) . \tag{4} $$ In comparison to $(2)$ this looks a bit ugly, but it is absolutely legitimate to use $(4)$.
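That $(2)$ and $(4)$ describe the same map can be checked directly; in NumPy a 1-D array serves as either a row or a column vector depending on which side of `@` it appears:

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.integers(-5, 6, size=(4, 3)).astype(float)  # (T_ij) in M(4, 3), convention (1)
Tp = T.T                                            # (T'_ij) in M(3, 4), convention (3)
x = rng.integers(-5, 6, size=3).astype(float)

# Column convention (2): y = (T_ij) · x.
# Row convention (4):    y = x · (T'_ij).
# Both give the same coordinates of T(x).
assert np.allclose(T @ x, x @ Tp)
```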
Note that if we overturned convention 1, then $(4)$ would become the "nice" formula. The same is true if we used the above alternative matrix multiplication: in fact, $(4)$ becomes $\tilde T(x) = (T'_{ij}) * x$.
As commented by Johnny Lemmon, the dichotomy between the "covariant and contravariant matrix representation functors" also has a further facet. The dual space functor $$\phantom{}^* : \mathfrak V \to \mathfrak V$$ is defined on objects by $(V,\mathbf b^n)^* = (V^*, \mathbf b^{n*})$, where $V^*$ is the dual space of $V$ and $\mathbf b^{n*} = (b^1, \ldots, b^n)$ is the dual basis determined by $b^i(b_j) = \delta_{ij}$, and on morphisms by sending $T : V \to W$ to $T^* : W^* \to V^*$, $T^*(\omega) = \omega \circ T$.
It is a contravariant functor, and it is well known that the matrix representations of $T$ and $T^*$ based on $(1)$ - or on $(3)$ - are transposed. In other words, the dual space functor corresponds to the contravariant transposition functor $$\phantom{}^t : \mathfrak M \to \mathfrak M$$ which is the identity on objects and sends a morphism $A$ to its transpose $A^t$.
We get a commutative diagram $\require{AMScd}$ \begin{CD} \mathfrak V @>{\phantom{}^*}>> \mathfrak V \\ @V{\mu}VV @VV{\mu}V \\ \mathfrak M @>{\phantom{}^t}>> \mathfrak M \end{CD} and by construction $\mu \circ \phantom{}^* = \phantom{}^t \circ \mu = \mu'$.
This shows that formulae $(2)$ and $(4)$ are dual in a strict and formal sense.
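The contravariance of the transposition functor is just the familiar reversal rule $(B \cdot A)^t = A^t \cdot B^t$, mirroring $(S \circ T)^* = T^* \circ S^*$ on the dual-space side; a one-line numerical check:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.integers(-5, 6, size=(4, 3)).astype(float)  # mu(T)
B = rng.integers(-5, 6, size=(2, 4)).astype(float)  # mu(S)

# Transposition reverses the order of composition, i.e. it is contravariant:
assert np.allclose((B @ A).T, A.T @ B.T)
```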