Background
Suppose we have a linear transformation $T : V \to W$ where $V$ and $W$ are finite dimensional vector spaces with bases $(e_i)$ and $(f_i)$ respectively. Using index notation, it is tempting to define the components of $T$ with respect to the bases $(e_i)$ and $(f_i)$ as $$ T(e_i) = {T_i}^j f_j $$ If we carry out the application of $T$ to $v = v^i e_i$, we obtain $$ T(v) = T(v^i e_i) = v^i T(e_i) = v^i {T_i}^j f_j $$ So far so good. All the indices are matching up nicely. But consider the composition with $S : W \to U$ where $U$ has basis $(g_i)$. We have $$ (S \circ T)(v) = S(v^i {T_i}^j f_j) = v^i {T_i}^j S(f_j) = v^i {T_i}^j {S_j}^k g_k $$ from which we conclude $$ {(S \circ T)_i}^k = {T_i}^j {S_j}^k $$ Oh no! It looks like passing to components is a contravariant functor. Not a disaster, but certainly unexpected.
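To make the bookkeeping concrete, here is a quick NumPy sanity check of the contravariance (the dimensions and random values are arbitrary; storing ${T_i}^j$ at array position `[i, j]` is my rendering of the convention above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Store the components defined by T(e_i) = T_i^j f_j with T_i^j at
# position [i, j]: the FIRST index labels the input basis vector.
T = rng.integers(-5, 6, size=(3, 4)).astype(float)  # T : V (dim 3) -> W (dim 4)
S = rng.integers(-5, 6, size=(4, 2)).astype(float)  # S : W (dim 4) -> U (dim 2)

# Applying T to v = v^i e_i yields the components v^i T_i^j, i.e. v @ T
# with v treated as a row of components.
v = rng.integers(-5, 6, size=3).astype(float)
lhs = (v @ T) @ S   # apply T, then S
rhs = v @ (T @ S)   # apply the composite via the single array T @ S
assert np.allclose(lhs, rhs)

# So the array representing S∘T is T @ S: the factors appear in the
# opposite order -- passing to components this way is contravariant.
```

The assertion is just associativity of matrix multiplication, but it exhibits exactly why the composite's array is $T$'s array times $S$'s array in that order.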
This is "unexpected" because this is not the usual way to define the components of a linear transformation with respect to a basis. Usually, one defines $$ T(e_i) = {T^j}_i f_j $$ Defined in this usual way, passing to components is a covariant functor, but we had to write down the somewhat ugly and less obvious expression ${T^j}_i f_j$. I call this ugly because the $i$ is between the two $j$'s with this convention.
We can make this look a little less ugly if we agree to write basis elements to the left, i.e. $T(e_i) = f_j {T^j}_i$. That looks a bit better, but it certainly isn't the usual convention (I've seen it in one textbook: "The Geometry of Physics" by Frankel). This convention also runs into problems when we want to think of vectors as differential operators, since $e_i v^i f$ looks like it should mean $e_i (v^i f)$, but it actually means $v^i (e_i f)$.
It seems like no matter what we do, we have to deal with some ugliness in how we define passing to components with index notation.
Question
Is there significance to the fact that the most "obvious" (I suppose a matter of opinion) way to write down the indices for a linear transformation makes passing to components a contravariant functor? Or maybe more to the point: is there significance to the fact that no matter what we do we have to deal with some ugliness / non-obviousness in how we pass to index notation, especially when we want to start thinking of vectors as differential operators?
I've run into this sort of thing before and concluded that the problem comes about because we apply functions to the left and not the right (this is, for example, why we have to read commutative diagrams "backward" when writing down the identities they imply). But it seems like there's a deeper problem here.
Apologies if this is an extremely pedantic question, but it's something that's been nagging me for some time now. Hopefully the question is clear enough.
Let us first agree to write matrices in the form $A = (a_{ij})$ and not as $A = ({a_i}^j)$. It is not essential to use upper or lower indices; we must only be able to say which of the indices $i,j$ is the first and which the second. The first index $i$ is the row index and the second index $j$ the column index. This is just the standard convention; the agreement could equally well have been the other way around.
For more precise notation, let us fix a field $F$ (if you want, you can take $F = \mathbb R$). By $M(m,n)$ we denote the set of all $(m \times n)$-matrices $A = (a_{ij})$ with entries in $F$.
Let us next make precise in what sense the assignment $T \mapsto (T_{ij})$ can be regarded as a functor.
A category $\mathfrak M$ is defined as follows:
The objects are the nonnegative integers $m \in \mathbb Z_{\ge 0}$.
The set of morphisms $\mathfrak M(n,m)$ from $n$ to $m$ is $M(m,n)$.
The composition $B \circ A : n \to p$ of morphisms $A : n \to m$ and $B : m \to p$ is defined to be the usual product matrix $B \cdot A \in M(p,n)$ of $B \in M(p,m)$ and $A \in M(m,n)$.
In the definition of morphisms it sticks out that we reversed the order of $n, m$. This was done to allow the definition $B \circ A = B \cdot A$. The alternative definition $\mathfrak M(n,m) = M(n,m)$ would also be possible, but then composition would have to be defined by $B \circ A = A \cdot B$, which is somewhat ugly.
A category $\mathfrak V$ is defined as follows:
The objects are all pairs $(V,\mathbf b^n)$, where $V$ is a finite-dimensional vector space over the ground field $F$ and $\mathbf b^n = (b_1, \ldots, b_n)$ is an ordered basis of $V$.
The set of morphisms $\mathfrak V((V,\mathbf b^n), (W, \mathbf c^m))$ from $(V,\mathbf b^n)$ to $(W,\mathbf c^m)$ is the set of linear maps $T : V \to W$.
Composition of morphisms is defined as the usual composition of functions.
The standard approach (which you introduced with the words "Usually, one defines") to represent $T : (V,\mathbf b^n) \to (W, \mathbf c^m)$ by a matrix is to write $$T(b_j) = \sum_{i=1}^m T_{ij}c_i \tag{1}$$ with unique $T_{ij} \in F$. Then $(T_{ij}) \in M(m,n)$. It is easy to check that $T \mapsto (T_{ij})$ produces a covariant functor $$\mu : \mathfrak V \to \mathfrak M , \mu(V,\mathbf b^n) = n = \dim V, \mu(T) =(T_{ij}) .$$
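The covariance of $\mu$ can be checked numerically. In the sketch below, the helper `mu` (my own illustration, not notation from the text) extracts the matrix of a linear map $F^n \to F^m$ with respect to the standard bases, following convention $(1)$: column $j$ holds the coordinates of the image of the $j$-th basis vector.

```python
import numpy as np

def mu(f, n):
    """Matrix of a linear map f : F^n -> F^m w.r.t. the standard bases,
    following convention (1): column j holds the coordinates of f(e_j)."""
    return np.column_stack([f(np.eye(n)[:, j]) for j in range(n)])

rng = np.random.default_rng(1)
A = rng.integers(-5, 6, size=(4, 3)).astype(float)
B = rng.integers(-5, 6, size=(2, 4)).astype(float)
T = lambda x: A @ x          # T : F^3 -> F^4
S = lambda x: B @ x          # S : F^4 -> F^2

# Functoriality in the covariant form: mu(S∘T) = mu(S) · mu(T).
assert np.allclose(mu(lambda x: S(T(x)), 3), mu(S, 4) @ mu(T, 3))
```

The order of the factors on the right matches the order in $S \circ T$, which is the whole point of convention $(1)$.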
Why is it the standard approach? The vector space $F^k$ has the standard ordered basis $\mathbf e^k = (e^k_1,\ldots, e^k_k)$, where the $i$-th coordinate of $e^k_i$ is $1$ and all other coordinates are $0$. The ordered bases $\mathbf b^n$ of $V$ and $\mathbf c^m$ of $W$ induce unique isomorphisms $\phi : V \to F^n$ such that $\phi(b_i) = e^n_i$ and $\psi : W \to F^m$ such that $\psi(c_i) = e^m_i$. This gives a commutative diagram
$\require{AMScd}$ \begin{CD} V @>{T}>> W \\ @V{\phi}VV @VV{\psi}V \\ F^n @>>{\tilde T}> F^m \end{CD} with a linear $\tilde T : F^n \to F^m$. Adopting the convention that elements $x = (x_1, \ldots,x_k) \in F^k$ are written as column vectors (i.e. as $(k \times 1)$-matrices having the entry $x_i$ in row $i$), we get $$\tilde T(x) = (T_{ij})\cdot x \tag{2} $$
This has the benefit that the input $x$ occurs to the right of the operator on both sides of the equation. However, the "niceness" of this formula depends on a whole bunch of conventions which we shall make explicit now.
Convention 1. "We apply functions to the left and not the right."
Given a function $f : A \to B$ and an element $a \in A$, we write $f(a) \in B$ for its image. This is the traditional way to do it, and we do not have a real chance to change this notation. Given another function $g : B \to C$, the composition $g \circ f : A \to C$ is defined by $(g \circ f)(a) = g(f(a))$. One could of course write $f \circ g$ instead of $g \circ f$, but this would be very ugly because then $(f \circ g)(a) = g(f(a))$, which would reverse the order of $f$ and $g$ on the two sides of the equation.
Unfortunately, the notation $g \circ f$ is itself ugly, though only in a "cultural" sense: our (Western) writing system is sinistrodextral, i.e. we write from left to right. Intuitively one tends to understand $g \circ f$ as the function obtained by first applying $g$ and then applying $f$, but it is just the other way around. I am sure that the standard notation $g \circ f$ has caused a lot of confusion for beginners.
Applying functions to the right, i.e. writing for example $x \cdot f$ instead of $f(x)$, would eliminate this "ugliness", but there is no chance to turn back mathematical history ...
Convention 2. In a matrix $A = (a_{ij})$ the first index $i$ is the row index and the second index $j$ the column index.
This is again an arbitrary convention; the agreement could have been exactly the other way.
Formally a matrix $A \in M(m,n)$ can be regarded as a function $A : [m] \times [n] \to F$, where $[k] = \{1,\ldots,k\}$. Usually we imagine $A = (a_{ij})$ as a rectangular array in the plane having $m$ rows $i =1, \ldots, m$ and $n$ columns $j =1, \ldots, n$. This is the reason why $i$ is called the row index and $j$ the column index. However, we must be aware that this is just a convenient graphical representation $G(A)$ of $A$ which is not inherent in $A : [m] \times [n] \to F$. Actually it is even a bit strange that we depict matrices in that way. Wouldn't it be more natural to fill in the entries $a_{ij}$ at the integer lattice points of the classic plane $x$-$y$-coordinate system? That is, shouldn't $a_{ij}$ be attached at the point $(i,j) \in \mathbb N^2 \subset \mathbb R^2$? This would produce a rectangular array $G'(A)$ with $m$ columns $i =1, \ldots, m$ and $n$ rows $j =1, \ldots, n$. Anyway, $G(A)$ is the standard graphical representation of $A$ and $G'(A)$ is the non-standard one. With the transposed matrix $A^t = (a_{ji}) \in M(n,m)$ we have $G'(A) = G(A^t)$ and $G(A) = G'(A^t)$.
Convention 3. Definition of the matrix product.
The standard matrix product $B \cdot A = (c_{ik}) \in M(p,n)$ is defined for $B = (b_{ij}) \in M(p,m)$ and $A = (a_{jk}) \in M(m,n)$. The idea is to define $c_{ik}$ to be the "product" of the $i$-th row $r(B,i) = \begin{pmatrix} b_{i1} & \ldots & b_{im} \end{pmatrix}$ of $B$ and the $k$-th column $c(A,k) = \begin{pmatrix} a_{1k} & \ldots & a_{mk} \end{pmatrix}^t$ of $A$ via $r(B,i) \cdot c(A,k) = \sum_{j=1}^m b_{ij}a_{jk}$.
However, we could also define a product matrix $B * A = (d_{ik}) \in M(n,p)$ for matrices $B \in M(m,p)$ and $A \in M(n,m)$ by taking $d_{ik}$ to be the product of the $k$-th column $c(B,k) = \begin{pmatrix} b_{1k} & \ldots & b_{mk} \end{pmatrix}^t$ of $B$ and the $i$-th row $r(A,i) = \begin{pmatrix} a_{i1} & \ldots & a_{im} \end{pmatrix}$ of $A$, which results in $$B * A = A \cdot B .$$
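The identity $B * A = A \cdot B$ is easy to verify numerically; the helper `star` below is my own sketch of the $*$-product, implemented directly from the row/column description:

```python
import numpy as np

def star(B, A):
    """B * A for B in M(m, p) and A in M(n, m): entry (i, k) is the
    dot product of the i-th row of A with the k-th column of B."""
    m, p = B.shape
    n, m2 = A.shape
    assert m == m2, "inner dimensions must agree"
    return np.array([[A[i, :] @ B[:, k] for k in range(p)] for i in range(n)])

rng = np.random.default_rng(2)
B = rng.integers(-5, 6, size=(4, 5)).astype(float)  # B in M(4, 5)
A = rng.integers(-5, 6, size=(3, 4)).astype(float)  # A in M(3, 4)

# The alternative product is the standard product with the factors swapped.
assert np.allclose(star(B, A), A @ B)
```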
With the $*$-multiplication we could redefine the above category $\mathfrak M$ by taking $\mathfrak M(n,m) = M(n,m)$ and composing morphisms by $B \circ A = B * A$.
Convention 4. Applying a matrix $A \in M(m,n)$ to a vector $x \in F^n$ is done by regarding $x$ as a column vector and forming the matrix product $A \cdot x$.
At least convention 4 is not so strictly carved in stone as the first three. Implicitly you "violate" it by defining $$T(b_i) = \sum_{j=1}^m T'_{ij}c_j .\tag{3} $$ This produces a matrix $(T'_{ij}) \in M(n,m)$ and gives a contravariant functor $$\mu' : \mathfrak V \to \mathfrak M . $$
The matrices $(T_{ij})$ and $(T'_{ij})$ are transposed. If we regard the elements $x \in F^n$ as row vectors, we get $$\tilde T(x) = x \cdot (T'_{ij}) . \tag{4} $$ In comparison to $(2)$ this looks a bit ugly, but it is absolutely legitimate to use $(4)$.
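That $(2)$ and $(4)$ describe the same map can be checked directly; in NumPy a 1-D array serves as either a row or a column vector depending on which side of `@` it appears:

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.integers(-5, 6, size=(4, 3)).astype(float)  # (T_ij) in M(4, 3), convention (1)
Tp = T.T                                            # (T'_ij) in M(3, 4), convention (3)
x = rng.integers(-5, 6, size=3).astype(float)

# Column convention (2): y = (T_ij) · x.
# Row convention (4):    y = x · (T'_ij).
# Both give the same coordinates of T(x).
assert np.allclose(T @ x, x @ Tp)
```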
Note that if we overturned convention 1, then $(4)$ would become the "nice" formula. The same is true if we used the above alternative matrix multiplication: in fact, $(4)$ becomes $\tilde T(x) = (T'_{ij}) * x$.
As commented by Johnny Lemmon, the dichotomy between the "covariant and contravariant matrix representation functors" also has a further facet. The dual space functor $$\phantom{}^* : \mathfrak V \to \mathfrak V$$ is defined on objects by $(V,\mathbf b^n)^* = (V^*, \mathbf b^{n*})$, where $V^*$ is the dual space of $V$ and $\mathbf b^{n*} = (b^1, \ldots, b^n)$ is the dual basis determined by $b^i(b_j) = \delta_{ij}$, and on morphisms by sending $T : V \to W$ to $T^* : W^* \to V^*$, $T^*(\omega) = \omega \circ T$.
It is a contravariant functor, and it is well known that the matrix representations of $T$ and $T^*$ based on $(1)$ - or on $(3)$ - are transposed. In other words, the dual space functor corresponds to the contravariant transposition functor $$\phantom{}^t : \mathfrak M \to \mathfrak M$$ which is the identity on objects and sends a morphism $A$ to its transpose $A^t$.
We get a commutative diagram $\require{AMScd}$ \begin{CD} \mathfrak V @>{\phantom{}^*}>> \mathfrak V \\ @V{\mu}VV @VV{\mu}V \\ \mathfrak M @>{\phantom{}^t}>> \mathfrak M \end{CD} and by construction $\mu \circ \phantom{}^* = \phantom{}^t \circ \mu = \mu'$.
This shows that formulae $(2)$ and $(4)$ are dual in a strict and formal sense.
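The contravariance of the transposition functor is just the familiar reversal rule $(B \cdot A)^t = A^t \cdot B^t$, mirroring $(S \circ T)^* = T^* \circ S^*$ on the dual-space side; a one-line numerical check:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.integers(-5, 6, size=(4, 3)).astype(float)  # mu(T)
B = rng.integers(-5, 6, size=(2, 4)).astype(float)  # mu(S)

# Transposition reverses the order of composition, i.e. it is contravariant:
assert np.allclose((B @ A).T, A.T @ B.T)
```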