Consider a Cartesian vector $x \in \mathbb{R}^3$, and suppose we wish to find its coordinates $(x'_1, x'_2, x'_3)$ in a new, rotated coordinate system that shares its origin with the original coordinate system.
One can show that $x'_i = x_k C_{ki}$, where $C_{ki}$ is the $3 \times 3$ matrix of direction cosines; the proof is relatively easy.
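Here is a quick numerical check of this relation, written as a minimal NumPy sketch. It assumes the direction cosines are $C_{ki} = \mathbf e_k\cdot\mathbf e'_i$, the cosine of the angle between old axis $k$ and new axis $i$, which is the convention consistent with $x'_i = x_k C_{ki}$. The rotation angle and the vector are arbitrary illustrative values. The coordinates from the rule agree with a direct projection of $x$ onto the new axes.

```python
import numpy as np

# Old basis: standard unit vectors e_1, e_2, e_3 (rows of the identity).
E_old = np.eye(3)

# New basis: the old axes rotated by an illustrative angle about the x_3 axis.
theta = 0.3
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0],
              [s,  c, 0.0],
              [0.0, 0.0, 1.0]])
E_new = E_old @ R.T          # row m is e'_m = R e_m

# Direction cosines C[k, i] = e_k . e'_i (assumed convention).
C = E_old @ E_new.T

x = np.array([1.0, 2.0, 3.0])        # coordinates in the old basis

# Transformation rule x'_i = x_k C_ki (row vector times matrix).
x_new = x @ C

# Direct projection of the vector onto the new axes.
x_proj = E_new @ (x @ E_old)

print(np.allclose(x_new, x_proj))    # True
```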
Now, given a second-order tensor $T_{ij}$, my textbook gives its transformation to the new rotated coordinate system as $$ T'_{mn} = C_{im}C_{jn}T_{ij}$$
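Index expressions like this one map directly onto `numpy.einsum`. The sketch below evaluates $T'_{mn} = C_{im}C_{jn}T_{ij}$ literally and compares it with the matrix form $T' = C^{\mathsf T} T C$. The random rotation and the random symmetric tensor are illustrative assumptions. As expected for a rotation, the trace and the eigenvalues of the symmetric $T$ do not change.

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative rotation (orthogonal, det = +1) built from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
C = Q if np.linalg.det(Q) > 0 else -Q

# An illustrative symmetric second-order tensor (think of a stress tensor).
A = rng.normal(size=(3, 3))
T = A + A.T

# T'_mn = C_im C_jn T_ij, summing over the repeated indices i and j.
T_new = np.einsum("im,jn,ij->mn", C, C, T)

# The same rule in matrix notation: T' = C^T T C.
print(np.allclose(T_new, C.T @ T @ C))                                 # True

# Rotation-invariant quantities agree in both coordinate systems.
print(np.isclose(np.trace(T_new), np.trace(T)))                        # True
print(np.allclose(np.linalg.eigvalsh(T_new), np.linalg.eigvalsh(T)))   # True
```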
However, this formula is stated without proof, only with a reference to Sommerfeld's book *Mechanics of Deformable Bodies* (1964), to which I don't have access. I tried to prove it myself, without success.
Note: I'm a beginner in tensor calculus, so please go easy on me.
If you are learning tensor calculus from several sources of different ages and backgrounds, you will run into several conflicts between the conventions you have been exposed to.
1. Different notations developed from several relatively independent backgrounds, and thus kept (and still keep) their own traditions for a long time.
Despite having predecessors in the 18th and 19th centuries, matrix and tensor notation was invented and came into wide use only in the late 1920s. First, there was the systematic coordinate version of differential geometry, the Ricci calculus, as the mathematical apparatus behind general relativity. Then there was the algebra of infinite matrices in the developing quantum theory in the Heisenberg picture. (These were later replaced by an operator calculus, in which the earlier matrices appear only when the operators are expanded in a chosen eigenbasis.) And then there were several other conventions with their own histories, related but distinct and with less stringent demands on generality. These served applications in Euclidean geometry, continuum mechanics, etc., where it mattered more to have a compact form for a restricted set of material coefficients and their use in equations.
Your source appears to fall into the latter category.
2. Some conventions are a matter of taste, balancing conflicting requirements.
The guiding principle behind the first half of your question is that linear transformations are written down in the order in which they occur. That is, if you first apply transformation $C_1$ and then transformation $C_2$, their combined action is represented by the matrix product $C_1C_2$. This convention also underlies many computer graphics libraries. It is not the prevailing one in mathematics and physics, however, where operators act from the left, so that when they are represented as matrices, the vectors they act on are column vectors.
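The two ordering conventions can be compared in a few lines of code. In this sketch, `C1` and `C2` are hypothetical names for illustrative rotations about the $x_3$ and $x_1$ axes. Applying $C_1$ and then $C_2$ to a row vector gives $x\,C_1C_2$. The column-vector convention writes the same composite as $C_2^{\mathsf T}C_1^{\mathsf T}x^{\mathsf T}$, with the later transformation standing on the left.

```python
import numpy as np

def rot_x3(a):
    """Rotation matrix about the x_3 axis by angle a (illustrative)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x1(a):
    """Rotation matrix about the x_1 axis by angle a (illustrative)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

C1, C2 = rot_x3(0.4), rot_x1(1.1)
x = np.array([1.0, -2.0, 0.5])

# Row-vector convention: transformations are written in the order they occur.
row_result = x @ C1 @ C2

# Column-vector convention: same composite, but the later step stands on the left.
col_result = C2.T @ C1.T @ x

print(np.allclose(row_result, col_result))   # True

# Order matters: rotations about different axes do not commute.
print(np.allclose(x @ C2 @ C1, row_result))  # False
```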
For the matrices to act in that order, the coefficient vector has to stand to their left, which makes it a row vector. But that is a modern interpretation. It is not clear that the now-common reading of an index pair $M_{ij}$ as row index and column index, in that order, was already fixed when the book was published. It is not even clear that there was a single fixed rectangular scheme for organizing these coefficients.
3. Abstract axiomatic versus concrete constructions
The next point is that in physics, the prevailing point of view is that a tensor is the data structure together with its transformation laws. From a mathematical perspective, this looks like saying that the real numbers are the Dedekind cuts of the rationals (or some other fixed realization of the axioms).
For a mathematician, a tensor is an element of a linear space that is a product (of a certain kind) of other linear spaces. The transformation laws are then a consequence of the nature of the factor spaces and of the basis changes in them. If the factor spaces are copies of the same vector space and its dual, then these basis changes can be synchronized.
Altogether, this means that the formula under discussion is not something to prove; it is a defining property of $T$ as a tensor. It characterizes $T$ as doubly contravariant, that is, transforming in each index like the coordinates of a vector. Just as ${\bf x}=x_i{\bf e}_i$, the object $${\bf T}=T_{ij}({\bf e}_i\otimes {\bf e}_j)$$ acts like a "double vector" (a dyadic, in older terminology), a sum of products of vectors.
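This view also explains why the transformation law is natural for a sum of products of vectors. If each factor transforms like a vector, $a'_m = a_iC_{im}$, then every dyad $a_ib_j$ obeys $T'_{mn} = C_{im}C_{jn}T_{ij}$, and by linearity so does any sum of dyads. The sketch below checks this numerically. The rotation, the four vector pairs and the random seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative rotation matrix C (orthogonal with det = +1).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
C = Q if np.linalg.det(Q) > 0 else -Q

# T as a sum of products of vectors: T_ij = sum_k a_i^(k) b_j^(k).
a = rng.normal(size=(4, 3))   # four illustrative vectors a^(k), one per row
b = rng.normal(size=(4, 3))   # four illustrative vectors b^(k), one per row
T = sum(np.outer(ak, bk) for ak, bk in zip(a, b))

# Transform each factor as a vector: a'_m = a_i C_im (row vectors times C).
a_new, b_new = a @ C, b @ C
T_from_factors = sum(np.outer(ak, bk) for ak, bk in zip(a_new, b_new))

# Compare with the tensor rule T'_mn = C_im C_jn T_ij.
T_rule = np.einsum("im,jn,ij->mn", C, C, T)
print(np.allclose(T_from_factors, T_rule))  # True
```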
To close the circle: if the underlying basis is changed as ${\bf e}_i=C_{im}{\bf e}'_m$, then $$ \begin{aligned} {\bf x}&=x_i{\bf e}_i=x_iC_{im}{\bf e}'_m=x'_m{\bf e}'_m, \\ {\bf T}&=T_{ij}({\bf e}_i\otimes {\bf e}_j)=T_{ij}(C_{im}{\bf e}'_m\otimes C_{jn}{\bf e}'_n) =T_{ij}C_{im}C_{jn}({\bf e}'_m\otimes {\bf e}'_n) =T'_{mn}({\bf e}'_m\otimes {\bf e}'_n). \end{aligned} $$ But this argument is truly circular. The nature of $T$ was inferred from the given transformation law, and the last chain of equations merely confirms that the mathematical and the physical ideas of a tensor are compatible.
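This chain of equations can also be checked numerically. In the sketch below, the basis vectors are stored as rows of arrays in some fixed ambient frame (an illustrative choice), and the old basis is built from the new one via ${\bf e}_i = C_{im}{\bf e}'_m$. The helper names `assemble_vector` and `assemble_tensor` are hypothetical. The object assembled from $T_{ij}$ and ${\bf e}_i\otimes{\bf e}_j$ coincides with the one assembled from $T'_{mn}$ and ${\bf e}'_m\otimes{\bf e}'_n$, and likewise for ${\bf x}$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative rotation matrix C (orthogonal, det = +1).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
C = Q if np.linalg.det(Q) > 0 else -Q

# New orthonormal basis e'_m as rows, written in a fixed ambient frame (illustrative).
P, _ = np.linalg.qr(rng.normal(size=(3, 3)))
E_new = P.T                      # row m is e'_m
E_old = C @ E_new                # row i is e_i = C_im e'_m

x = rng.normal(size=3)           # coordinates x_i in the old basis
T = rng.normal(size=(3, 3))      # components T_ij in the old basis

# Transformed components: x'_m = x_i C_im and T'_mn = C_im C_jn T_ij.
x_new = x @ C
T_new = np.einsum("im,jn,ij->mn", C, C, T)

def assemble_vector(coeffs, basis):
    """Return sum_i coeffs_i e_i, with the basis vectors stored as rows."""
    return coeffs @ basis

def assemble_tensor(comps, basis):
    """Return sum_ij comps_ij outer(e_i, e_j) as an ambient 3x3 array."""
    return np.einsum("ij,ia,jb->ab", comps, basis, basis)

# The same geometric objects, whichever basis is used to assemble them.
print(np.allclose(assemble_vector(x, E_old), assemble_vector(x_new, E_new)))   # True
print(np.allclose(assemble_tensor(T, E_old), assemble_tensor(T_new, E_new)))   # True
```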
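Note that $C$ acts backwards in the basis change (it expresses the old basis in terms of the new one) and forwards in the coordinate change (it expresses the new coordinates in terms of the old ones). This is why the coordinate tuple is called "contravariant" relative to the basis tuple.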
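For rotations, the backwards/forwards distinction is hidden because $C^{-1} = C^{\mathsf T}$. It becomes visible with a general invertible change-of-basis matrix. This goes beyond the rotations in the question. The sketch below uses a hypothetical non-orthogonal $A$ with ${\bf e}_i = A_{im}{\bf e}'_m$. The new basis is then obtained from the old one through $A^{-1}$, while the coordinates transform through $A$ itself, $x'_m = x_iA_{im}$.

```python
import numpy as np

# Old basis e_i as rows (here the standard basis) and a hypothetical
# invertible, non-orthogonal matrix A with e_i = A_im e'_m.
E_old = np.eye(3)
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 3.0]])

# The basis change runs "backwards": e'_m = (A^{-1})_mi e_i.
E_new = np.linalg.inv(A) @ E_old

# The coordinate change runs "forwards": x'_m = x_i A_im.
x = np.array([1.0, -1.0, 2.0])
x_new = x @ A

# Both descriptions give the same vector.
print(np.allclose(x @ E_old, x_new @ E_new))  # True
```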