I am working through Apostol's Calculus Vol 2 and I am a bit confused by the discussion of constructing a matrix representation with respect to a basis other than the standard one.
For example, take the example of constructing a matrix that represents the differentiation operator on polynomials of degree $\le 3$. With the standard basis, the book says "we choose the basis $(1, x, x^2, x^3)$ for the domain and the basis $(1, x, x^2)$ for the range," then arrives at the following matrix:
$ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} $
i.e.:
$ D(1) = 0 + 0x + 0x^2 \\ D(x) = 1 + 0x + 0x^2 $
and so on. So the first column represents the number of $1$s, the second column represents the number of $x$s, etc -- i.e. the $i$th entry in a row represents the coefficient for the $i$th basis vector. This makes sense to me.
Next, it computes an alternate representation for $D$ using the basis $(1, 1 + x, 1 + x + x^2, 1 + x + x^2 + x^3)$ for the domain and $(1, x, x^2)$ for the range, and arrives at:
$ \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix} $
because:
$ D(1) = 0 \\ D(1 + x) = 1 \\ D(1 + x + x^2) = 1 + 2x \\ D(1 + x + x^2 + x^3) = 1 + 2x + 3x^2 $
This does not make sense to me: the $i$th column still represents the coefficient for $x^{i-1}$, but that is no longer the basis of the domain. I would expect $\begin{bmatrix} 0 & 1 & 1 & 1 \end{bmatrix}$ to mean: $0 * (1) + 1 * (1 + x) + 1 * (1 + x + x^2) + 1 * (1 + x + x^2 + x^3)$ rather than $0 * (1) + 1 * (x) + 1 * (x^2) + 1 * (x^3)$.
What am I misunderstanding here?
Your interpretation of the entries of a transformation matrix is a bit off. You should concentrate on the columns of the matrix instead of its rows.
Observe first that if you right-multiply any matrix $M$ by the $j$th column of the identity matrix, the result is the $j$th column of $M$. Now, for any ordered basis whatsoever of a finite-dimensional vector space, the coordinates of the $j$th basis vector are precisely the $j$th column of the identity matrix. That is, if $\mathbf v$ is the $j$th basis vector, then its coordinates in that basis are a tuple that has zeros in every place but the $j$th one, which has a $1$, i.e., $v_i=\delta_{ij}$.
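These two observations can be checked numerically. Here is a minimal sketch (generic code, not tied to the book's notation) showing that multiplying a matrix by the coordinate tuple of the $j$th basis vector, which is the $j$th column of the identity, picks out the $j$th column:

```python
# Multiplying M by e_j (the j-th column of the identity matrix)
# returns the j-th column of M.

def matvec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

M = [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]
e2 = [0, 1, 0, 0]      # coordinates of the 2nd basis vector, in any basis
print(matvec(M, e2))   # [1, 0, 0] -- the 2nd column of M
```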
Putting these two observations together, we find that the $j$th column of a matrix $M$ that represents a linear transformation $T$ is the coordinates of the image of the $j$th basis vector of the domain of $T$. These coordinates are expressed in terms of the “output basis,” that is, the chosen basis for the codomain of $T$. To put it a little more concretely, if we have the basis $\mathcal B=\{\mathbf v_j\}$ for the domain of $T$ and $\mathcal B'=\{\mathbf w_i\}$ for the codomain, then $$T(\mathbf v_j)=\sum_i m_{ij}\mathbf w_i.$$ To put it in your terms, the $j$th column of $M$ tells you how many of each of the $\mathbf w_i$ to take in order to make $T(\mathbf v_j)$.
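To make the column-by-column construction concrete, here is a small sketch (my own coefficient-list encoding, not Apostol's) that builds the first matrix: each column is the coordinate tuple of $D(\mathbf v_j)$ in the codomain basis $(1, x, x^2)$:

```python
# Represent a polynomial of degree <= 3 by its coefficient list
# [a0, a1, a2, a3] in powers of x.

def diff_coeffs(p):
    """Differentiate [a0, a1, a2, a3] -> [a1, 2*a2, 3*a3]."""
    return [k * p[k] for k in range(1, len(p))]

# Standard domain basis 1, x, x^2, x^3 as coefficient lists.
domain_basis = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Each image D(v_j) is already a coordinate tuple in (1, x, x^2),
# so the columns of M are just those tuples.
columns = [diff_coeffs(v) for v in domain_basis]
M = [[columns[j][i] for j in range(4)] for i in range(3)]  # transpose
print(M)  # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]
```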
When you multiply an arbitrary coordinate tuple by $M$, what you’re really doing is writing the vector $\mathbf u$ as a linear combination $\sum_j u_j\mathbf v_j$ of the domain basis vectors, then applying $T$ to each term—$M$ tells us how many $\mathbf w_i$’s go into each term—and adding up the results.
Looking at your two examples, the first matrix tells us that $1$ is mapped to $0\cdot1+0\cdot x+0\cdot x^2 = 0$, $x$ is mapped to $1\cdot1+0\cdot x+0\cdot x^2=1$ and so on. (Here the dot stands for ordinary multiplication.) For the second matrix, the first column is the image of $1$, so that’s the same as before, but the second column is now the image of $1+x$: the matrix tells us that $D(1+x)=1\cdot 1+0\cdot x+0\cdot x^2=1$. Similarly, the third column tells us that $D(1+x+x^2) = 1\cdot1+2\cdot x+0\cdot x^2=1+2x$. I think it might’ve been less confusing had the author not used the same basis for the codomain in each case.
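The second matrix can be reconstructed the same way: differentiate each vector of the alternate domain basis and read off its coordinates in $(1, x, x^2)$. A sketch in the same coefficient-list encoding as above (my own, not from the book):

```python
def diff_coeffs(p):
    """Differentiate [a0, ..., an] in powers of x -> [a1, 2*a2, ...]."""
    return [k * p[k] for k in range(1, len(p))]

# 1, 1+x, 1+x+x^2, 1+x+x^2+x^3 as coefficient lists in powers of x.
alt_basis = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]

# Because the codomain basis is still (1, x, x^2), the coordinates of
# each derivative are just its coefficient list; these form the columns.
cols = [diff_coeffs(v) for v in alt_basis]
M2 = [[cols[j][i] for j in range(4)] for i in range(3)]  # transpose
print(M2)  # [[0, 1, 1, 1], [0, 0, 2, 2], [0, 0, 0, 3]]
```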
When you use this second matrix to differentiate an arbitrary polynomial $p$, you have to rewrite that polynomial in the form $a+b(1+x)+c(1+x+x^2)+d(1+x+x^2+x^3)$; the coefficients of that expression are its coordinates relative to the second basis. By linearity we have $$D(p)=aD(1)+bD(1+x)+cD(1+x+x^2)+dD(1+x+x^2+x^3),$$ and each column of the matrix tells us how to write the derivative of the corresponding domain basis vector in terms of the codomain basis vectors.
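To round this off, here is a sketch of that whole process (the helper names `to_alt_coords` and `matvec` are my own): convert $p$'s standard coefficients into coordinates $(a,b,c,d)$ relative to the second basis, multiply by the matrix, and recover $p'$ in the codomain basis $(1, x, x^2)$:

```python
def to_alt_coords(p):
    """[a0, a1, a2, a3] -> [a, b, c, d] with
    p = a*1 + b*(1+x) + c*(1+x+x^2) + d*(1+x+x^2+x^3).
    Solving the triangular system gives d=a3, c=a2-a3, b=a1-a2, a=a0-a1."""
    a0, a1, a2, a3 = p
    return [a0 - a1, a1 - a2, a2 - a3, a3]

def matvec(M, v):
    """Multiply matrix M (list of rows) by column vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

M2 = [[0, 1, 1, 1], [0, 0, 2, 2], [0, 0, 0, 3]]

p = [5, 4, 3, 2]       # p = 5 + 4x + 3x^2 + 2x^3
u = to_alt_coords(p)   # coordinates of p in the second basis
print(matvec(M2, u))   # [4, 6, 6], i.e. 4 + 6x + 6x^2, which is p'
```

Note that the output lands in the standard basis $(1, x, x^2)$, because that is the codomain basis used for both matrices.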