Where do the basis vectors live in a matrix?


I am working through Apostol's Calculus Vol. 2 and I am a bit confused by the discussion of constructing a matrix with respect to a basis other than the standard one.

For example, in constructing a matrix that represents the differentiation operator on polynomials of degree $\leq 3$ with the standard basis, the book says "we choose the basis $(1, x, x^2, x^3)$ for the domain and the basis $(1, x, x^2)$ for the range," then arrives at the following matrix:

$ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} $

i.e.:

$ D(1) = 0 + 0x + 0x^2 \\ D(x) = 1 + 0x + 0x^2 $

and so on. So the first column represents the number of $1$s, the second column represents the number of $x$s, etc -- i.e. the $i$th entry in a row represents the coefficient for the $i$th basis vector. This makes sense to me.
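This reading can at least be checked numerically; here is a minimal numpy sketch (the test polynomial is my own choice, not from the book):

```python
import numpy as np

# Matrix of D on polynomials of degree <= 3, with standard bases
# (1, x, x^2, x^3) for the domain and (1, x, x^2) for the range.
D = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]])

# p(x) = 4 + x + 3x^2 + x^3 as a coordinate vector in (1, x, x^2, x^3)
p = np.array([4, 1, 3, 1])

# D @ p gives the coordinates of p'(x) = 1 + 6x + 3x^2 in (1, x, x^2)
print(D @ p)  # [1 6 3]
```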

Next, it computes an alternate representation for $D$ using the basis $(1, 1 + x, 1 + x + x^2, 1 + x + x^2 + x^3)$ for the domain and $(1, x, x^2)$ for the range, and arrives at:

$ \begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix} $

because:

$ D(1) = 0 \\ D(1 + x) = 1 \\ D(1 + x + x^2) = 1 + 2x \\ D(1 + x + x^2 + x^3) = 1 + 2x + 3x^2 $

This does not make sense to me: the $i$th column still represents the coefficient for $x^{i-1}$, but that is no longer the basis of the domain. I would expect $\begin{bmatrix} 0 & 1 & 1 & 1 \end{bmatrix}$ to mean: $0 * (1) + 1 * (1 + x) + 1 * (1 + x + x^2) + 1 * (1 + x + x^2 + x^3)$ rather than $0 * (1) + 1 * (x) + 1 * (x^2) + 1 * (x^3)$.

What am I misunderstanding here?


BEST ANSWER

Your interpretation of the entries of a transformation matrix is a bit off. You should concentrate on the columns of the matrix instead of its rows.

Observe first that if you right-multiply any matrix $M$ by the $j$th column of the identity matrix, the result is the $j$th column of $M$. Now, for any ordered basis whatsoever of a finite-dimensional vector space, the coordinates of the $j$th basis vector are precisely the $j$th column of the identity matrix. That is, if $\mathbf v$ is the $j$th basis vector, then its coordinates in that basis are a tuple that has zeros in every place but the $j$th one, which has a $1$, i.e., $v_i=\delta_{ij}$.

Putting these two observations together, we find that the $j$th column of a matrix $M$ that represents a linear transformation $T$ is the coordinates of the image of the $j$th basis vector of the domain of $T$. These coordinates are expressed in terms of the “output basis,” that is, the chosen basis for the codomain of $T$. To put it a little more concretely, if we have the basis $\mathcal B=\{\mathbf v_j\}$ for the domain of $T$ and $\mathcal B'=\{\mathbf w_i\}$ for the codomain, then $$T(\mathbf v_j)=\sum_i m_{ij}\mathbf w_i.$$ To put it in your terms, the $j$th column of $M$ tells you how many of each of the $\mathbf w_i$ to take in order to make $T(\mathbf v_j)$.
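Putting the two observations into code, using the second matrix from the question (numpy is just for the bookkeeping here):

```python
import numpy as np

# Second matrix from the question: D with domain basis
# (1, 1+x, 1+x+x^2, 1+x+x^2+x^3) and codomain basis (1, x, x^2).
M = np.array([[0, 1, 1, 1],
              [0, 0, 2, 2],
              [0, 0, 0, 3]])

# In the domain basis, the j-th basis vector has coordinate tuple e_j,
# so M @ e_j is exactly column j: the codomain coordinates of T(v_j).
for j in range(4):
    e_j = np.zeros(4, dtype=int)
    e_j[j] = 1
    assert np.array_equal(M @ e_j, M[:, j])

# e.g. column j = 2 says D(1 + x + x^2) = 1*1 + 2*x + 0*x^2 = 1 + 2x
print(M[:, 2])  # [1 2 0]
```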

When you multiply an arbitrary coordinate tuple by $M$, what you’re really doing is writing the vector $\mathbf u$ as a linear combination $\sum_j u_j\mathbf v_j$ of the domain basis vectors, then applying $T$ to each term—$M$ tells us how many $\mathbf w_i$’s go into each term—and adding up the results.
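Concretely, the product $M\mathbf u$ is exactly that linear combination of the columns of $M$; a short sketch with an arbitrary coordinate tuple of my own choosing:

```python
import numpy as np

M = np.array([[0, 1, 1, 1],
              [0, 0, 2, 2],
              [0, 0, 0, 3]])

# u_j copies of column j, summed, is the matrix-vector product M @ u
u = np.array([1, 1, 1, 1])  # arbitrary coordinates in the domain basis
combo = sum(u[j] * M[:, j] for j in range(4))
assert np.array_equal(M @ u, combo)
print(M @ u)  # [3 4 3]
```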

Looking at your two examples, the first matrix tells us that $1$ is mapped to $0\cdot1+0\cdot x+0\cdot x^2 = 0$, $x$ is mapped to $1\cdot1+0\cdot x+0\cdot x^2=1$ and so on. (Here the dot stands for ordinary multiplication.) For the second matrix, the first column is the image of $1$, so that’s the same as before, but the second column is now the image of $1+x$: the matrix tells us that $D(1+x)=1\cdot 1+0\cdot x+0\cdot x^2=1$. Similarly, the third column tells us that $D(1+x+x^2) = 1\cdot1+2\cdot x+0\cdot x^2=1+2x$. I think it might’ve been less confusing had the author not used the same basis for the codomain in each case.

When you use this second matrix to differentiate an arbitrary polynomial $p$, you have to rewrite that polynomial in the form $a+b(1+x)+c(1+x+x^2)+d(1+x+x^2+x^3)$; the coefficients of that expression are its coordinates relative to the second basis. By linearity we have $$D(p)=aD(1)+bD(1+x)+cD(1+x+x^2)+dD(1+x+x^2+x^3),$$ and each column of the matrix tells us how to write the corresponding domain basis vector's derivative in terms of the codomain basis vectors.
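The whole procedure — rewrite $p$ in the second basis, then apply the matrix — can be sketched in numpy, taking $p(x) = 4 + x + 3x^2 + x^3$ as an example:

```python
import numpy as np

# Columns of C: standard-basis (1, x, x^2, x^3) coordinates of the
# second basis 1, 1+x, 1+x+x^2, 1+x+x^2+x^3.
C = np.array([[1, 1, 1, 1],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)

M = np.array([[0, 1, 1, 1],
              [0, 0, 2, 2],
              [0, 0, 0, 3]], dtype=float)

# p(x) = 4 + x + 3x^2 + x^3, written in the standard basis...
p_std = np.array([4, 1, 3, 1], dtype=float)
# ...and rewritten in the second basis by solving C @ coords = p_std
coords = np.linalg.solve(C, p_std)   # the a, b, c, d above
print(coords)      # [ 3. -2.  2.  1.]
# M applied to those coordinates gives p'(x) in the basis (1, x, x^2)
print(M @ coords)  # [1. 6. 3.], i.e. p'(x) = 1 + 6x + 3x^2
```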

ANSWER

Given any finite-dimensional $k$-vector space $V,$ a linear operator $T : V \to V,$ and an ordered basis $\mathscr B = \{v_1, \dots, v_n\}$ of $V,$ we can write the matrix $A$ of $T$ with respect to $\mathscr B$ by taking the $i$th column of $A$ to be the coordinate vector of $T(v_i)$ with respect to $\mathscr B.$

Given any basis $\mathscr B,$ it is true that in some cases we can replace each of the vectors by a suitable linear combination of the basis vectors and still obtain a basis. In particular, the ordered basis $\mathscr B = \{1, x, x^2, \dots, x^n\}$ of the $k$-vector space $P_n(k)$ of polynomials of degree $\leq n$ over $k$ gives rise to the ordered basis $$\mathscr B' = \{1, 1 + x, 1 + x + x^2, \dots, 1 + x + x^2 + \cdots + x^n\}$$ of the same vector space. (Essentially, the argument is that if we write these vectors as linear combinations of the vectors of the ordered basis $\mathscr B,$ then the corresponding matrix is invertible.)
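The invertibility argument in the parenthetical can be checked directly: writing the new vectors in terms of $\mathscr B$ gives an upper-triangular matrix of ones, whose determinant is $1$. A small numpy sketch for $n = 3$:

```python
import numpy as np

# Column j holds the B-coordinates of 1 + x + ... + x^j (here n = 3)
n = 3
C = np.triu(np.ones((n + 1, n + 1)))
# C is triangular with ones on the diagonal, so det C = 1 and C is
# invertible; hence the new vectors form a basis as well.
det = np.linalg.det(C)
print(det)
```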

We will focus our attention on the $4$-dimensional $\mathbb R$-vector space $P_3$ of polynomials of degree $\leq 3$ with real coefficients and the differentiation operator $D : P_3 \to P_3$ defined by $D(p(x)) = p'(x).$ As we have seen, we always have the ordered basis $\mathscr B = \{1, x, x^2, x^3 \},$ and this gives rise to another ordered basis $$\mathscr B' = \{1, 1 + x, 1 + x + x^2, 1 + x + x^2 + x^3\}.$$ For notational convenience, let us rename the $i$th vector of the ordered basis $\mathscr B$ to be $v_i = x^{i - 1}$ and the $i$th vector of the ordered basis $\mathscr B'$ to be $w_i = \sum_{k = 1}^i x^{k - 1}.$ As you have observed, we have $D(w_1) = 0,$ $D(w_2) = 1 = v_1,$ $D(w_3) = 1 + 2x = v_1 + 2 v_2,$ and $D(w_4) = 1 + 2x + 3x^2 = v_1 + 2v_2 + 3v_3.$ Consequently, the matrix of $D$ with respect to the ordered basis $\mathscr B'$ for the domain and $\mathscr B$ for the codomain is $$\begin{pmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$ On its own, I don't really see the point of this construction as it relates to calculus; however, I hope that helps.
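One way to see where that $4 \times 4$ matrix comes from is to factor it as (matrix of $D$ in the standard basis) times (change of basis from $\mathscr B'$ to $\mathscr B$); a sketch of that factorization in numpy:

```python
import numpy as np

# A: matrix of D : P_3 -> P_3 in the standard basis B = (1, x, x^2, x^3)
# for both domain and codomain (the last row is zero: no derivative of a
# cubic has an x^3 term).
A = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3],
              [0, 0, 0, 0]])

# C: column j holds the B-coordinates of w_{j+1} = 1 + x + ... + x^j
C = np.triu(np.ones((4, 4), dtype=int))

# Matrix of D with domain basis B' and codomain basis B
print(A @ C)
# [[0 1 1 1]
#  [0 0 2 2]
#  [0 0 0 3]
#  [0 0 0 0]]
```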

ANSWER

Let's say that $B_V$ is the basis for the domain and $B_W$ is the basis for the range. In the matrix associated to a linear map, the $j$-th column represents the coordinates respect to $B_W$ of the image of the $j$ element of $B_V$:

$D(1)=0, \text{Coord}_{B_W}(0)=(0,0,0)$ is the first column.

$D(1+x)=1, \text{Coord}_{B_W}(1)=(1,0,0)$ is the second column.

$D(1+x+x^2)=1+2x, \text{Coord}_{B_W}(1+2x)=(1,2,0)$ is the third column.

$D(1+x+x^2+x^3)=1+2x+3x^2, \text{Coord}_{B_W}(1+2x+3x^2)=(1,2,3)$ is the fourth column.

This is what Apostol means when he says that we write the images as linear combinations of the elements in $B_W$ (I can't quote literally, because I own an Italian translation :)). The coefficients in such a linear combination are just the coordinates with respect to $B_W$.

Since such a matrix transforms coordinates with respect to $B_V$ into coordinates with respect to $B_W$, it can transform each vector in $V$ (given by its coordinates with respect to $B_V$) into the coordinates of a vector in $W$.

Example: if $p=4+x+3x^2+x^3$, then $\text{Coord}_{B_V}(p)=(3,-2,2,1)$ because $3(1)-2(1+x)+2(1+x+x^2)+1(1+x+x^2+x^3)=p$, and $$\begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \\ 2 \\ 1 \end{bmatrix}=\begin{bmatrix} 1 \\ 6 \\ 3 \end{bmatrix}$$ These are the coordinates with respect to $B_W$ of $1+6x+3x^2$, the derivative of $p$.