From Halmos' "Finite-Dimensional Vector Spaces": Similar matrices and transformations paradox


From section 47 called Similarity:
(In what follows I will write matrices as $[A]$ and linear transformations as $A$; apologies if I am not rigorous enough.)

Halmos proves that when we have one linear transformation $T:V\longrightarrow V$ with matrix $[B]$ in a basis $X$ (vectors $\vec x_1, \vec x_2, \ldots, \vec x_n$) and matrix $[C]$ in a basis $Y$ (vectors $\vec y_1, \vec y_2, \ldots, \vec y_n$), and the two bases are related by $[A]\vec x_i=\vec y_i$, then the two matrices are related by $[C]=[A]^{-1}[B][A]$.

He also proves that when we have two linear transformations $B$ and $C$, a matrix $[B]=(\beta_{ij})$, and the two transformations are defined by $B\vec x_j= \sum_{i=1}^n\beta_{ij}\vec x_i$ and $C\vec y_j= \sum_{i=1}^n\beta_{ij}\vec y_i$, then the two transformations are related by $C=ABA^{-1}$.
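For concreteness, here is a small numerical sketch of the first relation (all matrices are made-up examples; I take $X$ to be the standard basis of $\mathbb{R}^2$, so the columns of $[A]$ are the $\vec y_i$ written in $X$-coordinates):

```python
import numpy as np

# Hypothetical 2-D example: basis X is the standard basis of R^2,
# basis Y is given by A x_i = y_i, so the columns of A are the y_i.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[2.0, 0.0],
              [0.0, 3.0]])      # matrix of T in basis X

# Matrix of the same transformation T in basis Y:
C = np.linalg.inv(A) @ B @ A

# Sanity check: applying T to y_1 and re-expressing the result in
# Y-coordinates must match C acting on the Y-coordinates (1, 0) of y_1.
y1_coords_in_Y = np.array([1.0, 0.0])
Ty1_in_X = B @ A[:, 0]                  # apply T to y_1 (X-coordinates)
Ty1_in_Y = np.linalg.inv(A) @ Ty1_in_X  # convert the result to Y-coordinates
assert np.allclose(Ty1_in_Y, C @ y1_coords_in_Y)
```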

While I have proved both statements, I cannot intuitively (geometrically) understand why the relation between the transformations differs from the relation between the matrices, since a matrix is a way to express a transformation in a coordinate system (please correct me if I am wrong, as I am not a mathematician).

Also, which of the two relations is used when one deals with a change of basis?

Trying to understand these concepts through a rotation matrix $[A]$ and a projection matrix $[B]$, I figured out that in the first case (the relation between matrices), the matrix $[C]$ is built this way so that it again projects onto the same plane as $[B]$ did; it just needs different matrix entries in order to do so in the new basis $Y$, and I suppose that is why Halmos calls the two matrices similar. But I cannot find a comparably intuitive, geometrical explanation or example of how the second case (the relation between linear transformations) works, and thus I cannot explain why the two transformations are called similar.

EDIT:
I understood why $[C]=[A]^{-1}[B][A]$, but I did not understand why $C=ABA^{-1}$, nor why this difference between the two relations exists.
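One way to see $C=ABA^{-1}$ numerically, using the rotation-plus-projection setup mentioned above (the angle and matrices here are made up for illustration): conjugating a projection $B$ by a rotation $A$ yields the projection onto the *rotated* line, i.e. the transformation $B$ "transported" by $A$.

```python
import numpy as np

theta = np.pi / 4                                # rotate by 45 degrees
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation (invertible)
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])                       # projection onto the x-axis

# "Transported" transformation: C = A B A^{-1}
C = A @ B @ np.linalg.inv(A)

# C should be the projection onto the rotated line spanned by u = A e_1:
u = A @ np.array([1.0, 0.0])    # unit vector along the rotated line
P = np.outer(u, u)              # orthogonal projection onto span(u)
assert np.allclose(C, P)
```

So while $[A]^{-1}[B][A]$ re-expresses the *same* transformation in new coordinates, $ABA^{-1}$ produces a genuinely *different* transformation that does to the $\vec y_i$ what $B$ does to the $\vec x_i$.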


There are 3 best solutions below


Well, to see an operator $T$ on a finite-dimensional space $V$ as a matrix, you first need to fix a basis on $V$, right?

So you are allowed to change the basis you have chosen, but then we expect the matrix representing $T$ to change as well.

The role played by the matrix $[A]$ you cited is to change coordinates for you.

If you fix the basis $\mathcal{B}$ on $V$, you get a matrix $[B]$ representing $T$; then every time you want to evaluate $T$ on a vector $v$, you can see $v$ as a coordinate vector by writing it according to $\mathcal{B}$ and extracting the coefficients, right? Finally, you just multiply $[B]v$.

Now, if you want to represent $T$ according to a new basis $\mathcal{B}'$, you must know how $T$ acts on $\mathcal{B}'$. But if you want to express this operation as a product of matrices, you must regard a vector $v$ as a linear combination of vectors in $\mathcal{B}'$. So, if $[A]$ changes the coordinates of a vector written according to $\mathcal{B}'$ to $\mathcal{B}$, then $[A]^{-1}$ changes the coordinates of a vector written according to $\mathcal{B}$ to $\mathcal{B}'$.

With this in mind, you can see the product $[A]^{-1}[B][A]v$ this way:

  1. $v$ is written according to $\mathcal{B}'$, so $[A]v$ changes its coordinates to the basis $\mathcal{B}$;
  2. Now that we have a vector written according to $\mathcal{B}$, we can evaluate it using $[B]$. The result is again a vector written according to $\mathcal{B}$;
  3. Finally, we change the result, which is written according to $\mathcal{B}$, back to $\mathcal{B}'$. This is done using $[A]^{-1}$.

This is how I see the operation you've mentioned.
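The three steps above can be sketched numerically (the matrices and vector are arbitrary made-up examples; the columns of $A$ are the $\mathcal{B}'$ vectors written in $\mathcal{B}$-coordinates):

```python
import numpy as np

# Hypothetical setup: columns of A are the B'-basis vectors written in
# B-coordinates, so A converts B'-coordinates to B-coordinates.
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
B = np.array([[0.0, -1.0],
              [1.0,  0.0]])       # matrix of T in the basis B

v = np.array([3.0, -2.0])         # a vector given in B'-coordinates

step1 = A @ v                     # 1. convert v to B-coordinates
step2 = B @ step1                 # 2. apply T in B-coordinates
step3 = np.linalg.inv(A) @ step2  # 3. convert the result back to B'-coordinates

# Doing it in one shot with the similar matrix gives the same answer:
C = np.linalg.inv(A) @ B @ A
assert np.allclose(step3, C @ v)
```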


You have to distinguish between (a) linear maps (also called operators) $A:\>X\to Y$ and (b) coordinate transformations (also called basis changes).

(a) A linear map takes points, or: vectors, $x\in X$ and moves them to some place $y\in Y$ (whereby in many important cases $Y=X$). Such a map could be defined by some "ruler and compass" construction, it could be a geometrically described projection, a rotation around some axis through $0\in X$, or taking the derivative $D:\>f\mapsto f'$, etc.

(b) It is a fact of life that in order to work with such things computationally, in other words: in order to solve concrete problems, we have to choose a basis ${\cal E}=(e_1,\ldots, e_n)$ in $X$, i.e., an $n$-tuple of "distinguished" vectors $e_i\in X$. When such a basis has been chosen, any vector $x\in X$ is automatically encoded as an $n$-tuple $(x_1,\ldots, x_n)$ of real numbers. Unfortunately such a basis can be chosen in many ways. If some original basis is replaced by a new one, all points $x\in X$ receive new coordinates. This process is called a coordinate transformation. It is accompanied by some calculations, but nothing interesting happens, and you should not try to obtain some "intuitive feeling" for the goings-on here.

Now, "datawise", both processes (a) and (b) are expressed in terms of matrices, hence look similar. But don't be fooled by this coincidence: the interesting part – where the action is – is definitely (a).


Important here is Halmos's definition of $A$, namely $A\vec x_i=\vec y_i$, where the $\vec x_i$ and $\vec y_i$ are two different bases. He uses this same $A$ both in the matrix version of similarity and in the linear-transformation version.

Now, to illustrate what is going on, consider a one-dimensional vector made up of a coordinate and a basis vector. If you keep the size of the vector constant and think of the coordinate and the basis vector as variables, you see that the coordinate and the basis vector are inversely proportional. If you increase the coordinate, you must decrease the basis vector to keep the resulting vector constant and vice versa.

In the linear-transformation version of similarity, the basis vectors change. In the matrix version of similarity, the coordinates change. So there is an inverse relationship here. The same $A$ is used in both definitions, though, so it must enter the two relations in inverse ways.
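The one-dimensional inverse proportionality described above can be sketched as follows (the numbers are made up for illustration):

```python
import math

# A fixed 1-D vector w, expressed as coordinate * basis_vector.
w = 6.0
basis = 2.0
coord = w / basis        # coordinate of w relative to this basis

# Rescale the basis vector by s: the coordinate must scale by 1/s
# for the product coord * basis (the vector itself) to stay the same.
s = 3.0
new_basis = s * basis
new_coord = coord / s
assert math.isclose(new_coord * new_basis, w)
```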