How do the rows of a change of basis matrix form a basis for expressing columns?

I am reading this article on Principal Component Analysis (PCA), and in section III-B (page 3) it gives a definition I don't understand.

In the toy example $\mathbf{X}$ is an $m \times n$ matrix.... Let $\mathbf{Y}$ be another $m \times n$ matrix related by a linear transformation $\mathbf{P}$. $\mathbf{X}$ is the original recorded data set and $\mathbf{Y}$ is a re-representation of that data set.

$$\mathbf{P} \mathbf{X} = \mathbf{Y} \tag{1}$$

Also let us define the following quantities.

  • $\mathbf{p}_i$ are the rows of $\mathbf{P}$.
  • $\mathbf{x}_i$ are the columns of $\mathbf{X}$ (or individual $\vec{X}$).
  • $\mathbf{y}_i$ are the columns of $\mathbf{Y}$.

Equation 1 represents a change of basis and thus can have many interpretations.

  1. $\mathbf{P}$ is a matrix that transforms $\mathbf{X}$ into $\mathbf{Y}$.
  2. Geometrically, $\mathbf{P}$ is a rotation and a stretch which again transforms $\mathbf{X}$ into $\mathbf{Y}$.
  3. The rows of $\mathbf{P}$, $\{ \mathbf{p}_1, \ldots , \mathbf{p}_m \}$, are a set of new basis vectors for expressing the columns of $\mathbf{X}$.

I do not understand this last part: how the rows $\mathbf{p}_i$ of $\mathbf{P}$ form a set of new basis vectors for expressing the columns of $\mathbf{X}$.

The reason I don't understand the latter part is that a change-of-basis matrix usually has the basis vectors in its columns, not its rows. Multiplying by a column vector on the right then gives a linear combination of the matrix's columns, which is exactly the representation in the new basis.

So I would expect new basis to be in columns of $\mathbf{P}$, not rows of $\mathbf{P}$. What am I missing here?

1 Answer

Your confusion is understandable and may lead you to a deeper understanding of vectors and matrices.

The problem is that we tend to treat vectors and matrices as mere arrays of numbers, without clearly differentiating between the different roles they play. In the scenario you're thinking of, the columns of the matrix play the role of vectors in a vector space, and you multiply the matrix by a column vector of coefficients. In that case, the column vector isn't itself being considered as a vector in a vector space; it's just a convenient grouping of numbers that lets you succinctly express a linear combination of the column vectors on the left, yielding a vector in their vector space without having to write out the sum.
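A minimal NumPy sketch of that familiar convention (the basis $b_1, b_2$ and the coefficients here are made-up values for illustration):

```python
import numpy as np

# Basis vectors as COLUMNS of B (the convention the question describes).
b1 = np.array([1.0, 1.0])
b2 = np.array([1.0, -1.0])
B = np.column_stack([b1, b2])

# c is just a grouping of coefficients, not a vector "in" the space.
c = np.array([3.0, 2.0])

# B @ c forms the linear combination 3*b1 + 2*b2 of B's columns.
v = B @ c
assert np.allclose(v, 3.0 * b1 + 2.0 * b2)
print(v)  # [5. 1.]
```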

By contrast, in the text you're reading, the columns of $\mathbf X$ aren't coefficients but vectors of data. The multiplication by $\mathbf P$ isn't meant to form a linear combination (neither of coefficients in $\mathbf P$ with vectors in $\mathbf X$, nor of coefficients in $\mathbf X$ with vectors in $\mathbf P$), but to analyze the data in $\mathbf X$, and the result is a vector of coefficients. The article goes on to say in Section D on p. 5: "PCA assumes $\mathbf P$ is an orthonormal matrix." (Rather bad style to say that there and not at the point where you're reading.) That means that you can consider it as constituting an orthonormal basis, and multiplying it onto a vector analyzes that vector into its coefficients in that basis.

So despite the superficial similarity of multiplying a vector by a matrix, this is a very different operation from the one you're thinking of: instead of multiplying a vector of coefficients by a matrix of basis vectors to get a linear combination living in the same vector space as the basis vectors, here you're multiplying a vector that lives in a vector space by a matrix representing an orthonormal basis to get a vector of coefficients.
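Here is a small NumPy sketch of that distinction, with a made-up orthonormal $\mathbf P$ (a plain rotation) standing in for the one PCA would actually find:

```python
import numpy as np

# Hypothetical orthonormal P: its ROWS are the new basis vectors.
theta = np.pi / 6
P = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# Data vectors as columns of X (m = 2 dimensions, n = 4 samples).
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 4))

# Y = P X analyzes each column x_i into coefficients:
# the j-th entry of y_i is the dot product p_j . x_i.
Y = P @ X
assert np.allclose(Y[0, 0], P[0] @ X[:, 0])

# Because P is orthonormal, its rows really are a basis:
# each x_i is recovered as a linear combination of them,
# with the entries of y_i as the coefficients.
X_rec = P.T @ Y  # columns of P.T are the rows of P
assert np.allclose(X, X_rec)
```

The last two lines are the point: $\mathbf X = \mathbf P^\top \mathbf Y$ writes every column of $\mathbf X$ as a linear combination of the rows of $\mathbf P$, which is exactly what "the rows of $\mathbf P$ are a new basis for the columns of $\mathbf X$" means.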