Matrix data $A$ as a transformation


In data science, when we have $n$ datapoints in $\mathbb{R}^p$, we define a matrix $A$ with $n$ rows and $p$ columns whose rows are the datapoints.

When doing linear algebra, I always think of matrices as linear transformations of space. But here it is not clear to me what this matrix $A$ represents as a transformation. If we observe a new point $x \in \mathbb{R}^p$, then $Ax$ sends $x$ to a vector in $\mathbb{R}^n$ lying in the span of the columns of $A$: $(x_{1,1}, \dots, x_{n,1}), \dots, (x_{1,p}, \dots, x_{n,p})$. But what does this represent, how does it relate to the datapoints we have, and how is it useful?

When looking at datapoints, we want to look at them in $\mathbb{R}^p$, not $\mathbb{R}^n$.

So I am trying to understand how $A$ makes sense from a "transformation" point of view, since it is used that way all the time (PCA, ...).


There are 2 best solutions below


It depends on the meaning of your data. Suppose you have a matrix $$ A=\begin{pmatrix} 1 & 0\\ 3 & 1\\ 8 & 5\\ \end{pmatrix} $$ whose columns give the amounts of three different resources needed by two different companies $E_1$ and $E_2$.

Then you can see $A$ as the matrix of a linear transformation from $\mathbb R^2$ to $\mathbb R^3$. The canonical basis $\{e_1,e_2\}$ of $\mathbb R^2$ can be thought of as the companies $E_1$ and $E_2$. The linear map $f$ given by $A$ sends every company to its list of required amounts of resources.

But what is the meaning of $f(2,3)$? You have to think of $(2,3)=2e_1+3e_2$ as a "superposition" of the two companies. For example, $(1,1)$ could be the merger of the two companies. The map $f$ then tells you the resources needed by that "merger" or "superposition".
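To make the "superposition" reading concrete, here is a minimal sketch in plain Python that applies the matrix $A$ from this answer to a few vectors. The helper name `apply` and the variable names are illustrative assumptions, not part of the original answer.

```python
# Columns of A are the resource needs of companies E_1 and E_2
# (the 3x2 matrix from the example above).
A = [
    [1, 0],
    [3, 1],
    [8, 5],
]

def apply(matrix, vector):
    """Return the matrix-vector product using plain lists (one dot product per row)."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

e1_needs = apply(A, [1, 0])  # f(e_1): first column of A, the needs of E_1
e2_needs = apply(A, [0, 1])  # f(e_2): second column of A, the needs of E_2
merger = apply(A, [1, 1])    # f(1, 1): needs of the merged company
mix = apply(A, [2, 3])       # f(2, 3): needs of 2*E_1 + 3*E_2

# Linearity: f(2, 3) is the same as 2*f(e_1) + 3*f(e_2).
by_linearity = [2 * a + 3 * b for a, b in zip(e1_needs, e2_needs)]

print(e1_needs, e2_needs, merger, mix, by_linearity)
```

Here `mix` and `by_linearity` agree, which is exactly the statement that $f(2e_1+3e_2)=2f(e_1)+3f(e_2)$.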

0
On

You are right that your points are in ${\bf R}^p$ and you have $n$ of them. Indeed if $n=10$ and $p=2$, you have $10$ points in the standard $2$D plane.

However, from a modeling point of view, when you talk about the transformation $Aw$, you need to think in terms of the columns of $A$ and the space they span, i.e., the column space of $A$. Here I use the letter $w$ to avoid confusion with $x$, which denotes the points in the $2$D plane, i.e., the rows of your matrix $A$. In general, you would like the columns of $A$, which represent the features of your points $x$ (such as weight and height), to be as independent as possible. So from a modeling point of view you have something like:

$$y= Aw$$ where $y$ is a vector of measurements. For each $i$, $y_i$ is a linear function of the weight and height of person $i$. What you are doing is expressing the vector $y \in {\bf R}^n$ as a linear combination of the columns of $A$ (each column holding the same quantity of interest for all $n$ points), with the weights given by $w$, which has the same dimension as your points $x$.

Entry-wise you would have $y_i= x^{(i)\top} w$, where $x^{(i)\top}$ is the $i$-th row of $A$. But this entry-wise view does not give a geometric interpretation the way the column view above does.
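The two readings of $y = Aw$ can be compared side by side in a small sketch. The numbers below (heights, weights, and the vector `w`) are made up purely for illustration and do not come from any real dataset.

```python
# Hypothetical data: each row is one person, columns are (height in cm, weight in kg).
A = [
    [170, 65],
    [180, 80],
    [160, 50],
]
w = [1, 2]  # hypothetical coefficients, one per feature (same dimension as a data point)

n, p = len(A), len(A[0])

# Row view: y_i is the dot product of the i-th data point with w.
y_rows = [sum(A[i][j] * w[j] for j in range(p)) for i in range(n)]

# Column view: y is a linear combination of the feature columns of A,
# with coefficients w_1, ..., w_p.
columns = [[A[i][j] for i in range(n)] for j in range(p)]
y_cols = [sum(w[j] * columns[j][i] for j in range(p)) for i in range(n)]

print(y_rows, y_cols)
```

Both computations give the same vector in ${\bf R}^n$; the column view is the one that shows $y$ living in the span of the feature columns.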