What is an intuitive way to understand the dot product in the context of matrix multiplication?


I was trying to understand why each entry of a matrix–vector product is a dot product of a row with the vector, as in:

$$ Ax = \left( \begin{array}{ccc} a_{1}^T \\ \vdots \\ a_m^T \end{array} \right)x = \left( \begin{array}{ccc} a_{1}^Tx \\ \vdots \\ a_m^T x \end{array} \right) $$

What is an intuitive explanation or interpretation of the fact that each entry is a dot product of a row of $A$ with the vector $x$?

What I do understand is that $Ax$ encodes a linear transformation $T$. Consider a super simple example in 2 dimensions to explain what I do understand. I understand that $Ax = A \left[ \begin{array}{c} x_1 \\ x_2 \end{array} \right] = T(x) = T(x_1 \hat i + x_2 \hat j) = x_1 T(\hat i) + x_2 T( \hat j)$. This makes me interpret intuitively that multiplication by a matrix gives me a new vector composed of the same linear combination, but of the transformed basis vectors (or of whatever vectors the input is composed of). Furthermore, one can easily see from this view where matrix multiplication comes from:

$$Ax = \left[ \begin{array}{ccc} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \end{array} \right] x = \left[ \begin{array}{ccc} T(\hat i)_1 & T(\hat j)_1\\ T(\hat i)_2 & T(\hat j)_2\\ \end{array} \right] \left[ \begin{array}{ccc} x_{1} \\ x_{2} \\ \end{array} \right] = x_1\left[ \begin{array}{ccc} T(\hat i)_1 \\ T(\hat i)_2 \\ \end{array} \right] + x_2 \left[ \begin{array}{ccc} T(\hat j)_1\\ T(\hat j)_2\\ \end{array} \right] = \left[ \begin{array}{ccc} T(\hat i)_1 x_1 + T(\hat j)_1 x_2\\ T(\hat i)_2 x_1 + T(\hat j)_2 x_2\\ \end{array} \right] $$

where it is now obvious why matrix multiplication is defined the way it is (it encodes linear transformations). Notice that the nice thing about this view is that each column of the matrix tells us how a basis vector changes, i.e. each column specifies how $\hat i$ or $\hat j$ is transformed. Furthermore, the coefficient each basis vector had in the old vector is retained, but it now scales the new direction, e.g. the weight $x_1$ now multiplies $T(\hat i)$. This for me is really intuitive and explains a lot of where matrix multiplication comes from.
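As a quick numerical sanity check of this column picture, here is a minimal sketch in plain Python (the matrix and vector values are purely illustrative, not taken from anything above):

```python
# Column picture: A x is the linear combination of A's columns
# weighted by the coordinates of x (illustrative 2x2 example).

A = [[1.0, 2.0],   # first row:  [T(i)_1, T(j)_1]
     [3.0, 4.0]]   # second row: [T(i)_2, T(j)_2]
x = [5.0, 6.0]     # coordinates x_1, x_2

# Columns of A = images of the basis vectors i-hat, j-hat.
T_i = [A[0][0], A[1][0]]
T_j = [A[0][1], A[1][1]]

# x_1 * T(i) + x_2 * T(j)
column_picture = [x[0] * T_i[k] + x[1] * T_j[k] for k in range(2)]

# Usual row-by-row (dot product) definition of A x.
row_picture = [sum(A[k][j] * x[j] for j in range(2)) for k in range(2)]

print(column_picture)  # [17.0, 39.0]
print(row_picture)     # [17.0, 39.0]
```

Both ways of organizing the arithmetic give the same vector, which is exactly the point of the question below.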

However, notice that this view also reveals that each entry $(Ax)_i = a_i^T x$ is a dot product of a row of $A$ with the coordinate representation of the vector. This seems to me not to be a coincidence; something deeper has to be going on. Usually dot products are related to projections, so I was trying to understand whether each coordinate $(Ax)_i$ might actually encode how much the original $x$ is being projected onto each row vector of $A$ (or possibly something to do with the row space of $A$, i.e. $C(A^T)$). In an attempt to understand this I considered what each row means:

$$ \left[ a_{i,1} \dots a_{i,n} \right] \left[ \begin{array}{ccc} x_1\\ \vdots\\ x_n \\ \end{array} \right] = \sum^n_{j=1} a_{ij} x_j$$

In my old interpretation of what a column of a matrix means (this time the matrix is $1 \times n$), it seems that the entry $a_{i,j}$ specifies how much of the basis vector $e_j$ contributes. However, I've had difficulty understanding, beyond that, what the significance of the dot product of $x$ with the rows of $A$ is. Does someone know how to interpret this, or how to understand it at a conceptual level, similar to the interpretation I gave of what the columns of a matrix mean? Are we doing some transformation involving the row space of $A$, i.e. $C(A^T)$, or something like that?


There are 5 best solutions below


Using the property of the transpose $\langle A^Tw,v\rangle = \langle w, Av\rangle$, I get:

$$\pmatrix{a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23}}\pmatrix{x \\ y \\ z} = \pmatrix{\langle A^T\pmatrix{1 \\ 0}, \pmatrix{x \\ y \\ z}\rangle \\ \langle A^T\pmatrix{0 \\ 1}, \pmatrix{x \\ y \\ z}\rangle} = \pmatrix{\langle \pmatrix{1 \\ 0}, A\pmatrix{x \\ y \\ z}\rangle \\ \langle \pmatrix{0 \\ 1}, A\pmatrix{x \\ y \\ z}\rangle} = \pmatrix{\operatorname{proj}_{e_1}(Ax) \\ \operatorname{proj}_{e_2}(Ax)}$$

That last step should be clear: the first component of $Ax$ is the projection of $Ax$ onto $e_1 = \pmatrix{1 \\ 0}$, and likewise for the second component. In other words, the $i$-th row of $A$ is $A^T e_i$, so dotting row $i$ with $x$ is exactly extracting the $e_i$-component of $Ax$.
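The identity $\langle A^T e_i, x\rangle = \langle e_i, Ax\rangle$ is easy to verify numerically; here is a minimal plain-Python sketch (the $2 \times 3$ matrix and vector are arbitrary illustrative values), using the fact that $A^T e_i$ is simply the $i$-th row of $A$:

```python
# Check <A^T e_i, x> = <e_i, A x>: dotting row i with x gives the
# i-th component of A x (illustrative 2x3 example).

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.0, 2.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

Ax = [dot(row, x) for row in A]   # A x, computed row by row

# A^T e_i is the i-th row of A, so <A^T e_i, x> = a_i . x = (A x)_i.
for i, row in enumerate(A):
    assert dot(row, x) == Ax[i]

print(Ax)  # [7.0, 16.0]
```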


We have $x = x_i e_i = x'_i e'_i$ where $e_i$ and $e'_i$ are bases related by nonsingular linear transformation. Note that ${e'}_i^T e'_j = g_{ij}$, where $g$ is invertible. Thus, ${e'}_i^T e_j x_j = {e'}_i^T e'_j x'_j = g_{ij}x'_j$ or $$x'_i = (g^{-1})_{ij}{e'}_j^T e_k x_k = (g^{-1})_{ij}{e'}_j^T x.$$ This gives us two good pieces of intuition. First, for a nonsingular linear transformation $A$ we can think of the elements of $A$ as being given by $$a_{ij} = (g^{-1})_{ik}{e'}_k^T e_j,$$ that is, by the dot product of a certain linear combination of the transformed basis vectors with the untransformed basis vectors. Second, to find the result of applying $A$ to $x$ we simply dot the same linear combination of the transformed basis vectors with the vector $x$.

For orthogonal transformations we find $g_{ij} = \delta_{ij}$ and so $$x'_i = {e'}_i^T e_j x_j = {e'}_i^T x \hspace{5ex}\textrm{and}\hspace{5ex} a_{ij} = {e'}_i^T e_j.$$

Note: We use Einstein's summation convention, $x = x^i e_i \equiv \sum_i x^i e_i$. For this problem the dual basis is $e^i = e_i^T$. The dual of $x$ is $x^T$, so $x_i e^i = x^i e_i^T$. We need not distinguish between $x_i$ and $x^i$ and so we write $x = x_i e_i$.

Example

Let $$A = \left(\begin{array}{cc}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{array}\right).$$ Then $$\left(\begin{array}{cc}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{array}\right) \left(\begin{array}{c}x \\ y\end{array}\right)$$ will give the components of $x$ in the new basis $e'_i$, where $[e_i]_j = \delta_{ij}$ is the standard basis. (This is a passive, rather than active, transformation.) It is straightforward to show that $e'_i = A^{-1}e_i = A^T e_i,$ so $$e'_1 = \left(\begin{array}{c}\cos\theta \\ \sin\theta\end{array}\right) \hspace{5ex}\textrm{and}\hspace{5ex} e'_2 = \left(\begin{array}{c}-\sin\theta \\ \cos\theta\end{array}\right).$$ One can then easily check that the elements of $A$ are given by $a_{ij} = {e'}_i^T e_j$. Note that, $$x'_1 = {e'}_1^T e_j x_j = \left(\begin{array}{cc}\cos\theta & \sin\theta\end{array}\right) \left(\begin{array}{c}x \\ y\end{array}\right)$$ and $$x'_2 = {e'}_2^T e_j x_j = \left(\begin{array}{cc}-\sin\theta & \cos\theta\end{array}\right) \left(\begin{array}{c}x \\ y\end{array}\right),$$ as expected.
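The rotation example can be checked numerically as well; a small plain-Python sketch (the angle and vector are arbitrary illustrative choices):

```python
import math

theta = 0.6  # arbitrary illustrative angle

# Passive rotation matrix A; its rows are the new basis vectors e'_i.
A = [[ math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

e1p = [ math.cos(theta), math.sin(theta)]   # e'_1 = A^T e_1
e2p = [-math.sin(theta), math.cos(theta)]   # e'_2 = A^T e_2

x = [2.0, 1.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Components of x in the new basis, computed two ways:
x_new  = [dot(row, x) for row in A]   # as A x
x_new2 = [dot(e1p, x), dot(e2p, x)]   # as x'_i = e'_i . x

assert x_new == x_new2

# a_ij = e'_i . e_j recovers the matrix entries from the bases.
e = [[1.0, 0.0], [0.0, 1.0]]
for i, ep in enumerate([e1p, e2p]):
    for j in range(2):
        assert abs(dot(ep, e[j]) - A[i][j]) < 1e-12
```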


It is easiest to see this for maps into one dimension first.

Our goal is to show that any linear transformation $T: \mathbb{R}^n \rightarrow \mathbb{R}$ can be represented in the form $Tu = \beta^Tu$ for some $n$-dimensional vector $\beta$. Say that $u \in \mathbb{R}^n$; let $e_1, \ldots, e_n$ be the standard basis vectors for $\mathbb{R}^n$ (where $e_i$ has a $1$ in the $i$th position, and $0$'s elsewhere). Then we can write $u = \sum_i u_i e_i$ where $u_i$ is the $i$-th coordinate of $u$. Since $T$ is linear, we have $$Tu =T \sum_i u_i e_i = \sum_i u_i Te_i = \sum_i u_i \beta_i,$$ where $\beta_i = Te_i$. This means that with respect to the standard basis, $Tu = \beta \cdot u$ where $\beta$ is the vector $(\beta_1, \ldots, \beta_n)$. Thus every linear map from $\mathbb{R}^n$ to $\mathbb{R}$ can be represented by taking the dot product with a fixed vector.

Now for the multi-dimensional case: If $T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ then $T$ is equivalent to the $m$-tuple of functions $(T_1, T_2, \ldots, T_m)$ where $T_jx$ is the $j$-th coordinate of $Tx$. Then for each $j$ there is a vector $\beta^j$ with $T_jx = \beta^j \cdot x$ and the result follows. (Note that the $j$ in $\beta^j$ is just a superscript here meaning the $j$-th one, and has nothing to do with the $j$-th power.)
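A small plain-Python sketch of the one-output-dimension case (the functional $T$ below is an arbitrary illustrative choice, not one from the answer): it recovers $\beta$ by evaluating $T$ on the standard basis, then checks $Tu = \beta \cdot u$.

```python
# Any linear functional T: R^3 -> R equals dotting with a fixed beta,
# where beta_i = T(e_i). T here is an arbitrary illustrative functional.

def T(u):
    return 2.0 * u[0] - u[1] + 0.5 * u[2]   # some fixed linear map

n = 3
basis = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
beta = [T(e) for e in basis]                # beta_i = T(e_i)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

u = [3.0, 4.0, 2.0]
assert T(u) == dot(beta, u)   # T u = beta . u
print(beta)  # [2.0, -1.0, 0.5]
```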


After reading your question further, I believe I misunderstood initially what you were asking.

In terms of orthogonality, what it tells you is that the row space is the orthogonal complement of the null space. Thus every vector can be written uniquely as the sum of a vector in the row space of $\mathbf{A}$ and a vector in the null space of $\mathbf{A}$. This is a fundamental direct sum relationship: $$\mathbb{R}^n = \mathrm{Row}\left(\mathbf{A}\right) \ \oplus \mathrm{Nul}\left(\mathbf{A}\right)$$

That is, if you let $\hat{\mathbf y}$ be the projection of $\mathbf{y}$ onto $\mathrm{Nul}\left(\mathbf{A}\right)$, then $\mathbf{y} - \mathbf{\hat{y}} \in \mathrm{Row}(\mathbf{A})$.

To see this, let $$ \mathbf{A} = \begin{bmatrix} \mathbf{a_1}^T \\ \mathbf{a_2}^T \\ \vdots \\ \mathbf{a_m}^T \end{bmatrix} $$


Then

$$ \mathbf{Ax} = \begin{bmatrix} \mathbf{a_1}^T \\ \mathbf{a_2}^T \\ \vdots \\ \mathbf{a_m}^T \end{bmatrix} \mathbf{x} = \begin{bmatrix} \mathbf{a_1}^T \mathbf{x} \\ \mathbf{a_2}^T \mathbf{x}\\ \vdots \\ \mathbf{a_m}^T \mathbf{x} \end{bmatrix} $$

Therefore $\mathbf{Ax} = \mathbf{0}$ if and only if $\mathbf{x}$ is orthogonal to each row of $\mathbf{A}$, and hence to the entire row space of $\mathbf{A}$. Incidentally, I have a book that refers to this and one other similar statement as the Fundamental Theorem of Linear Algebra. I don't think most would agree, but it is an important result.
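A minimal plain-Python check of this orthogonality (the matrix and null-space vector are illustrative values of my own choosing):

```python
# A x = 0 exactly when x is orthogonal to every row of A.
# Illustrative 2x3 matrix; (1, -2, 1) lies in its null space.

A = [[1.0, 1.0, 1.0],
     [1.0, 2.0, 3.0]]

x_null = [1.0, -2.0, 1.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

Ax = [dot(row, x_null) for row in A]
assert Ax == [0.0, 0.0]   # x_null is orthogonal to both rows,
                          # hence to the whole row space
```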


I think an intuitive way to think about matrix multiplication is to regard it as a combination of coordinate extraction and scalar multiplication. First, given any basis for a finite-dimensional vector space, such as $\mathbb{R}^n$, we usually express any vector $\mathbf{v}=\sum_i c_i \mathbf{e}_i=[c_1,c_2,\dots,c_n]$ as an $n$-tuple of scalars. Given any vector $\mathbf{v}$, the function that extracts the $i$-th coordinate $c_i$ is a linear functional, and together these functionals form a basis for the dual space. They can be thought of as similar to projection maps.

Second, given any scalar $c$, the operation of multiplying a vector $\mathbf{v}$ by $c$ to produce $c\mathbf{v}$ is a linear transformation on $\mathbb{R}^n$. The composition of extracting the $j$-th coordinate of a vector $\mathbf{v}$ and then multiplying that scalar by a fixed vector $\mathbf{e}_i$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$, which we denote by $T_{ij}$. Now a matrix $\mathbf{A}$ with entries $a_{ij}$ is associated with the finite sum $\sum_{ij} a_{ij}T_{ij}$ as a linear transformation. Note that the identity map $I_n=\sum_i T_{ii}$ is the sum of the projection maps $T_{ii}$.

We can think of this in two ways. First, the $i$-th row of the matrix $\mathbf{A}$ is associated with the composite map $\mathbf{v} \mapsto (\sum_j a_{ij} c_j)\mathbf{e}_i$, which is a dot product (a scalar) multiplying a basis vector. Second, the $j$-th column of the matrix $\mathbf{A}$ is associated with the composite map $\mathbf{v} \mapsto c_j(\sum_i a_{ij}\mathbf{e}_i)$, which is the scalar $c_j$, the $j$-th coordinate of $\mathbf{v}$, times the $j$-th column vector of $\mathbf{A}$. Your original understanding of matrix multiplication was pretty good.
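A small plain-Python sketch contrasting the two pictures (illustrative $3 \times 2$ matrix and vector of my own): assembling $A\mathbf{v}$ from the row maps and from the column maps gives the same result.

```python
# The same A v assembled two ways: summing the row maps
# v -> (a_i . v) e_i, and summing the column maps v -> c_j * (column j).

A = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
v = [7.0, 8.0]
m, n = 3, 2

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

# Row picture: coordinate i of A v is the dot product a_i . v.
by_rows = [dot(A[i], v) for i in range(m)]

# Column picture: accumulate c_j times column j of A.
by_cols = [0.0] * m
for j in range(n):
    for i in range(m):
        by_cols[i] += v[j] * A[i][j]

assert by_rows == by_cols
print(by_rows)  # [23.0, 53.0, 83.0]
```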