I want to have an intuition for why $AB$ in matrix multiplication is not the same as $BA$. It's clear from the definition that they differ, and there are arguments here (Fast(est) and intuitive ways to look at matrix multiplication?) explaining that this is necessary to preserve certain compositional properties. Example:
(column vector)
$A = \left( \begin{array}{c} 1\\ 2\\ 3 \end{array} \right)$
(row vector)
$B = \left(1, 5, 0\right)$
If we view matrix multiplication as forming linear combinations of the columns of the left matrix, then I read this from right to left: each entry of $B$ produces one column of the product by scaling the (single) column of $A$ — "take 1 times" the column of $A$ to get column 1 of the product, "take 5 times" it to get column 2, and "take 0 times" it to get column 3. Intuitively, the $B$ vector is the set of weights for the linear combination, and the columns of $A$ are the ones being combined. This yields:
$AB = \left( \begin{array}{ccc} 1 & 5 & 0\\ 2 & 10 & 0\\ 3 & 15 & 0 \end{array} \right)$
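A quick numerical sketch of the outer product above (NumPy is used here purely for illustration; the question itself is library-agnostic):

```python
import numpy as np

A = np.array([[1], [2], [3]])   # 3x1 column vector
B = np.array([[1, 5, 0]])       # 1x3 row vector

# Each column of AB is one entry of B times the single column of A:
# column 1 = 1*A, column 2 = 5*A, column 3 = 0*A
AB = A @ B
print(AB)
# [[ 1  5  0]
#  [ 2 10  0]
#  [ 3 15  0]]
```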
1st question: is this a valid way to think of the operation? It gives the right answer here, but more generally is it correct?
2nd question: how can we apply this (or a better) intuition to the case of multiplying $BA$? We have:
$BA = \left((1\times1) + (5\times2) + (0\times3)\right) = \left(11\right)$
and I'm not sure how to think of that intuitively.
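For contrast, the same two vectors multiplied in the other order collapse to a single number (again just an illustrative NumPy check):

```python
import numpy as np

B = np.array([[1, 5, 0]])       # 1x3 row vector
A = np.array([[1], [2], [3]])   # 3x1 column vector

# (1x3)(3x1) -> 1x1: the product is a single dot product
BA = B @ A
print(BA)
# [[11]]
```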
One intuition that has been proposed is matrix multiplication as composition of linear functions. I'm open to that, but I don't usually think of matrices like $A$ and $B$ as individually representing functions.
Regarding your first question, if you write $$ A=\begin{bmatrix}\mathbf r_1\\\mathbf r_2\\\vdots\\\mathbf r_m\end{bmatrix}, \ \ B=[\mathbf c_1\ \mathbf c_2\ \cdots\ \mathbf c_n], $$ then $$ AB=\begin{bmatrix} \mathbf r_1\cdot\mathbf c_1&\mathbf r_1\cdot\mathbf c_2&\cdots&\mathbf r_1\cdot\mathbf c_n\\\mathbf r_2\cdot\mathbf c_1&\mathbf r_2\cdot\mathbf c_2&\cdots&\mathbf r_2\cdot\mathbf c_n\\ \vdots&&\ddots&\vdots\\ \mathbf r_m\cdot\mathbf c_1&\mathbf r_m\cdot\mathbf c_2&\cdots&\mathbf r_m\cdot\mathbf c_n\\ \end{bmatrix} $$ so the idea works in general. You can also write this as $$ AB=\begin{bmatrix}\mathbf r_1B\\\mathbf r_2B\\\vdots\\\mathbf r_mB\end{bmatrix}=[A\mathbf c_1\ A\mathbf c_2\ \cdots\ A\mathbf c_n] $$
In the second case, $BA$, you are reduced to the "minimal" case: $B$ is $1\times 3$ and $A$ is $3\times 1$, so the product is $1\times 1$ and there is only a single dot product to calculate — the one row of $B$ with the one column of $A$.