Linear Algebra Done Right presents several ways to look at matrix multiplication which I don't understand.
Suppose $v_1,...,v_n$ is a basis of $V$, $w_1,...,w_m$ is a basis of $W$, $u_1,...,u_p$ is a basis of $U$.
Suppose $T: U \to V$, $S: V \to W$, and $M(S) = A$, $M(T) = C$. For $1 \leq k \leq p$, we have
\begin{equation} \begin{split} (ST)u_k &= S(\sum_{r=1}^{n}C_{r,k}v_r) \text{ This is Matrix times column?}\\ &= \sum_{r=1}^{n}C_{r,k}Sv_r\\ &= \sum_{r=1}^{n}C_{r,k} \sum_{j=1}^{m}A_{j,r}w_j \text{ This is linear combination of columns?} \\ &= \sum_{j=1}^{m}\sum_{r=1}^{n}(A_{j,r}C_{r,k})w_j \text{ I don't know how to get from the previous step to this step} \end{split} \end{equation}
It's important that you understand how $M(S)$ and $M(T)$ are defined. Also note that your notation doesn't account for the underlying bases: these matrices depend on the choice of bases, so they are not "standard" matrices!
For example, $M(T)$ is determined by the rule that its $k$th column is the coordinate column vector of $Tu_k$, with respect to the basis $v_1, \ldots, v_n$. That is, $$Tu_k = \sum_{r=1}^n C_{r, k} v_r;$$ this is just the standard process for recovering $Tu_k$ from its coordinate vector, the $k$th column of $C$.
It's not really just "matrix times a column", since $v_r$ may not be a column vector. It's an element of $V$, which may or may not be equal to $\Bbb{R}^n$. It might be that $v_r$ is a polynomial, or some other abstract vector. As I said, this is just recovering a vector from its coordinate vector.
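To make this concrete (a made-up illustration, not an example from the book): suppose $V = \mathcal{P}_1(\Bbb{R})$ with basis $v_1 = 1$, $v_2 = x$, and suppose the $k$th column of $C$ is $\begin{pmatrix} 3 \\ 5 \end{pmatrix}$. Then $$Tu_k = \sum_{r=1}^{2} C_{r,k} v_r = 3 \cdot 1 + 5 \cdot x = 3 + 5x,$$ which is a polynomial, not a column vector, even though its coordinate vector with respect to $v_1, v_2$ is a column.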
The same goes for the next point. We have $$Sv_r = \sum_{j=1}^m A_{j,r} w_j$$ by exactly the same reasoning.
The final steps are the distributive law to pull a constant into a sum: $$\sum_{r=1}^{n}C_{r,k} \sum_{j=1}^{m}A_{j,r}w_j = \sum_{r=1}^{n} \sum_{j=1}^{m}C_{r,k}A_{j,r}w_j,$$ then rearranging sum order using associativity and commutativity, $$\sum_{r=1}^{n} \sum_{j=1}^{m}C_{r,k}A_{j,r}w_j = \sum_{j=1}^{m} \sum_{r=1}^{n}C_{r,k}A_{j,r}w_j,$$ and finally using commutativity of the scalar field to change the order of $C_{r, k}$ and $A_{j, r}$.
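Putting it all together: the coefficient of $w_j$ in the last line is exactly the $(j, k)$ entry of the matrix product $AC$, $$(ST)u_k = \sum_{j=1}^{m}\Big(\sum_{r=1}^{n} A_{j,r} C_{r,k}\Big) w_j = \sum_{j=1}^{m} (AC)_{j,k}\, w_j,$$ since $(AC)_{j,k} = \sum_{r=1}^{n} A_{j,r} C_{r,k}$ by definition of matrix multiplication. Reading off the coordinates of $(ST)u_k$ then gives $M(ST) = AC = M(S)M(T)$, which is the point of the whole computation.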