So in my book, it is written:
Let $X_1,X_2,\dots,X_n$ have a multivariate normal distribution with mean $\mu$ and covariance matrix $K$, and let $\textbf{X}=(X_1,X_2,\dots,X_n)$.
The above isn't really relevant to my question. All that needs to be known is that we're multiplying matrices and then switching the order of the terms in the product.
$$A=(\textbf{x}-\mu )^{T}K^{-1}(\textbf{x}- \mu) \tag{1}$$
$$A=\sum_{i,j}(X_i-\mu_i)(K^{-1})_{ij}(X_j-\mu_j) \tag{2}$$
$$A=\sum_{i,j}(X_i-\mu_i)(X_j-\mu_j)(K^{-1})_{ij} \tag{3}$$
I don't understand how we went from $(2)$ to $(3)$. Last time I checked, when we multiply two matrices, the number of columns of the left factor has to equal the number of rows of the right factor. This rule doesn't seem to be respected when going from $(2)$ to $(3)$. How is this possible?
Thanks for the help! This is from the book by Cover and Thomas on Information theory. It corresponds to equations $(8.37)$ and $(8.38)$ in the book.
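When you go from $(1)$ to $(2)$, you leave the matrix world. Note the subscripts on $K^{-1}$: $(K^{-1})_{ij}$ is the element in row $i$, column $j$ of the inverse covariance matrix, so it is simply a scalar. The factors $(X_i-\mu_i)$ and $(X_j-\mu_j)$ are scalars too. Scalar multiplication is commutative, so the three factors in each term of the sum can be written in any order, and that reordering is all that happens between $(2)$ and $(3)$. The dimension rule for matrix products no longer applies because nothing in $(2)$ or $(3)$ is a matrix product.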
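If it helps to see this numerically, here is a minimal sketch in plain Python. The vectors `x`, `mu` and the matrix `K_inv` are made-up illustrative values (not from the book), with `K_inv` standing in for $K^{-1}$. It computes the matrix form $(1)$ and the two elementwise sums $(2)$ and $(3)$ and shows they agree.

```python
# Hypothetical 2-dimensional example; all numbers are illustrative assumptions.
x = [1.0, 3.0]
mu = [0.5, 1.0]
K_inv = [[2.0, -1.0],
         [-1.0, 1.0]]  # stands in for K^{-1}; it is the inverse of K = [[1, 1], [1, 2]]

d = [xi - mi for xi, mi in zip(x, mu)]  # d = x - mu
n = len(d)

# Equation (1): the matrix form d^T K^{-1} d, computed as d . (K_inv d).
K_inv_d = [sum(K_inv[i][j] * d[j] for j in range(n)) for i in range(n)]
a_matrix = sum(d[i] * K_inv_d[i] for i in range(n))

# Equation (2): double sum with the scalar K_inv[i][j] in the middle.
a_sum_2 = sum(d[i] * K_inv[i][j] * d[j] for i in range(n) for j in range(n))

# Equation (3): same scalars, reordered so K_inv[i][j] comes last.
a_sum_3 = sum(d[i] * d[j] * K_inv[i][j] for i in range(n) for j in range(n))

print(a_matrix, a_sum_2, a_sum_3)
```

Every product inside the two sums is a product of three ordinary numbers, which is why the order within each term doesn't matter.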