Sample covariance matrix notation

I do not understand this notation for the sample covariance matrix (from Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig, Section 20.3, EM algorithm):

$\Sigma_{i} = \frac{\sum_{j}p_{ij}(\mathbf{x}_{j}-\mathbf{\mu}_{i})(\mathbf{x}_{j}-\mathbf{\mu}_{i})^{\top}}{n_{i}}$

As far as I know, matrix dimensions do not match. From what I understand, $\mathbf{x}_{j}$ and $\mathbf{\mu}_{i}$ are row vectors of dimension $1\times d$. How can this yield a $d\times d$ matrix? Isn't $\Sigma_{i}$ the covariance matrix of mixed Gaussian distribution component $i$? But, isn't $(\mathbf{x}_{j}-\mathbf{\mu}_{i})(\mathbf{x}_{j}-\mathbf{\mu}_{i})^{\top}$ a scalar?

I also looked up Wikipedia (https://en.wikipedia.org/wiki/Sample_mean_and_covariance). I understand this notation:

$q_{jk}=\frac{1}{N-1}\sum_{i=1}^{N}(x_{ij}-\overline{x_j})(x_{ik}-\overline{x_{k}})$

for elements of the sample covariance matrix (Q). But again, not this one:

$Q=\frac{1}{N-1}\sum_{i=1}^{N}(\mathbf{x}_{i.}-\overline{\mathbf{x}})(\mathbf{x}_{i.}-\overline{\mathbf{x}})^{\top}$

What am I missing here?

Unless explicitly stated otherwise, vectors are generally assumed to be $d \times 1$ column vectors, not $1 \times d$ row vectors.

So, in both of the cases you did not understand (the first and the third equations), we have an expression of the form

\begin{equation*} (\mathbf{a} - \mathbf{b})(\mathbf{a} - \mathbf{b})^\top, \end{equation*}

where $\mathbf{a}$ and $\mathbf{b}$ (and therefore also $(\mathbf{a} - \mathbf{b})$) are $d \times 1$ column vectors. The transpose $(\mathbf{a} - \mathbf{b})^\top$ is then a $1 \times d$ row vector, so multiplying the two in that order gives a $d \times d$ matrix (an outer product). The scalar you describe would instead come from the reverse order, $(\mathbf{a} - \mathbf{b})^\top(\mathbf{a} - \mathbf{b})$, which is the inner product.
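A quick NumPy sketch of the shape bookkeeping (illustrative values only, not taken from the book):

```python
import numpy as np

d = 3
# Column-vector convention: shape (d, 1).
a = np.array([[1.0], [2.0], [3.0]])   # d x 1
b = np.array([[0.5], [1.0], [1.5]])   # d x 1

diff = a - b                          # d x 1 column vector

outer = diff @ diff.T                 # (d x 1) @ (1 x d) -> d x d matrix
inner = diff.T @ diff                 # (1 x d) @ (d x 1) -> 1 x 1 scalar

print(outer.shape)  # (3, 3)
print(inner.shape)  # (1, 1)
```

If you instead treated the vectors as $1 \times d$ rows, the same expression $(\mathbf{a}-\mathbf{b})(\mathbf{a}-\mathbf{b})^\top$ would indeed collapse to the scalar in your question; the column-vector convention is what makes the formulas produce a $d \times d$ covariance matrix.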