Given a data set $X=(x_1,...,x_N)^T$ in which the observations $x_n \in R^D$ are assumed to be drawn independently from a multivariate Gaussian distribution with mean $\mu \in R^D$ and covariance matrix $\Sigma$ , the log likelihood is:
$ln (p(X| \mu,\Sigma)) = -\frac{ND}{2}ln(2\pi) -\frac{N}{2}ln|\Sigma| -\frac{1}{2} \sum_{n=1}^{N}(x_n - \mu)^T\Sigma^{-1}(x_n -\mu)$.
How to show that $\sum_{n=1}^{N}(x_n - \mu)^T\Sigma^{-1}(x_n -\mu)$ depends on $X$ only through the two quantities $\sum_{n=1}^{N}x_n$ and $\sum_{n=1}^{N}x_nx_n^T$ ?
C.M. Bishop's "Pattern Recognition and Machine Learning" (p. 93) says this follows "by simple rearrangement". However, I fail to rearrange the term $\sum_{n=1}^{N}x_n^T\Sigma^{-1}x_n$ after expanding the contents of the sum.
The sufficient statistic is not $\sum_{n=1}^{N}x_n^\top x_n$ (a scalar); it is $\sum_{n=1}^{N}x_n x_n^\top$, a matrix consisting of all possible products of components $x_i x_j$. (Here $x_i$ denotes the $i$-th component of a single vector $x$, not the $i$-th sampled vector.)
Now, $x^\top\Sigma^{-1}x = \sum_{i,j} w_{ij} x_i x_j$, where $w_{ij}$ are the entries of $\Sigma^{-1}$, so it is a fixed linear combination of the entries of $x x^\top$; equivalently, $x^\top\Sigma^{-1}x = \operatorname{tr}(\Sigma^{-1} x x^\top)$. Since the coefficients $w_{ij}$ are the same for every $n$, summing over $n$ gives $\sum_{n=1}^{N} x_n^\top\Sigma^{-1}x_n = \operatorname{tr}\left(\Sigma^{-1} \sum_{n=1}^{N} x_n x_n^\top\right)$. Combined with the expansion $\sum_{n=1}^{N}(x_n - \mu)^\top\Sigma^{-1}(x_n - \mu) = \sum_{n=1}^{N} x_n^\top\Sigma^{-1}x_n - 2\mu^\top\Sigma^{-1}\sum_{n=1}^{N} x_n + N\mu^\top\Sigma^{-1}\mu$, the whole quadratic term depends on $X$ only through $\sum_{n=1}^{N} x_n$ and $\sum_{n=1}^{N} x_n x_n^\top$.
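If it helps to see the identity concretely, here is a quick numerical sanity check with NumPy (the data, sizes, and variable names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 3

# Simulated observations: rows of X are the vectors x_n
X = rng.normal(size=(N, D))

# An arbitrary symmetric positive definite matrix playing the role of Sigma
A = rng.normal(size=(D, D))
Sigma = A @ A.T + D * np.eye(D)
Sigma_inv = np.linalg.inv(Sigma)

# Direct computation: sum_n x_n^T Sigma^{-1} x_n
direct = sum(x @ Sigma_inv @ x for x in X)

# Via the sufficient statistic S = sum_n x_n x_n^T and the trace identity
S = X.T @ X                          # equals sum_n x_n x_n^T
via_trace = np.trace(Sigma_inv @ S)

print(np.isclose(direct, via_trace))  # the two agree up to rounding
```

The point is that `direct` touches each $x_n$ individually, while `via_trace` only needs the single matrix $S = \sum_n x_n x_n^\top$, which is exactly what "depends on $X$ only through $\sum_n x_n x_n^\top$" means.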