Mean squared error of a sample covariance matrix


I am trying to prove a claim from a paper that the mean squared error of the sample covariance estimator $\bf{S}=\frac{1}{n}\bf{X}\bf{X}^T$ of the covariance matrix $\Sigma$ is $\mathrm{MSE}(\bf{S})=\frac{1}{n}[\mathrm{tr}(\Sigma^2)+\mathrm{tr}^2(\Sigma)]$. Here $\bf{X}$ is a $p\times{}n$ matrix of samples drawn from the multivariate normal distribution $N_p(0,\Sigma)$. $\bf{S}$ and $\Sigma$ are symmetric matrices.

The MSE is defined as $\mathrm{MSE}(\bf{S})= \mathbb{E}(\left\lVert \Sigma{}-S \right\rVert^2)$, where $\mathbb{E}()$ is the expectation value and $\left\lVert \bf{A} \right\rVert=\mathrm{tr}(\bf{A}^T\bf{A})^{1/2}$ is the Frobenius norm.
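Before attempting the proof, the claim is easy to sanity-check numerically. Below is a quick Monte Carlo sketch (the dimensions $p$ and $n$, the seed, and the choice of $\Sigma$ are all arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 3, 5, 20000

# An arbitrary symmetric positive-definite covariance matrix.
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)
L = np.linalg.cholesky(Sigma)          # Sigma = L L^T

mse = 0.0
for _ in range(reps):
    X = L @ rng.standard_normal((p, n))  # columns ~ N_p(0, Sigma)
    S = X @ X.T / n                      # sample covariance (known zero mean)
    mse += np.linalg.norm(Sigma - S, 'fro') ** 2
mse /= reps

claimed = (np.trace(Sigma @ Sigma) + np.trace(Sigma) ** 2) / n
print(mse, claimed)
```

With this many replicates the two numbers should agree closely, which at least supports the formula before proving it.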

Here is my attempt to solving this: \begin{align}\mathrm{MSE}\left(\bf{S}\right) =& \mathbb{E}(\left\lVert \Sigma{}-\bf{S} \right\rVert^2) \\ =& \mathbb{E}(\mathrm{tr}[(\Sigma{}-\bf{S})^T(\Sigma{}-S)]) \\ =& \mathbb{E}(\mathrm{tr}[\Sigma^2-\Sigma{}\bf{S}-\bf{S}\Sigma{}+\bf{S}^2]) \\ =& \mathrm{tr}[\Sigma^2]-2\mathbb{E}(\mathrm{tr}[\bf{S}\Sigma])+\mathbb{E}(\mathrm{tr}[\bf{S}^2])\\ =& \mathrm{tr}[\Sigma^2]-2 \mathrm{tr}[\mathbb{E}(\bf{S})\Sigma]+\mathrm{tr}[\mathbb{E}(\bf{S}^2)] \end{align}

Now the problem has reduced to computing $\mathbb{E}(\bf{S})$ and $\mathbb{E}(\bf{S}^2)$. My initial thoughts were that $\mathbb{E}(\bf{S})=\Sigma$ and $\mathbb{E}(\bf{S}^2)=\Sigma^2$, but then the whole expression above is $0$...

Could someone help by pointing out what I am doing wrong?

Update: I found some problems with my reasoning. The expectation of $\bf{XX}^T$ is not $\Sigma$ but $n\Sigma$, so $\mathbb{E}(\bf{S})=\Sigma$ does hold after all; the actual flaw is the assumption $\mathbb{E}(\bf{S}^2)=\Sigma^2$. Before going for the expectations, I am still struggling with computing the two necessary traces of the following type: $$\mathrm{tr}(\Sigma{}\bf{XX}^T)=\mathrm{tr}(\Sigma{}\Sigma{}^{1/2}\bf{YY}^T\Sigma{}^{1/2~T})$$ and $$\mathrm{tr}(\bf{XX}^T\bf{XX}^T)=\mathrm{tr}(\Sigma{}^{1/2}\bf{YY}^T\Sigma{}^{1/2~T}\Sigma{}^{1/2}\bf{YY}^T\Sigma{}^{1/2~T}),$$ where $\bf{Y}$ is a $p\times{}n$ matrix of independent standard normal samples and $\Sigma{}^{1/2}$ is the lower triangular matrix obtained by Cholesky decomposition.
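For reference, the two expectations in question have known closed forms, which are standard Wishart moments: $\mathbb{E}\,\mathrm{tr}(\Sigma\mathbf{XX}^T)=n\,\mathrm{tr}(\Sigma^2)$ and $\mathbb{E}\,\mathrm{tr}(\mathbf{XX}^T\mathbf{XX}^T)=n(n+1)\,\mathrm{tr}(\Sigma^2)+n\,\mathrm{tr}^2(\Sigma)$. A small simulation to check them (sizes, seed, and $\Sigma$ arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 3, 5, 20000

A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)        # arbitrary SPD covariance
L = np.linalg.cholesky(Sigma)

t1 = t2 = 0.0
for _ in range(reps):
    X = L @ rng.standard_normal((p, n))  # columns ~ N_p(0, Sigma)
    W = X @ X.T                          # Wishart(n, Sigma) matrix
    t1 += np.trace(Sigma @ W)
    t2 += np.trace(W @ W)
t1 /= reps
t2 /= reps

tr2 = np.trace(Sigma @ Sigma)
exp1 = n * tr2                                        # E tr(Sigma X X^T)
exp2 = n * (n + 1) * tr2 + n * np.trace(Sigma) ** 2   # E tr((X X^T)^2)
print(t1, exp1)
print(t2, exp2)
```

The Monte Carlo averages should land close to the closed forms.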

Best Answer

After some fiddling I figured out a way to solve this; I did not carry it through to the end, but I am fine leaving it like this.

The final step towards the solution was to express $\Sigma{}=Q\Lambda{}Q^T$ and $\Sigma{}^{1/2}=Q\Lambda^{1/2}$ using the eigendecomposition. Here $Q$ is an orthogonal matrix and $\Lambda{}$ is the diagonal eigenvalue matrix. Note that this $\Sigma^{1/2}$ is not the Cholesky factor, but it is a valid square root, since $\Sigma^{1/2}\Sigma^{1/2~T}=Q\Lambda{}Q^T=\Sigma$, so $\bf{X}=\Sigma^{1/2}\bf{Y}$ still has the right distribution.

Now everything falls into place, which I will demonstrate by computing the $\Sigma{}\bf{XX}^T$ term:

\begin{align}\mathrm{tr}[\Sigma{}\bf{XX}^T]= & \mathrm{tr}[\Sigma{}\Sigma{}^{1/2}\bf{YY}^T\Sigma{}^{1/2~T}] \\ =& \mathrm{tr}[Q\Lambda{}Q^{T}Q\Lambda^{1/2}\bf{YY}^T\Lambda^{1/2}Q^T] \\ =& \mathrm{tr}[\Lambda^{2}\bf{YY}^T]=\mathrm{tr}[\bf{Y}^T\Lambda{}^{2}\bf{Y}]=\sum_{i,j}\lambda_{i}^2y_{ij}^2. \end{align}

Finally, since $\mathbb{E}(y_{ij}^2)=1$, the expectation of this is $n\sum_i\lambda_i^2=n\,\mathrm{tr}[\Sigma^2]$, and dividing by $n$ gives $\mathbb{E}(\mathrm{tr}[\bf{S}\Sigma])=\mathrm{tr}[\Sigma^2]$.
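The diagonalization step is easy to check numerically: with the eigendecomposition square root $\Sigma^{1/2}=Q\Lambda^{1/2}$, the identity $\mathrm{tr}[\Sigma\mathbf{XX}^T]=\sum_{i,j}\lambda_i^2y_{ij}^2$ holds exactly for every single draw, not just in expectation. A quick sketch (sizes and $\Sigma$ arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 4, 6

A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)     # arbitrary SPD covariance

lam, Q = np.linalg.eigh(Sigma)      # Sigma = Q diag(lam) Q^T
root = Q * np.sqrt(lam)             # Sigma^{1/2} = Q Lambda^{1/2} (scales columns)

Y = rng.standard_normal((p, n))     # standard normal samples
X = root @ Y                        # X ~ N_p(0, Sigma)

lhs = np.trace(Sigma @ X @ X.T)
rhs = np.sum(lam[:, None] ** 2 * Y ** 2)   # sum_{i,j} lambda_i^2 y_{ij}^2
print(lhs, rhs)
```

The two values agree up to floating-point rounding, confirming the algebra above.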

Something similar can be done with the $\bf{XX}^T\bf{XX}^T$ term, but the algebra gets tedious with six matrices to multiply. Funny to say this on this site, but I got what I needed out of it.

Update: I looked into some books on random matrices and found a couple of relations that solve the problem in one line. More can be found in the book by Gupta and Nagar, Matrix Variate Distributions, 2000.

The first one is $$\mathbb{E}(\mathbf{XAX}^T)=\mathrm{tr}[\mathbf{A}]\Sigma,$$ where $\mathbf{A}$ is a constant $n\times{}n$ matrix. The second one is $$\mathbb{E}(\mathbf{XAX^TBXCX^T})=\mathrm{tr}[\mathbf{C^TA^T}]\mathrm{tr}[\mathbf{B}\Sigma]\Sigma + \mathrm{tr}[\mathbf{A}]\mathrm{tr}[\mathbf{C}]\Sigma{}\mathbf{B}\Sigma{} + \mathrm{tr}[\mathbf{AC}^T]\Sigma{}\mathbf{B}^T\Sigma{},$$ where $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are $n\times{}n$, $p\times{}p$ and $n\times{}n$ matrices, respectively.
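For completeness, setting $\mathbf{A}=\mathbf{C}=I_n$ and $\mathbf{B}=I_p$ in these relations recovers the claimed MSE in a few lines (using $\mathbf{S}=\frac{1}{n}\mathbf{XX}^T$ and the expansion from the question): $$\mathbb{E}(\mathbf{XX}^T)=\mathrm{tr}[I_n]\,\Sigma=n\Sigma,\qquad \mathbb{E}(\mathbf{XX}^T\mathbf{XX}^T)=n\,\mathrm{tr}[\Sigma]\,\Sigma+n^2\Sigma^2+n\Sigma^2,$$ so that \begin{align}\mathrm{MSE}(\mathbf{S}) =& \mathrm{tr}[\Sigma^2]-\frac{2}{n}\mathrm{tr}[\mathbb{E}(\mathbf{XX}^T)\Sigma]+\frac{1}{n^2}\mathrm{tr}[\mathbb{E}(\mathbf{XX}^T\mathbf{XX}^T)] \\ =& \mathrm{tr}[\Sigma^2]-2\,\mathrm{tr}[\Sigma^2]+\frac{n+1}{n}\mathrm{tr}[\Sigma^2]+\frac{1}{n}\mathrm{tr}^2[\Sigma] \\ =& \frac{1}{n}\left[\mathrm{tr}(\Sigma^2)+\mathrm{tr}^2(\Sigma)\right]. \end{align}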

I am leaving the derivation above, though, because it shows roughly how these relations are derived.