Simple(r) way to derive the expectation of an inverse Wishart?

I am looking for a simple way to derive the expectation of an inverse Wishart matrix $W^{-1}$, where $W=\sum_{i=1}^n \Sigma^{1/2} g_i g_i^T \Sigma^{1/2}$ for a covariance $\Sigma\in R^{d\times d}$ and iid standard Gaussian vectors $g_i\sim N(0,I_d)$. The covariance $\Sigma$ is assumed invertible, and $n>d+1$ so that the expectation exists.

I will give a complete argument below, but the argument is too complicated for my liking, especially to teach to late undergraduate students.

The first observation is that $$ E[W^{-1}] = \Sigma^{-1/2} E\left[\left(\sum_{i=1}^n g_i g_i^T\right)^{-1}\right] \Sigma^{-1/2}, $$ so that it is enough to treat the case $\Sigma=I_d$. Assume $\Sigma=I_d$ hereafter.
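This reduction is in fact a deterministic identity, valid sample by sample: $W^{-1} = \Sigma^{-1/2}\left(\sum_i g_i g_i^T\right)^{-1}\Sigma^{-1/2}$, and taking expectations gives the display above. A quick numerical sanity check (a sketch; the values of $d$, $n$ and the particular $\Sigma$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 10

# An arbitrary invertible covariance Sigma (illustrative choice).
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)

# Symmetric square root of Sigma via its eigendecomposition.
eigval, eigvec = np.linalg.eigh(Sigma)
Sigma_half = eigvec @ np.diag(np.sqrt(eigval)) @ eigvec.T
Sigma_half_inv = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T

# One sample of W = Sigma^{1/2} (sum_i g_i g_i^T) Sigma^{1/2}.
G = rng.standard_normal((n, d))          # rows are the g_i
S = G.T @ G                              # sum_i g_i g_i^T
W = Sigma_half @ S @ Sigma_half

# The identity W^{-1} = Sigma^{-1/2} S^{-1} Sigma^{-1/2} holds pathwise.
lhs = np.linalg.inv(W)
rhs = Sigma_half_inv @ np.linalg.inv(S) @ Sigma_half_inv
print(np.allclose(lhs, rhs))             # True
```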

Another easy result covers the off-diagonal terms of $E[W^{-1}]$. With identity covariance, $\sum_i g_i g_i^T$ and $\sum_i \tilde g_i \tilde g_i^T$ with $\tilde g_i = D g_i$ have the same distribution, where $D=\operatorname{diag}(1,...,1,-1,1,...,1)$ (a single sign change, say in position $k$). Since $D^{-1}=D$, this implies that $$E[W^{-1}] = D\, E[W^{-1}]\, D,$$ and the right-hand side flips the sign of every off-diagonal entry in row and column $k$. Varying $k$ gives $E[W^{-1}]_{ij}=0$ for all $i\ne j$.
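A small Monte Carlo experiment illustrates this (a sketch; $d$, $n$ and the sample count $N$ are arbitrary choices): the off-diagonal entries of the averaged $W^{-1}$ vanish, while the diagonal entries agree with each other by symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, N = 3, 10, 50_000

# Stack N independent samples of G and invert all Wishart matrices at once.
G = rng.standard_normal((N, n, d))
W = np.einsum('sni,snj->sij', G, G)       # W_s = G_s^T G_s
mean_inv = np.linalg.inv(W).mean(axis=0)  # Monte Carlo estimate of E[W^{-1}]

off_diag = mean_inv - np.diag(np.diag(mean_inv))
print(np.abs(off_diag).max())             # close to 0
print(np.diag(mean_inv))                  # diagonal entries roughly equal
```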

It remains to compute the diagonal entries, and this is the step that is too complex for my liking. By symmetry (the distribution of $W$ is invariant under permutations of the coordinates), $$E[W^{-1}]_{ii} = \frac 1 d E[\operatorname{trace}(W^{-1})].$$ The trace is the sum of the eigenvalues $\lambda_i(W^{-1})$ of $W^{-1}$, which can be written as a squared Frobenius norm: $$ E[\operatorname{trace}(W^{-1})] = E\sum_{i=1}^d \lambda_i(W)^{-1} = E[\|G^\dagger\|_F^2], $$ where $G\in R^{n\times d}$ is the matrix with $n$ rows $g_1,...,g_n$, and $\dagger$ denotes the Moore--Penrose pseudo-inverse (here $G^\dagger = (G^TG)^{-1}G^T = W^{-1}G^T$, so that $G^\dagger (G^\dagger)^T = W^{-1}$).

Now let $c_1,...,c_d\in R^n$ be the rows of $G^\dagger$, so that the above display equals $\sum_{j=1}^d \|c_j\|_2^2$. Furthermore, by definition of the pseudo-inverse, with $z_1,...,z_d\in R^n$ the columns of $G$, we have $c_j^Tz_j=1$ and $c_j^T z_k=0$ for $j\ne k$ (this is $G^\dagger G = I_d$). Thus $c_j$ belongs to the orthogonal complement of $\operatorname{span}\{z_k, k\in\{1,...,d\}\setminus j\}$. Since $c_j$ also belongs to the span of $z_1,...,z_d$, it must be that $c_j = \theta_j Q_j z_j$ with $Q_j\in R^{n\times n}$ the orthogonal projection onto $\operatorname{span}\{z_k, k\in\{1,...,d\}\setminus j\}^\perp$ and $\theta_j$ a scalar. The condition $c_j^Tz_j=1$ then reveals $\theta_j = \|Q_jz_j\|_2^{-2}$, hence $\|c_j\|_2^2 = \|Q_jz_j\|_2^{-2}$.

Finally, $\|Q_jz_j\|_2^2$ has $\chi^2_{n-d+1}$ distribution, since $Q_j$ is a projection of rank $n-(d-1)$ independent of $z_j$ thanks to $G$ having iid $N(0,1)$ entries. Hence $$ E[\operatorname{trace}(W^{-1})] = E\sum_{j=1}^d \|c_j\|_2^2 = E\sum_{j=1}^d \|Q_j z_j\|_2^{-2} = \frac{d}{(n-d+1) -2} = \frac{d}{n-d-1}, $$ provided that we already know that an inverse $\chi^2_\nu$ random variable has expectation $1/(\nu-2)$ for $\nu>2$.
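The chain of identities above can be checked numerically. The sketch below (with arbitrary choices of $d$ and $n$) verifies pathwise that $\operatorname{trace}(W^{-1}) = \|G^\dagger\|_F^2 = \sum_{j=1}^d \|Q_j z_j\|_2^{-2}$, and then confirms $E[\operatorname{trace}(W^{-1})] = d/(n-d-1)$ by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 10                              # so d/(n-d-1) = 0.5

def trace_identities(G):
    """Check trace(W^{-1}) = ||G^+||_F^2 = sum_j ||Q_j z_j||^{-2} for one sample G."""
    W = G.T @ G
    t = np.trace(np.linalg.inv(W))
    # Negative second moment identity: squared Frobenius norm of the pseudo-inverse.
    frob = np.linalg.norm(np.linalg.pinv(G)) ** 2
    # Projection form: z_j are the columns of G; Q_j projects onto span{z_k, k != j}^perp.
    total = 0.0
    for j in range(G.shape[1]):
        Z = np.delete(G, j, axis=1)               # the other d-1 columns
        P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T      # projection onto span{z_k, k != j}
        Qz = G[:, j] - P @ G[:, j]                # Q_j z_j
        total += 1.0 / (Qz @ Qz)
    return t, frob, total

t, frob, total = trace_identities(rng.standard_normal((n, d)))
print(np.allclose(t, frob), np.allclose(t, total))   # True True

# Monte Carlo estimate of E[trace(W^{-1})].
N = 50_000
G = rng.standard_normal((N, n, d))
W = np.einsum('sni,snj->sij', G, G)
mc = np.trace(np.linalg.inv(W), axis1=1, axis2=2).mean()
print(mc)                                            # close to d/(n-d-1) = 0.5
```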

Remark: The trick of expressing the trace as the squared Frobenius norm of $G^\dagger$ is the "negative second moment identity" in Lemma A.4 of Tao, Terence; Vu, Van; Krishnapur, Manjunath. Random matrices: Universality of ESDs and the circular law. Ann. Probab. 38 (2010), no. 5, 2023--2065. doi:10.1214/10-AOP534. https://projecteuclid.org/euclid.aop/1282053780