Expectation of quartic form for multivariate Gaussian

I am reading the 2012 version of The Matrix Cookbook. On page 43, Section 8.2.4, "Mean of Quartic Forms", there is a formula that really confuses me:

$$E[x^TAxx^TBx]=Tr[A\Sigma(B+B^T)\Sigma]+m^T(A+A^T)\Sigma(B+B^T)m+(Tr(A\Sigma)+m^TAm)(Tr(B\Sigma)+m^TBm)$$
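Before attempting a proof, the formula can be sanity-checked numerically without any simulation: for $x\sim N(m,\Sigma)$ the fourth moment $E[x_ix_jx_kx_l]$ has a closed form (expand $x=m+z$ with $z\sim N(0,\Sigma)$ and apply Isserlis' theorem to $z$). The following sketch, with arbitrary test matrices of my own choosing, compares the brute-force moment sum against the Cookbook's right-hand side:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))      # general (non-symmetric) A
B = rng.standard_normal((n, n))      # general (non-symmetric) B
m = rng.standard_normal(n)
S = rng.standard_normal((n, n))
Sigma = S @ S.T + n * np.eye(n)      # a positive-definite covariance

def fourth_moment(i, j, k, l):
    """Exact E[x_i x_j x_k x_l] for x ~ N(m, Sigma):
    Isserlis pairings of Sigma, mixed Sigma * m * m terms, and the pure mean term."""
    Sg, mu = Sigma, m
    return (Sg[i, j]*Sg[k, l] + Sg[i, k]*Sg[j, l] + Sg[i, l]*Sg[j, k]
            + mu[i]*mu[j]*Sg[k, l] + mu[i]*mu[k]*Sg[j, l] + mu[i]*mu[l]*Sg[j, k]
            + mu[j]*mu[k]*Sg[i, l] + mu[j]*mu[l]*Sg[i, k] + mu[k]*mu[l]*Sg[i, j]
            + mu[i]*mu[j]*mu[k]*mu[l])

# E[x^T A x x^T B x] = sum_{ijkl} A_ij B_kl E[x_i x_j x_k x_l]
lhs = sum(A[i, j] * B[k, l] * fourth_moment(i, j, k, l)
          for i in range(n) for j in range(n)
          for k in range(n) for l in range(n))

# The Cookbook's right-hand side
rhs = (np.trace(A @ Sigma @ (B + B.T) @ Sigma)
       + m @ (A + A.T) @ Sigma @ (B + B.T) @ m
       + (np.trace(A @ Sigma) + m @ A @ m) * (np.trace(B @ Sigma) + m @ B @ m))
# lhs and rhs agree to machine precision
```

This only confirms the identity numerically, of course; it does not replace a proof.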

I have not checked the formulae for $E[xx^Txx^T]$, $E[xx^TAxx^T]$, or $E[x^Txx^Tx]$ in the Cookbook, since they are special cases of this one. Note that $x\sim N(m,\Sigma)$.

My question is: how can one prove it?


The following is my attempt:

Consider first the simpler situation, with $A$ and $B$ symmetric. I use the same technique that appears on page 11 of Seber and Lee's book "Linear Regression Analysis" to reorganize these two quadratic forms; then

$E[x^TAxx^TBx]=E\Big\{[(x-m)^TA(x-m)+2m^TA(x-m)+m^TAm]\cdot[(x-m)^TB(x-m)+2m^TB(x-m)+m^TBm]\Big\}$

$=E\Big\{(x-m)^TA(x-m)(x-m)^TB(x-m)\Big\}+E\Big\{(x-m)^TA(x-m)m^TBm\Big\}+E\Big\{m^TAm(x-m)^TB(x-m)\Big\}+E\Big\{m^TAmm^TBm\Big\}+2E\Big\{m^TA(x-m)m^TBm\Big\}+2E\Big\{m^TAmm^TB(x-m)\Big\}+2E\Big\{(x-m)^TA(x-m)m^TB(x-m)\Big\}+2E\Big\{m^TA(x-m)(x-m)^TB(x-m)\Big\}+4E\Big\{m^TA(x-m)m^TB(x-m)\Big\}$

in which the fifth and sixth terms are zero, and the first four terms can be rearranged to equal $(Tr(A\Sigma)+m^TAm)(Tr(B\Sigma)+m^TBm)$ using the fact that $E\Big\{(x-m)^TQ(x-m)\Big\}=Tr(Q\Sigma)$.

The last term can be reduced once we notice that $(\text{scalar})^T=\text{scalar}$:

$E\Big\{m^TA(x-m)\cdot m^TB(x-m)\Big\}=E\Big\{m^TA(x-m)\cdot (x-m)^TBm\Big\}=m^TAE\Big\{(x-m)\cdot (x-m)^T\Big\}Bm=m^TA\Sigma Bm$

Consequently (and this is my detailed question: how to show it?), it suffices to show that the seventh and eighth terms are equal to the first term on the RHS of the initial formula, that is, $2E\Big\{(x-m)^TA(x-m)m^TB(x-m)\Big\}+2E\Big\{m^TA(x-m)(x-m)^TB(x-m)\Big\}=Tr[A\Sigma(B+B^T)\Sigma]=2Tr(A\Sigma B\Sigma)$

I searched this website, and this link is useful: Expected value using the Kronecker Product

The RHS of my equation above can be rewritten, using the formula $Tr(ABCD)=vec(B^T)^T(A^T\otimes C)vec(D)$, as:

$Tr(A\Sigma B\Sigma)=vec(\Sigma)^T(A\otimes B)vec(\Sigma)$
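For symmetric $A$, $B$ and $\Sigma$ this vec/Kronecker form can be confirmed directly; a quick sketch with arbitrary symmetric test matrices, where `vec` is column-stacking (Fortran order), which matches `np.kron`'s convention:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n)); A = A + A.T   # symmetric test matrices
B = rng.standard_normal((n, n)); B = B + B.T
S = rng.standard_normal((n, n))
Sigma = S @ S.T + np.eye(n)                    # symmetric positive definite

vecS = Sigma.flatten(order='F')                # vec() stacks columns
lhs = np.trace(A @ Sigma @ B @ Sigma)
rhs = vecS @ np.kron(A, B) @ vecS
# lhs and rhs agree to machine precision
```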

The LHS, following the link, using the fact that $x\otimes y=vec(yx^T)$ for two column vectors $x$ and $y$, and using $scalar=Tr(scalar)$, is equivalent to:

$E\Big\{(x-m)^TA(x-m)\cdot m^TB(x-m)\Big\}+E\Big\{m^TA(x-m)\cdot (x-m)^TB(x-m)\Big\}=E\Big\{\Big((x-m)^TA(x-m)\Big)\otimes\Big(m^TB(x-m)\Big)\Big\}+E\Big\{\Big(m^TA(x-m)\Big)\otimes\Big((x-m)^TB(x-m)\Big)\Big\}=E\Big\{\Big((x-m)^T\otimes m^T\Big)\Big(A\otimes B\Big)\Big((x-m)\otimes(x-m)\Big)\Big\}+E\Big\{\Big(m^T\otimes (x-m)^T\Big)\Big(A\otimes B\Big)\Big((x-m)\otimes(x-m)\Big)\Big\}=E\Big\{vec\Big((x-m)m^T+m(x-m)^T\Big)^T\Big(A\otimes B\Big)vec\Big((x-m)(x-m)^T\Big)\Big\}$

Under the normality assumption, $vec(E[(x-m)(x-m)^T])=vec(\Sigma)$. Therefore this seems very close to the final identity (it ought to be that):

$vec(\Sigma)^T(A\otimes B)vec(\Sigma)=E\Big\{vec\Big((x-m)m^T+m(x-m)^T\Big)^T\Big(A\otimes B\Big)vec\Big((x-m)(x-m)^T\Big)\Big\}$

Can anyone help me prove the above equation? Or are there mistakes in the derivations above?

=============================

Here is my second attempt:

Inspired by Seber and Lee's book again (the same page 11): can anyone help me expand $E[x^TAxx^TBx]$ in coordinate form to obtain the same result? Though I expect the amount of work to be huge.

A natural question, based on my first attempt, is how to expand in coordinates

$2E\Big\{(x-m)^TA(x-m)m^TB(x-m)\Big\}+2E\Big\{m^TA(x-m)(x-m)^TB(x-m)\Big\}=Tr[A\Sigma(B+B^T)\Sigma]=2Tr(A\Sigma B\Sigma)$

from one side and then simplify to the other? It is certainly easier than expanding $E[x^TAxx^TBx]$, but how does one expand $Tr(A\Sigma B\Sigma)$? There are four matrices here rather than vectors.
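On the coordinate expansion of $Tr(A\Sigma B\Sigma)$: writing out the trace of the four-matrix product gives $Tr(A\Sigma B\Sigma)=\sum_{i,j,k,l}A_{ij}\Sigma_{jk}B_{kl}\Sigma_{li}$, which is easy to verify (an illustrative sketch with arbitrary test matrices; the identity needs no symmetry):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
Sigma = rng.standard_normal((n, n))   # symmetry is not needed for this identity

# Tr(A Sigma B Sigma) = sum_{i,j,k,l} A_ij Sigma_jk B_kl Sigma_li
coord = sum(A[i, j] * Sigma[j, k] * B[k, l] * Sigma[l, i]
            for i in range(n) for j in range(n)
            for k in range(n) for l in range(n))
direct = np.trace(A @ Sigma @ B @ Sigma)
# coord and direct agree to machine precision
```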

==============================

My last question: are there any books that not only collect matrix formulae but also provide proofs and solutions? I would appreciate recommendations!

Thank you in advance!

==============================

Answer (accepted):

Suppose that $A, B$ are real matrices (not necessarily symmetric).

We may express $x = m + \Sigma^{1/2}y$ where $y\sim \mathcal{N}(0, I)$.

For convenience, denote $C = \Sigma^{1/2}$. Note that $C$ is real symmetric.

Using $4ab = (a + b)^2 - (a - b)^2$, we have \begin{align*} \mathbb{E}[x^\mathsf{T}Ax x^\mathsf{T}Bx] &= \frac{1}{4}\,\mathbb{E}[(x^\mathsf{T}(A + B)x)^2] - \frac{1}{4}\,\mathbb{E}[(x^\mathsf{T}(A - B)x)^2]\\ &= \frac{1}{4}\,\mathbb{E}[(x^\mathsf{T}Px)^2] - \frac{1}{4}\,\mathbb{E}[(x^\mathsf{T}Qx)^2] \end{align*} where $P = \frac{A + A^\mathsf{T}}{2} + \frac{B + B^\mathsf{T}}{2}$ and $Q = \frac{A + A^\mathsf{T}}{2} - \frac{B + B^\mathsf{T}}{2}$, and we have used $x^\mathsf{T}Ax = x^\mathsf{T}\frac{A + A^\mathsf{T}}{2}x$.
Note: $P, Q$ are both symmetric.
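This polarization step is a pointwise algebraic identity, valid for every fixed vector $x$, which a short check confirms (a sketch with arbitrary non-symmetric test matrices of my own choosing):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))   # not necessarily symmetric
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)

# Symmetrized combinations, as in the answer
P = (A + A.T) / 2 + (B + B.T) / 2
Q = (A + A.T) / 2 - (B + B.T) / 2

lhs = (x @ A @ x) * (x @ B @ x)
rhs = 0.25 * (x @ P @ x) ** 2 - 0.25 * (x @ Q @ x) ** 2
# lhs and rhs agree to machine precision for every x
```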

Fact 1: Let $y \sim \mathcal{N}(0, I)$. Let $S$ be a given real symmetric matrix. Then:
(1) $\mathbb{E}[(y^\mathsf{T}Sy)^2] = 2\mathrm{tr}(S^2) + [\mathrm{tr}(S)]^2$;
(2) $\mathbb{E}[(y^\mathsf{T}Sy)y] = 0$.
The proof is not difficult. I gave a proof of (1) before. See: Expectation of product of quadratic forms in Gaussian distribution
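Fact 1(1) can also be checked by brute force from the Gaussian fourth-moment (Isserlis) rule $\mathbb{E}[y_iy_jy_ky_l]=\delta_{ij}\delta_{kl}+\delta_{ik}\delta_{jl}+\delta_{il}\delta_{jk}$ for $y\sim\mathcal{N}(0,I)$; Fact 1(2) holds because every third moment of a centered Gaussian vanishes. A sketch of the check for (1), with an arbitrary symmetric test matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = rng.standard_normal((n, n))
S = (G + G.T) / 2            # arbitrary real symmetric matrix

d = np.eye(n)                # d[i, j] plays the role of delta_{ij}
# E[(y^T S y)^2] = sum_{ijkl} S_ij S_kl E[y_i y_j y_k y_l]
lhs = sum(S[i, j] * S[k, l]
          * (d[i, j]*d[k, l] + d[i, k]*d[j, l] + d[i, l]*d[j, k])
          for i in range(n) for j in range(n)
          for k in range(n) for l in range(n))
rhs = 2 * np.trace(S @ S) + np.trace(S) ** 2
# lhs and rhs agree to machine precision
```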

Using Fact 1, we have \begin{align*} &\mathbb{E}[(x^\mathsf{T}Px)^2]\\ =\, & \mathbb{E}[((m + Cy)^\mathsf{T}P(m + Cy))^2]\\ =\,& \mathbb{E}[(y^\mathsf{T}C^\mathsf{T}PCy + 2m^\mathsf{T}PCy + m^\mathsf{T}Pm)^2]\\ =\, & \mathbb{E}[(y^\mathsf{T}C^\mathsf{T}PCy)^2] + \mathbb{E}[(2m^\mathsf{T}PCy)^2] + (m^\mathsf{T}Pm)^2 + \mathbb{E}[4(m^\mathsf{T}PCy) m^\mathsf{T}Pm]\\ &\quad + \mathbb{E}[4(y^\mathsf{T}C^\mathsf{T}PCy)m^\mathsf{T}PCy] + \mathbb{E}[2(y^\mathsf{T}C^\mathsf{T}PCy)m^\mathsf{T}Pm] \tag{1}\\ =\,& 2\mathrm{tr}[(C^\mathsf{T}PC)^2] + [\mathrm{tr}(C^\mathsf{T}PC)]^2 + 4m^\mathsf{T}PC C^\mathsf{T}Pm + (m^\mathsf{T}Pm)^2\\ &\quad + 2m^\mathsf{T}Pm\,\mathrm{tr}[C^\mathsf{T}PC] \\ =\,& 2\mathrm{tr}[P\Sigma P \Sigma] + [\mathrm{tr}(P\Sigma)]^2 + 4m^\mathsf{T}P\Sigma P m + (m^\mathsf{T}Pm)^2 + 2m^\mathsf{T}Pm\,\mathrm{tr}[P\Sigma] \end{align*} where in (1) we have used $y^\mathsf{T}C^\mathsf{T}PCy = \mathrm{tr}[C^\mathsf{T}PCyy^\mathsf{T}]$, and $\mathbb{E}[yy^\mathsf{T}] = I$.

Thus, we have \begin{align*} &\mathbb{E}[x^\mathsf{T}Ax x^\mathsf{T}Bx]\\ =\,& \frac12 \mathrm{tr}[P\Sigma P \Sigma] - \frac12 \mathrm{tr}[Q\Sigma Q \Sigma] + \frac14 [\mathrm{tr}(P\Sigma)]^2 - \frac14 [\mathrm{tr}(Q\Sigma)]^2\\ &\quad + m^\mathsf{T}P\Sigma P^\mathsf{T}m - m^\mathsf{T}Q\Sigma Q^\mathsf{T}m + \frac14 (m^\mathsf{T}Pm)^2 - \frac14 (m^\mathsf{T}Qm)^2\\ &\quad + \frac12 m^\mathsf{T}Pm\,\mathrm{tr}[P\Sigma] - \frac12 m^\mathsf{T}Qm\,\mathrm{tr}[Q\Sigma]\\ =\,& \frac12\mathrm{tr}[(A + A^\mathsf{T})\Sigma (B + B^\mathsf{T})\Sigma] + \frac14\, \mathrm{tr}[(A + A^\mathsf{T})\Sigma]\mathrm{tr}[(B + B^\mathsf{T})\Sigma]\\ &\quad + m^\mathsf{T}(A + A^\mathsf{T})\Sigma (B + B^\mathsf{T})m + \frac14\, m^\mathsf{T}(A + A^\mathsf{T})m m^\mathsf{T}(B + B^\mathsf{T})m\\ &\quad + \frac14\, m^\mathsf{T}(A + A^\mathsf{T})m \, \mathrm{tr}[(B + B^\mathsf{T})\Sigma] + \frac14\, m^\mathsf{T}(B + B^\mathsf{T})m \, \mathrm{tr}[(A + A^\mathsf{T})\Sigma]\\ =\,& \mathrm{tr}[A\Sigma (B + B^\mathsf{T})\Sigma] + m^\mathsf{T}(A + A^\mathsf{T})\Sigma (B + B^\mathsf{T})m\\ &\quad + (\mathrm{tr}[A\Sigma] + m^\mathsf{T}Am)(\mathrm{tr}[B\Sigma] + m^\mathsf{T}Bm). \end{align*}

We are done.

==============================

Answer (second):

Remark: In this second answer, we evaluate the expectation of a product of three quadratic forms.

Problem: Let $x\sim \mathcal{N}(m, \Sigma)$. Let $A, B, C$ be given real symmetric matrices. Find $\mathbb{E}[x^\mathsf{T}Ax x^\mathsf{T}Bx x^\mathsf{T}Cx]$.

Using $$abc = \frac{1}{24}[(a + b + c)^3 - (a + b - c)^3 - (a - b + c)^3 - ( - a + b + c)^3],$$ we have \begin{align*} &\mathbb{E}[x^\mathsf{T}Ax x^\mathsf{T}Bx x^\mathsf{T}Cx]\\[5pt] =\,& \frac{1}{24}\,\mathbb{E}[(x^\mathsf{T}Px)^3] - \frac{1}{24}\,\mathbb{E}[(x^\mathsf{T}Qx)^3] - \frac{1}{24}\,\mathbb{E}[(x^\mathsf{T}Rx)^3] - \frac{1}{24}\,\mathbb{E}[(x^\mathsf{T}Sx)^3] \end{align*} where $P = A + B + C, ~Q = A + B - C, ~R = A - B + C, ~S = -A + B + C$.
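The cubic analogue of the polarization trick is again a pointwise identity, so it can be sanity-checked for a fixed $x$; a sketch with arbitrary symmetric test matrices (the same check also validates the scalar identity for $24abc$):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3

def sym():
    """An arbitrary real symmetric test matrix."""
    G = rng.standard_normal((n, n))
    return (G + G.T) / 2

A, B, C = sym(), sym(), sym()
x = rng.standard_normal(n)

P, Q, R, S = A + B + C, A + B - C, A - B + C, -A + B + C
lhs = (x @ A @ x) * (x @ B @ x) * (x @ C @ x)
rhs = ((x @ P @ x)**3 - (x @ Q @ x)**3
       - (x @ R @ x)**3 - (x @ S @ x)**3) / 24
# lhs and rhs agree to machine precision for every x
```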

We may express $x = m + \Sigma^{1/2}y$ where $y\sim \mathcal{N}(0, I)$. For convenience, denote $L = \Sigma^{1/2}$ (we avoid the letter $C$ here, since $C$ already names the third matrix). Since $\Sigma$ is symmetric, $L$ is also symmetric.

Fact 1: Let $y \sim \mathcal{N}(0, I)$. Let $M$ be a given real symmetric matrix. Then:
(1) $\mathbb{E}[(y^\mathsf{T}My)^2] = 2\mathrm{tr}(M^2) + [\mathrm{tr}(M)]^2$;
(2) $\mathbb{E}[(y^\mathsf{T}My)^3] = 8\mathrm{tr}[M^3] + 6\mathrm{tr}[M^2]\mathrm{tr}[M] + [\mathrm{tr}(M)]^3$;
(3) $\mathbb{E}[(y^\mathsf{T}My)y ] = 0$;
(4) $\mathbb{E}[(y^\mathsf{T}My)^2y ] = 0$.
(The proof is given at the end.)

Using Fact 1, we have \begin{align*} &\mathbb{E}[(x^\mathsf{T}Px)^3]\\ =\,& \mathbb{E}[((m + Ly)^\mathsf{T}P(m + Ly))^3]\\ =\,& \mathbb{E}[(y^\mathsf{T}L^\mathsf{T}PLy + 2m^\mathsf{T}PLy + m^\mathsf{T}Pm)^3]\\ =\,& \mathbb{E}[(y^\mathsf{T}L^\mathsf{T}PLy + m^\mathsf{T}Pm)^3] + \mathbb{E}[(2m^\mathsf{T}PLy)^3]\\ &\quad + \mathbb{E}[3(y^\mathsf{T}L^\mathsf{T}PLy + m^\mathsf{T}Pm)^2\cdot 2m^\mathsf{T}PLy]\\ &\quad + \mathbb{E}[3(y^\mathsf{T}L^\mathsf{T}PLy + m^\mathsf{T}Pm)\cdot(2m^\mathsf{T}PLy)^2]\\ =\, & \mathbb{E}[(y^\mathsf{T}L^\mathsf{T}PLy + m^\mathsf{T}Pm)^3] + \mathbb{E}[3(y^\mathsf{T}L^\mathsf{T}PLy + m^\mathsf{T}Pm)\cdot(2m^\mathsf{T}PLy)^2]\\ =\,& \mathbb{E}[(y^\mathsf{T}L^\mathsf{T}PLy)^3] + (m^\mathsf{T}Pm)^3 + \mathbb{E}[3(y^\mathsf{T}L^\mathsf{T}PLy)^2m^\mathsf{T}Pm]\\ &\quad + \mathbb{E}[3(y^\mathsf{T}L^\mathsf{T}PLy)(m^\mathsf{T}Pm)^2] + \mathbb{E}[12 y^\mathsf{T}L^\mathsf{T}PLy \cdot y^\mathsf{T}L^\mathsf{T}Pmm^\mathsf{T}PLy]\\ &\quad + \mathbb{E}[12 m^\mathsf{T}Pm\cdot y^\mathsf{T}L^\mathsf{T}Pmm^\mathsf{T}PLy]. \end{align*} The rest of the computation is routine.


Proof of Fact 1: Let $M = U\mathrm{diag}(d_1,d_2, \cdots, d_n)U^T$ be the eigendecomposition. Let $z = U^Ty$. Then $z$ is a random vector with i.i.d. entries distributed as $\mathcal{N}(0, 1)$. Note that $\mathbb{E}[z_i^6] = 15$, $\mathbb{E}[z_i^4] = 3$ and $\mathbb{E}[z_i^2]=1$. We have \begin{align} \mathbb{E}[(y^TMy)^3] &= \mathbb{E}[(z^T\mathrm{diag}(d_1,d_2, \cdots, d_n)z)^3] = \sum_{i,j,k} d_i d_j d_k\, \mathbb{E}[z_i^2 z_j^2 z_k^2]\\ &= 15\sum d_i^3 + 9\sum_{i \ne j} d_i^2 d_j + 6\sum_{i < j < k} d_i d_j d_k\\ &= 6\sum d_i^3 + 9(\sum d_i)(\sum d_i^2) + 6\sum_{i < j < k} d_i d_j d_k\\ &= 8\mathrm{tr}[M^3] + 6\mathrm{tr}[M^2]\mathrm{tr}[M] + [\mathrm{tr}(M)]^3 \end{align} where we have used $\sum_{i \ne j} d_i^2 d_j = (\sum d_i)(\sum d_i^2) - \sum d_i^3$ and the Newton identity $$\sum_{i < j < k} d_i d_j d_k = \frac{1}{6}(\sum d_i)^3 - \frac{1}{2}(\sum d_i)(\sum d_i^2) + \frac{1}{3}\sum d_i^3.$$
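The sixth-moment bookkeeping in this proof can be verified mechanically: with eigenvalues $d_i$, $\mathbb{E}[(y^TMy)^3]=\sum_{i,j,k}d_id_jd_k\,\mathbb{E}[z_i^2z_j^2z_k^2]$, where the moment is $15$, $3$, or $1$ according to whether all, exactly two, or none of the indices coincide. A sketch with an arbitrary symmetric test matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
G = rng.standard_normal((n, n))
M = (G + G.T) / 2
d = np.linalg.eigvalsh(M)         # real eigenvalues of the symmetric M

def m6(i, j, k):
    """E[z_i^2 z_j^2 z_k^2] for i.i.d. standard normal z."""
    if i == j == k:
        return 15.0
    if i == j or j == k or i == k:
        return 3.0
    return 1.0

lhs = sum(d[i] * d[j] * d[k] * m6(i, j, k)
          for i in range(n) for j in range(n) for k in range(n))
rhs = (8 * np.trace(M @ M @ M)
       + 6 * np.trace(M @ M) * np.trace(M)
       + np.trace(M) ** 3)
# lhs and rhs agree to machine precision
```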