Derivation help for $E[x^{\top}Ax] = tr(AP)$: missing the trace operator, using the moment generating function for Gaussian $x$ and gradient


The following notation is used:

$x$: vector-valued random variable with mean $\bar{x} = 0$

$P$: covariance of $x$

$tr(\cdot)$: trace operator

$E[\cdot]$: expectation

$M_x(s)$: Moment generating function of $x$

$\nabla_s$: gradient with respect to $s$

This is the statement from my reading:

"~ for a Guassian random vector $x$, the identity $E[x^{\top}Ax] = tr(AP)$ can be derived using the moment generating function $M_x(s) = E[e^{s^{\top}x}] = e^{\frac{1}{2}s^{\top}Ps + s^{\top}\bar{x}}$ and the gradient operator $\nabla_s$".

I follow the partially shown derivation: \begin{align} E[x^{\top}Ax] &= E[(\nabla_s e^{s^{\top}x})^{\top}Ax]|_{s = 0} \qquad \text{(since $x = \nabla_s e^{s^{\top}x}|_{s=0}$)} \\ &= E[\nabla^{\top}_sAx e^{s^{\top}x}]|_{s=0} \quad\qquad \text{(since $e^{s^{\top}x} $ is a scalar)} \\ &= \nabla^{\top}_sAE[xe^{s^{\top}x}]|_{s=0} \quad\qquad \text{(A is constant and $E[\cdot]$ and $\nabla_s$ are linear)} \\ &= \nabla^{\top}_sAE[\nabla_s e^{s^{\top}x}]|_{s=0} \quad\qquad \text{($xe^{s^{\top}x} = \nabla_s e^{s^{\top}x}$)} \\ &= \nabla^{\top}_sA\nabla_sE[ e^{s^{\top}x}]|_{s=0} \quad\qquad \text{($E[\cdot]$ and $\nabla_s$ linear)} \\ &= \nabla^{\top}_sA\nabla_sM_x(s)|_{s=0} \quad\qquad \text{(definition of $M_x(s)$)} \\ \end{align}

But when I evaluate the gradients in the last line, the trace operator goes missing somewhere in the steps below:

\begin{align} E[x^{\top}Ax] &= \nabla^{\top}_sA\nabla_sM_x(s)|_{s=0} \quad\qquad \\ &= \nabla^{\top}_s A \nabla_s \left(e^{\frac{1}{2}s^{\top}Ps} \right)|_{s=0}\quad\qquad \qquad\text{(since $\bar{x} = 0$ )} \\ &= \nabla^{\top}_s A(Ps)e^{\frac{1}{2}s^{\top}Ps}|_{s=0}\qquad\qquad\qquad \text{(chain rule and $P$ symmetric)} \\ &=?\quad A\left[Pe^{\frac{1}{2}s^{\top}Ps} + (Ps)(Ps)^{\top} e^{\frac{1}{2}s^{\top}Ps}\right]|_{s=0}\quad\qquad \text{(product rule)} \\ &=?\quad AP \qquad\qquad\qquad\qquad\qquad\qquad\qquad \text{(evaluating at $s = 0$)} \\ \end{align}

I know this is wrong, but I can't see where. I suspect the second-to-last line (the product rule). If so, what is the error, and where should the trace operator appear?


Based on the answer, a complete derivation would be:

\begin{align} E[x^{\top}Ax] &= \nabla^{\top}_sA\nabla_sM_x(s)|_{s=0} \quad\qquad \\ &= \nabla^{\top}_s A \nabla_s \left(e^{\frac{1}{2}s^{\top}Ps} \right)|_{s=0}\quad\qquad \qquad\text{(since $\bar{x} = 0$ )} \\ &= \nabla^{\top}_s A(Ps)e^{\frac{1}{2}s^{\top}Ps}|_{s=0}\qquad\qquad\qquad \text{(chain rule and $P$ symmetric)} \\ &= tr(AP)e^{\frac{1}{2}s^{\top}Ps} + (Ps)^{\top}(APs)e^{\frac{1}{2}s^{\top}Ps}|_{s=0}\quad\qquad \text{(product rule)} \\ &= tr(AP) \qquad\qquad\qquad\qquad\qquad\qquad\qquad \text{(evaluating at $s = 0$)} \\ \end{align}
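As a sanity check on the derivation above, here is a minimal numerical sketch. It approximates the Hessian of $M_x(s) = e^{\frac{1}{2}s^{\top}Ps}$ at $s = 0$ by central finite differences and compares $\nabla^{\top}_sA\nabla_sM_x(s)|_{s=0} = \sum_{i,j} a_{ij}\,\partial^2 M_x/\partial s_i \partial s_j$ with $tr(AP)$. The helper names `mgf` and `hessian_fd`, the step size `h`, and the random test matrices are my own assumptions for illustration, not part of the original text.

```python
import numpy as np

def mgf(s, P):
    # MGF of a zero-mean Gaussian with covariance P: exp(0.5 * s^T P s)
    return np.exp(0.5 * s @ P @ s)

def hessian_fd(f, s0, h=1e-4):
    # Central finite-difference approximation of the Hessian of f at s0
    n = s0.size
    I = np.eye(n)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (
                f(s0 + h * I[i] + h * I[j])
                - f(s0 + h * I[i] - h * I[j])
                - f(s0 - h * I[i] + h * I[j])
                + f(s0 - h * I[i] - h * I[j])
            ) / (4 * h * h)
    return H

rng = np.random.default_rng(0)
n = 3
L = rng.standard_normal((n, n))
P = L @ L.T + n * np.eye(n)      # arbitrary symmetric positive-definite covariance
A = rng.standard_normal((n, n))  # arbitrary (not necessarily symmetric) matrix

H = hessian_fd(lambda s: mgf(s, P), np.zeros(n))
lhs = np.sum(A * H)              # sum_ij a_ij * d^2 M / ds_i ds_j  at s = 0
rhs = np.trace(A @ P)
print(lhs, rhs)                  # the two values should agree up to finite-difference error
```

The Hessian of $M_x$ at $s = 0$ is $P$, so the double sum collapses to $\sum_{i,j} a_{ij}p_{ij} = tr(AP)$, which is a scalar as expected.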


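Let $B$ be a constant matrix. We have \begin{equation} \nabla_s^\top B s = \sum_i\frac{\partial}{\partial s_i}\sum_j b_{i j} s_j =\sum_i\sum_j b_{i j}\frac{\partial s_j}{\partial s_i} = \sum_i\sum_j b_{i j}\delta_i^j=\text{tr}(B) \end{equation} Taking $B = AP$, the term $\nabla_s^\top APs$ in your product-rule step is the scalar $\text{tr}(AP)$, not the matrix $AP$.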
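The identity $\nabla_s^\top Bs = \text{tr}(B)$ can also be checked numerically. The sketch below estimates the divergence of the linear field $s \mapsto Bs$ by central differences; the helper name `divergence_fd`, the step size, and the random test data are assumptions made only for this illustration.

```python
import numpy as np

def divergence_fd(field, s0, h=1e-6):
    # Sum over i of d(field_i)/d(s_i), each estimated by a central difference
    n = s0.size
    I = np.eye(n)
    return sum(
        (field(s0 + h * I[i])[i] - field(s0 - h * I[i])[i]) / (2 * h)
        for i in range(n)
    )

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))  # arbitrary constant matrix
s0 = rng.standard_normal(4)      # arbitrary evaluation point

div = divergence_fd(lambda s: B @ s, s0)
print(div, np.trace(B))          # should agree up to rounding error
```

Because the field is linear, the result does not depend on the point `s0`, which matches the fact that $\text{tr}(B)$ contains no $s$.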


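To cut down on the symbol manipulation, I'd suggest a different path. A scalar equals its own trace, the trace is cyclic and linear (so it commutes with expectation), and $\mathbb E\big[\mathbf x\mathbf x^T\big] = P$ because $\mathbf x$ has zero mean:

\begin{align}
\mathbb E\Big[\mathbf x^T A \mathbf x\Big]
&=\mathbb E\Big[\text{trace}\big(\mathbf x^T A \mathbf x\big)\Big] \\
&=\mathbb E\Big[\text{trace}\big( A \mathbf x\mathbf x^T\big)\Big] \\
&=\text{trace}\Big(\mathbb E\big[ A \mathbf x\mathbf x^T\big]\Big) \\
&=\text{trace}\Big(A\mathbb E\big[ \mathbf x\mathbf x^T\big]\Big) \\
&=\text{trace}\Big(AP\Big)
\end{align}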
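A quick Monte Carlo sketch can confirm the result. The trace argument only uses the zero mean and the covariance $P$, so sampling from a Gaussian here is just a convenient choice; the sample size, seed, and test matrices are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 3
L = rng.standard_normal((n, n))
P = L @ L.T + np.eye(n)          # arbitrary symmetric positive-definite covariance
A = rng.standard_normal((n, n))  # arbitrary matrix

# Draw zero-mean samples with covariance P (one sample per row)
x = rng.multivariate_normal(np.zeros(n), P, size=200_000)

# Quadratic form x^T A x evaluated for every sample
quad = np.einsum("ni,ij,nj->n", x, A, x)

estimate = quad.mean()
exact = np.trace(A @ P)
std_err = quad.std() / np.sqrt(quad.size)
print(estimate, exact, std_err)  # estimate should lie within a few standard errors of exact
```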