Wikipedia's entry on Isserlis' theorem has the following identity:
$$ \mathbb{E}\Big[ X_1 f(X_1, ..., X_n) \Big] = \sum_{i=1}^n \mathbb{E}[X_1 X_i] \mathbb{E}\Big[ \frac{\partial}{\partial X_i} f(X_1, ..., X_n) \Big],$$
where $(X_1, ..., X_n)$ is a zero-mean multivariate Gaussian random vector. My questions:
1. What is this theorem called? Is this some incarnation of Wick's theorem? (I do not understand the notation on this page at all.)
2. What conditions on $f$ are required for this identity to hold?
Happily, I found the answer to question 2 myself by proving the result.
Proof:
First, we prove the statement when $X_1, ..., X_n$ are independent standard normal random variables (zero mean, unit variance). For $X_1 \sim \mathcal{N}(0, 1)$, Stein's lemma says
$$ \mathbb{E}[X_1 g( X_1)] = \mathbb{E}[g'( X_1)],$$
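for every absolutely continuous $g$ such that $\mathbb{E}|X_1 g(X_1)| < \infty$ and $\mathbb{E}|g'(X_1)| < \infty$. Now fix $i$ and write $\mathbf{X}_{-i} = (X_j)_{j \neq i}$. By independence, $X_i$ is still standard normal conditionally on $\mathbf{X}_{-i}$, so applying Stein's lemma to $x \mapsto f(X_1, ..., X_{i-1}, x, X_{i+1}, ..., X_n)$ gives
\begin{align} \mathbb{E}[X_i f(X_1, ..., X_n)] &= \mathbb{E} \Big[ \mathbb{E}\big[ X_i f(X_1, ..., X_n) \mid \mathbf{X}_{-i}\big] \Big] \\ & = \mathbb{E} \Big[ \mathbb{E}\big[ (\partial/\partial X_i) f(X_1, ..., X_n) \mid \mathbf{X}_{-i}\big] \Big] \\ & = \mathbb{E} \Big[ (\partial/\partial X_i) f(X_1, ..., X_n) \Big]. \end{align} Since $\mathbb{E}[\mathbf{X}] = \mathbf{0}$, we have $\mathbb{E}[\mathbf{X} f(\mathbf{X})] = \text{Cov}[\mathbf{X}, f(\mathbf{X})]$, so in vector notation $$ \text{Cov}[\mathbf{X}, f(\mathbf{X}) ] = \mathbb{E}[\nabla f(\mathbf{X}) ].$$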
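A quick numerical sanity check of the independent case, as a sketch rather than part of the proof: for polynomial test functions, Gauss-Hermite quadrature computes standard-normal expectations exactly up to rounding, so both Stein's lemma and $\mathbb{E}[\mathbf{X} f(\mathbf{X})] = \mathbb{E}[\nabla f(\mathbf{X})]$ can be checked to machine precision. The test functions $g(x) = x^3$ and $f(x_1, x_2) = x_1^2 x_2 + x_2^3$, as well as the node count, are arbitrary illustrative choices; the code assumes NumPy's `numpy.polynomial.hermite_e.hermegauss`.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: nodes/weights for the weight exp(-x^2/2).
# Dividing the weights by sqrt(2*pi) turns weighted sums into expectations under
# N(0, 1); the rule is exact for polynomials of degree < 2 * N_NODES.
N_NODES = 20
nodes, weights = hermegauss(N_NODES)
weights = weights / np.sqrt(2 * np.pi)


def expect_1d(g):
    """E[g(X)] for X ~ N(0, 1), computed by quadrature."""
    return np.sum(weights * g(nodes))


# Scalar Stein's lemma with the example g(x) = x^3:
# E[X g(X)] = E[X^4] should equal E[g'(X)] = E[3 X^2].
stein_lhs = expect_1d(lambda x: x * x**3)
stein_rhs = expect_1d(lambda x: 3 * x**2)

# Vector form for independent N(0, 1) coordinates X1, X2 with the example
# f(x1, x2) = x1^2 x2 + x2^3, using a tensor-product quadrature grid.
X1, X2 = np.meshgrid(nodes, nodes, indexing="ij")
W = np.outer(weights, weights)


def expect_2d(values):
    """E[values] on the 2-D grid, i.e. the weighted sum over all grid points."""
    return np.sum(W * values)


f = X1**2 * X2 + X2**3
grad_f = (2 * X1 * X2, X1**2 + 3 * X2**2)  # (df/dx1, df/dx2) evaluated on the grid

lhs = np.array([expect_2d(X1 * f), expect_2d(X2 * f)])  # E[X f(X)] = Cov[X, f(X)]
rhs = np.array([expect_2d(d) for d in grad_f])           # E[grad f(X)]

print(stein_lhs, stein_rhs)
print(lhs, rhs)
```

Both scalar sides come out to $3$, and both vector sides to $(0, 4)$, matching the hand computation $\mathbb{E}[X_2 f(\mathbf{X})] = \mathbb{E}[X_1^2 X_2^2] + \mathbb{E}[X_2^4] = 1 + 3$.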
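Now apply an affine transform to $\mathbf{X}$: let $\mathbf{Z} = \Sigma^{1/2}\mathbf{X} + \boldsymbol{\mu}$, where $\Sigma^{1/2}$ is the symmetric positive semidefinite square root of $\Sigma$, so that $\mathbf{Z} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma)$. Given $h$, "absorb" the affine transform into $f$ by setting $f(\mathbf{X}) = h(\Sigma^{1/2}\mathbf{X} + \boldsymbol{\mu})$. By the chain rule and the symmetry of $\Sigma^{1/2}$, $\nabla f(\mathbf{X}) = \Sigma^{1/2}\,\nabla h(\mathbf{Z})$. We have
\begin{align} \text{Cov} [\mathbf{Z}, h(\mathbf{Z}) ] &= \text{Cov} [\Sigma^{1/2}\mathbf{X}+ \boldsymbol{\mu}, f(\mathbf{X})]\\ &=\text{Cov} [\Sigma^{1/2}\mathbf{X}, f(\mathbf{X})] \\ &=\Sigma^{1/2}\text{Cov} [\mathbf{X}, f(\mathbf{X})] \\ &=\Sigma^{1/2}\mathbb{E}[\nabla f(\mathbf{X})] \\ &=\Sigma^{1/2}\,\Sigma^{1/2}\,\mathbb{E}[\nabla h(\mathbf{Z})] \\ &=\Sigma\,\mathbb{E}[\nabla h(\mathbf{Z})]. \end{align}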
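The affine step can be checked numerically in the same way. The sketch below assumes an arbitrary example covariance $\Sigma$, mean $\boldsymbol{\mu}$ and test function $h(z_1, z_2) = z_1^2 z_2$ (none of these come from the argument above). It builds the symmetric square root $\Sigma^{1/2}$ from an eigendecomposition, pushes standard-normal quadrature nodes through $\mathbf{Z} = \Sigma^{1/2}\mathbf{X} + \boldsymbol{\mu}$, and compares $\text{Cov}[\mathbf{Z}, h(\mathbf{Z})]$ with $\Sigma\,\mathbb{E}[\nabla h(\mathbf{Z})]$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Example parameters (arbitrary, for illustration only): a positive-definite
# covariance Sigma and a nonzero mean mu.
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
mu = np.array([1.0, -0.5])

# Symmetric square root Sigma^{1/2} from Sigma = V diag(lam) V^T.
lam, V = np.linalg.eigh(Sigma)
root = V @ np.diag(np.sqrt(lam)) @ V.T

# Tensor-product Gauss-Hermite rule for X ~ N(0, I_2); it is exact for the
# low-degree polynomials used below.
nodes, weights = hermegauss(20)
weights = weights / np.sqrt(2 * np.pi)
x1, x2 = np.meshgrid(nodes, nodes, indexing="ij")
X = np.column_stack([x1.ravel(), x2.ravel()])  # quadrature points, shape (400, 2)
w = np.outer(weights, weights).ravel()         # matching weights, summing to 1

# Affine transform Z = Sigma^{1/2} X + mu, applied to every row of X.
Z = X @ root.T + mu

# Example test function h(z) = z1^2 * z2 and its gradient at each point.
h = Z[:, 0] ** 2 * Z[:, 1]
grad_h = np.column_stack([2 * Z[:, 0] * Z[:, 1], Z[:, 0] ** 2])

# Left side: Cov[Z, h(Z)] = E[Z h(Z)] - E[Z] E[h(Z)].
E_Z = w @ Z
cov_Z_h = w @ (Z * h[:, None]) - E_Z * (w @ h)

# Right side: Sigma E[grad h(Z)].
rhs = Sigma @ (w @ grad_h)

print(cov_Z_h, rhs)
```

With these parameters both sides equal $(2.2, 3.12)$: here $\mathbb{E}[\nabla h(\mathbf{Z})] = \big(2(\Sigma_{12} + \mu_1\mu_2),\ \Sigma_{11} + \mu_1^2\big)^\top = (0.2, 3)^\top$, and multiplying by $\Sigma$ gives $(2.2, 3.12)^\top$.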
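In particular, extracting the first entry of this vector and using $\Sigma_{1i} = \text{Cov}[Z_1, Z_i]$ yields
$$\text{Cov}[Z_1, h(\mathbf{Z})] = \sum_{i=1}^n \text{Cov}[Z_1, Z_i]\, \mathbb{E} [ (\partial/\partial Z_i) h(\mathbf{Z})].$$
When $\boldsymbol{\mu} = \mathbf{0}$, the covariances reduce to $\text{Cov}[Z_1, h(\mathbf{Z})] = \mathbb{E}[Z_1 h(\mathbf{Z})]$ and $\text{Cov}[Z_1, Z_i] = \mathbb{E}[Z_1 Z_i]$, which is exactly the identity from Wikipedia.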
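As a worked example of this identity (an illustration, not part of the original argument), take $n = 4$, $\boldsymbol{\mu} = \mathbf{0}$ (so covariances and expectations of products coincide) and $h(\mathbf{Z}) = Z_2 Z_3 Z_4$. Then $\partial h/\partial Z_1 = 0$, $\partial h/\partial Z_2 = Z_3 Z_4$, $\partial h/\partial Z_3 = Z_2 Z_4$ and $\partial h/\partial Z_4 = Z_2 Z_3$, so the identity gives the fourth-moment case of Isserlis' theorem:
\begin{align} \mathbb{E}[Z_1 Z_2 Z_3 Z_4] &= \sum_{i=1}^4 \mathbb{E}[Z_1 Z_i]\, \mathbb{E}\big[ (\partial/\partial Z_i)(Z_2 Z_3 Z_4) \big] \\ &= \mathbb{E}[Z_1 Z_2]\,\mathbb{E}[Z_3 Z_4] + \mathbb{E}[Z_1 Z_3]\,\mathbb{E}[Z_2 Z_4] + \mathbb{E}[Z_1 Z_4]\,\mathbb{E}[Z_2 Z_3]. \end{align}
Applying the same step repeatedly to higher-degree monomials gives the recursion that produces the general pairing formula of Isserlis' theorem.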
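Answer to 2: the only assumptions enter through Stein's lemma. For each $i$ we used that $f$ is absolutely continuous in $X_i$ (for almost every value of the other coordinates) and that $\mathbb{E}|X_i f(\mathbf{X})| < \infty$ and $\mathbb{E}|(\partial/\partial X_i) f(\mathbf{X})| < \infty$; by Fubini's theorem these unconditional bounds also justify applying the lemma conditionally on the other coordinates. In terms of $h$ (with $\Sigma$ invertible), it suffices, for instance, that $h$ be continuously differentiable with $\mathbb{E}|h(\mathbf{Z})|$, $\mathbb{E}|Z_i h(\mathbf{Z})|$ and $\mathbb{E}|(\partial/\partial Z_i) h(\mathbf{Z})|$ finite for all $i$.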