Derivation of the equivalence between Stein's Method and Minimizing the KL-Divergence of a perturbed distribution (Calculus of Variations)


I was going through Stein's method, which establishes that two distributions $p$ and $q$ are equal if and only if, for every test function $\phi$ in a sufficiently rich class,

$$\mathbf{E}_{x \sim q}[\tau_{p} \phi(x)] = 0$$

The Stein Operator $\tau_{p} \phi(x)$ is defined as follows in the slide Stein Operator

The slides further state the following which I am unable to derive.

Let $x \sim q$ and $q_{[\epsilon \phi]}$ be the density of $x^{'} = x + \epsilon \phi(x)$, then

$$\frac{\partial}{\partial \epsilon} KL(q_{[\epsilon \phi]} \vert\vert p) \vert_{\epsilon=0} = -\mathbf{E}_{x \sim q}[\tau_{p} \phi(x)]$$

This is also expressed in the attached slide KL-Stein
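The identity can at least be checked numerically in one dimension. The sketch below is not from the slides. It assumes the usual 1-D form of the Stein operator, $\tau_{p} \phi(x) = \phi(x) \frac{d}{dx} \log p(x) + \phi'(x)$, and makes illustrative choices for everything else: $q = N(0, 1)$, $p = N(1, 2^2)$, $\phi(x) = \sin(x)$. It reads $q_{[\epsilon \phi]}$ as the pushforward of $q$ under $T(x) = x + \epsilon \phi(x)$, so that $q_{[\epsilon \phi]}(T(x)) = q(x) / |T'(x)|$ by change of variables. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Illustrative assumptions (not from the slides):
# q = N(0, 1), p = N(MU_P, SIGMA_P^2), phi(x) = sin(x).
MU_P, SIGMA_P = 1.0, 2.0

def log_q(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_p(x):
    return (-0.5 * ((x - MU_P) / SIGMA_P) ** 2
            - np.log(SIGMA_P) - 0.5 * np.log(2 * np.pi))

def score_p(x):
    # d/dx log p(x) for the Gaussian p above
    return -(x - MU_P) / SIGMA_P**2

def phi(x):
    return np.sin(x)

def dphi(x):
    return np.cos(x)

# Gauss-Hermite quadrature for expectations under q = N(0, 1):
# hermegauss uses the weight exp(-x^2 / 2), so normalise by sqrt(2*pi).
nodes, weights = hermegauss(60)
weights = weights / np.sqrt(2 * np.pi)

def expect_q(values):
    return np.sum(weights * values)

def kl_perturbed(eps):
    # KL(q_[eps*phi] || p) written as an expectation over x ~ q, using
    # log q_[eps*phi](T(x)) = log q(x) - log|T'(x)| with T(x) = x + eps*phi(x).
    x_new = nodes + eps * phi(nodes)
    log_q_eps = log_q(nodes) - np.log(np.abs(1.0 + eps * dphi(nodes)))
    return expect_q(log_q_eps - log_p(x_new))

def stein_expectation():
    # E_{x~q}[tau_p phi(x)] under the assumed 1-D Stein operator
    return expect_q(score_p(nodes) * phi(nodes) + dphi(nodes))

eps = 1e-4
# Central finite difference for d/d(eps) KL at eps = 0
kl_slope = (kl_perturbed(eps) - kl_perturbed(-eps)) / (2 * eps)
stein_value = stein_expectation()
print("d/d(eps) KL at 0 :", kl_slope)
print("-E_q[tau_p phi]  :", -stein_value)
```

The two printed numbers should agree up to finite-difference error, which is consistent with the stated identity for this particular choice of $p$, $q$ and $\phi$. It is only a sanity check, not a proof.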

How do I prove this? What does the statement "$q_{[\epsilon \phi]}$ be the density of $x^{'} = x + \epsilon \phi(x)$" mean parametrically? The formulation screams Calculus of Variations to me, but without understanding the above statement, I am unable to derive the variational objective. Could someone please shed some light on this derivation?