I'm trying to understand this proof of the gradient of the expected value of a certain function. Here is the full text of the proof:
Let $f(x)$ be any function, and let $p(x \mid \theta)$ be a distribution over $x$, parameterized by $\theta$ and differentiable with respect to $\theta$. The gradient of the expectation of $f(x)$ can be derived as:
$$ \nabla_\theta \mathbb{E}_{x\sim p(x \mid \theta)}[f(x)] = \nabla_\theta \int_\Omega f(x)\,p(x \mid \theta)\,dx \quad \text{(makes sense; just the definition of } \mathbb{E}\text{)} $$ $$ = \int_\Omega f(x)\,\frac{\nabla_\theta p(x\mid \theta)}{p(x\mid \theta)}\,p(x\mid \theta)\, dx \quad \text{(push the gradient inside and multiply by } 1 = p(\cdot)/p(\cdot)\text{)}$$ $$ =\int_\Omega f(x)\,\nabla_\theta \bigl(\log p(x\mid \theta)\bigr)\,p(x \mid \theta)\,dx \quad \text{(What??)}$$
(Here $\Omega$ is just the domain of $x$. The paper writes the conditional as $p(\theta \mid x)$, but given the description it must mean a distribution over $x$ parameterized by $\theta$, so I've written it as $p(x \mid \theta)$.)
The main paper says this proof is somehow derived from policy gradients, but I think that applies to the later lines of the proof. I'm totally stumped by where that $\log$ comes from. Is this one of those simple algebra identities I've just forgotten? My stats knowledge is admittedly rusty, but I just don't understand how they got from the second line to the third.
This is a generalization of the fact that, by the chain rule, $$\frac{d}{dx}\log(f(x)) = \frac{f'(x)}{f(x)},$$ where $\log$ is the natural logarithm. Apply the same identity with $\nabla_\theta$ in place of $\frac{d}{dx}$ and the density in place of $f$: $$\nabla_\theta \log p = \frac{\nabla_\theta p}{p}.$$ The ratio $\frac{\nabla_\theta p}{p}$ appearing in the second line of the derivation is therefore exactly $\nabla_\theta(\log p)$, which is the substitution made in the third line. This is often called the log-derivative trick, and it is the core of the score-function (REINFORCE) gradient estimator used in policy gradients.
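For a concrete sanity check (my own toy example, not from the post or the paper): take the density to be a unit-variance Gaussian with mean $\theta$ and $f(x) = x^2$, so $\mathbb{E}[f(x)] = \theta^2 + 1$ and the true gradient is $2\theta$. The snippet below first verifies the identity $\nabla_\theta \log p = \nabla_\theta p / p$ by finite differences, then estimates the gradient with the score-function estimator $\frac{1}{N}\sum_i f(x_i)\,\nabla_\theta \log p(x_i)$:

```python
import numpy as np

def log_p(x, theta):
    # Log density of N(theta, 1), dropping the additive constant -0.5*log(2*pi),
    # which does not depend on theta and so does not affect the gradient.
    return -0.5 * (x - theta) ** 2

def score(x, theta):
    # d/dtheta log p(x | theta) = (x - theta) for a unit-variance Gaussian.
    return x - theta

theta = 1.5
f = lambda x: x ** 2

# 1) Check the identity d/dtheta log p = (d/dtheta p) / p by central differences.
x0, eps = 0.7, 1e-6
p = lambda th: np.exp(log_p(x0, th))
fd_ratio = (p(theta + eps) - p(theta - eps)) / (2 * eps) / p(theta)
print(fd_ratio, score(x0, theta))  # the two should agree closely

# 2) Score-function estimate of grad_theta E[f(x)] vs the closed form 2*theta.
rng = np.random.default_rng(0)
x = rng.normal(theta, 1.0, size=200_000)
grad_est = np.mean(f(x) * score(x, theta))
print(grad_est, 2 * theta)  # Monte Carlo estimate vs exact gradient 3.0
```

Note that the estimator only evaluates $f$ at samples; it never differentiates $f$, which is exactly why this trick is useful when $f$ is a black box (a reward, say).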