How does $\frac{\nabla_\theta p(\theta\mid x)}{p(\theta\mid x)}$ become $\nabla_\theta \log(p(\theta \mid x))$ in this proof?


I'm trying to understand this proof of the gradient of the expected value of a certain function. Here is the full text of the proof:

Let $f(x)$ be any function, and $p(\theta \mid x)$ be any parameterized distribution over $x$ which is differentiable with respect to $\theta$. The gradient of the expectation of $f(x)$ can be derived as:

$$ \nabla_\theta \mathbb{E}_{x\sim p(\theta \mid x)}[f(x)] = \nabla_\theta \int_\Omega f(x)p(\theta \mid x)dx \quad \text{(makes sense. Just the definition of E)} $$ $$ = \int_\Omega f(x)\frac{\nabla_\theta p(\theta\mid x)}{p(\theta\mid x)}p(\theta\mid x) dx \quad \text{(Push in the gradient, and mult by 1=p(.)/p(.))}$$ $$ =\int_\Omega f(x)\nabla_\theta (\log p(\theta\mid x))p(\theta \mid x)dx \quad \text{(What??)}$$

(Here $\Omega$ just represents the domain of x)

The main paper says this proof is derived from policy gradients, but I believe that applies to the later lines of the proof. I'm totally stumped by where that $\log$ comes from. Is this one of those simple algebra identities I just forgot? My stats knowledge is admittedly rusty, and I just don't see how they got from line 2 to line 3.

Accepted answer:

This is a generalization of the fact that, by the chain rule, $$\frac{d}{dx}\log(f(x)) = \frac{f'(x)}{f(x)},$$ where $\log$ denotes the natural logarithm. Applying the same rule with the gradient taken in $\theta$ gives $$\nabla_\theta \log p(\theta \mid x) = \frac{\nabla_\theta\, p(\theta \mid x)}{p(\theta \mid x)},$$ which is exactly the ratio appearing in line 2 of the proof; substituting it yields line 3. (This substitution is often called the log-derivative trick, and it is the key step behind score-function / policy-gradient estimators.)
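As a quick numerical sanity check of the resulting identity $\nabla_\theta \mathbb{E}[f(x)] = \mathbb{E}[f(x)\,\nabla_\theta \log p]$, here is a minimal Monte Carlo sketch. The setup (a Normal$(\theta, 1)$ distribution and $f(x) = x^2$) is illustrative and not from the original post:

```python
import numpy as np

# Sanity check of the log-derivative identity:
#   grad_theta E_{x ~ p_theta}[f(x)] = E_{x ~ p_theta}[f(x) * grad_theta log p_theta(x)]
# Illustrative choice: p_theta = Normal(theta, 1), f(x) = x**2.
# Then E[f(x)] = theta**2 + 1, so the true gradient is 2*theta.

rng = np.random.default_rng(0)
theta = 1.5
x = rng.normal(theta, 1.0, size=2_000_000)

# For Normal(theta, 1): log p(x) = -0.5*(x - theta)**2 + const,
# so grad_theta log p(x) = x - theta (the "score").
score = x - theta

# Score-function (log-derivative) estimate of the gradient.
estimate = np.mean(x**2 * score)

print(estimate)  # ≈ 2 * theta = 3.0, up to Monte Carlo noise
```

The estimator matches the closed-form gradient $2\theta$, confirming that the log term in line 3 is just the ratio from line 2 rewritten via the chain rule.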