The author says that $x$ is sampled from a distribution, and then arrives at the gradient of a distribution. How can one take the gradient of a distribution here?
Also, since the author says it is a very simple derivation, please explain the steps: how did the gradient get inside the integral, and where did the $\log$ come from?
Also, is the final result a vector?
Thanks for your help. By the way, this is a derivation from machine learning, where the maths beats me up now and then.
"How can I take the gradient of the distribution here?" $E_{x\sim p(x|\theta)}[f(x)]$ means the expectation is taken with respect to a random variable $X$ that is distributed as $p(x|\theta)$. $p(x|\theta)$ is a family of distributions indexed by $\theta$. So $E_{x\sim p(x|\theta)}[f(x)]$ is a function of $\theta$ and you can try to take its derivative/gradient. You aren't taking the "gradient of the distribution".
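As a concrete illustration (my example, not the author's): take $p(x|\theta)$ to be the Gaussian family $\mathcal{N}(\theta,1)$ and $f(x)=x^2$. Then

$$E_{x\sim\mathcal{N}(\theta,1)}[x^2]=\theta^2+1,$$

which is an ordinary scalar function of $\theta$, and its derivative with respect to $\theta$ is simply $2\theta$. That is the kind of object being differentiated.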
"how did gradient get inside the integral" The derivation assumes this interchange of gradient and integral is legitimate, which it is in many commonly encountered cases. See the Wikipedia article on differentiation under the integral sign (the Leibniz integral rule) for the precise conditions.
..."and log came" It's just the chain rule from basic calculus, $\nabla \log f(x)=\nabla f(x)/f(x)$, or equivalently $\nabla f(x)=f(x)\,\nabla \log f(x)$, which is the form substituted inside the integral.
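Putting the two pieces together (and assuming the interchange of gradient and integral is valid), the full chain of steps is:

$$\nabla_\theta E_{x\sim p(x|\theta)}[f(x)] = \nabla_\theta \int f(x)\,p(x|\theta)\,dx = \int f(x)\,\nabla_\theta p(x|\theta)\,dx$$
$$= \int f(x)\,p(x|\theta)\,\nabla_\theta \log p(x|\theta)\,dx = E_{x\sim p(x|\theta)}\!\left[f(x)\,\nabla_\theta \log p(x|\theta)\right].$$

The log identity is applied in the form $\nabla_\theta p = p\,\nabla_\theta \log p$ precisely so that a factor of $p(x|\theta)$ reappears and the result can be written as an expectation again; this identity is often called the log-derivative (or score-function) trick.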
"Also, is the final result a vector?" It will have the same dimension as $\theta$, like any other gradient.
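If it helps to see the identity in action, here is a minimal numerical sketch. The family $\mathcal{N}(\theta,1)$ and the choice $f(x)=x^2$ are my assumptions for illustration, not from the original text; for that case $E[f(x)]=\theta^2+1$, so the true gradient is $2\theta$, and the Monte Carlo average of $f(x)\,\nabla_\theta \log p(x|\theta)$ over samples from $p(x|\theta)$ should match it.

```python
import numpy as np

# Sketch: numerically check the score-function identity
#   grad_theta E[f(x)] = E[f(x) * grad_theta log p(x|theta)]
# Assumed setup (not from the original post): p(x|theta) = N(theta, 1),
# f(x) = x**2, so E[f(x)] = theta**2 + 1 and the true gradient is 2*theta.
rng = np.random.default_rng(0)
theta = 1.5
x = rng.normal(theta, 1.0, size=2_000_000)

# For N(theta, 1), grad_theta log p(x|theta) = (x - theta).
# The estimator averages f(x) times this score over samples from p(x|theta).
score_estimate = np.mean(x**2 * (x - theta))
true_gradient = 2 * theta

print(score_estimate, true_gradient)  # should agree to roughly two decimals
```

If $\theta$ were a vector, the score $\nabla_\theta \log p(x|\theta)$ would be a vector of the same dimension, and so would the estimate, matching the answer above.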