In the article
- Sergios Karagiannakos, The theory behind Latent Variable Models: formulating a Variational Autoencoder, 2021,

the task of estimating the parameters of a probability distribution so that it fits the observed data is presented as the maximum-likelihood optimization problem:
$$ \theta^{\text{ML}} = \arg\max_{\theta} \sum_{i=1}^N \log p_{\theta} (x_i) $$
The gradient of the marginal log-likelihood is then derived using basic calculus and Bayes' rule:
$$ \nabla_{\theta} \log p_{\theta} (x) = \int p_{\theta} (z \mid x) \, \nabla_{\theta} \log p_{\theta} (x, z) \, \mathrm{d} z $$
Where can one find the proof/derivation of this gradient identity?
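To convince myself the identity at least holds, I checked it numerically on a toy discrete latent-variable model of my own construction (not from the article): $z \sim \mathrm{Bernoulli}(\sigma(\theta))$ and $x \mid z \sim \mathrm{Bernoulli}(q_z)$ with fixed $q_z$, so the integral over $z$ becomes a sum over $z \in \{0, 1\}$:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Toy model (my own example): z ~ Bernoulli(sigmoid(theta)),
# x | z ~ Bernoulli(q[z]) with fixed emission probabilities q.
q = [0.2, 0.9]  # p(x = 1 | z)

def log_joint(x, z, theta):
    # log p_theta(x, z) = log p_theta(z) + log p(x | z)
    pz1 = sigmoid(theta)
    pz = pz1 if z == 1 else 1.0 - pz1
    px = q[z] if x == 1 else 1.0 - q[z]
    return math.log(pz) + math.log(px)

def log_marginal(x, theta):
    # log p_theta(x) = log sum_z p_theta(x, z)  (z is discrete here)
    return math.log(sum(math.exp(log_joint(x, z, theta)) for z in (0, 1)))

theta, x, eps = 0.3, 1, 1e-6

# Left-hand side: d/dtheta log p_theta(x), via central finite differences.
lhs = (log_marginal(x, theta + eps) - log_marginal(x, theta - eps)) / (2 * eps)

# Right-hand side: sum_z p_theta(z | x) * d/dtheta log p_theta(x, z),
# with the posterior obtained from Bayes' rule: p(z|x) = p(x,z) / p(x).
rhs = 0.0
for z in (0, 1):
    posterior = math.exp(log_joint(x, z, theta) - log_marginal(x, theta))
    grad_joint = (log_joint(x, z, theta + eps)
                  - log_joint(x, z, theta - eps)) / (2 * eps)
    rhs += posterior * grad_joint

print(abs(lhs - rhs))  # difference should be near zero
```

The two sides agree to within finite-difference error, so I am looking for the analytic proof rather than empirical evidence.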