Suppose that I have the log-likelihood (scalar) function $\ln(p(\mathbf{v};\mathbf{\theta}))$, where $\mathbf{v}$ is some random vector and $\mathbf{\theta}$ is a vector of parameters, and suppose it is given that:
$$ \frac{\partial \ln(p(\mathbf{v};\mathbf{\theta}))}{\partial \mathbf{\theta}} = -\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[\frac{\partial E(\mathbf{v},\mathbf{h};\mathbf{\theta})}{\partial \mathbf{\theta}} \right] + \mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[\frac{\partial E(\mathbf{v},\mathbf{h};\mathbf{\theta})}{\partial \mathbf{\theta}} \right] $$
Where:
- $\mathbf{h}$ is another random vector.
- $E(\mathbf{v},\mathbf{h};\mathbf{\theta})$ is some scalar function of $\mathbf{v}$, $\mathbf{h}$, and $\mathbf{\theta}$.
- $\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}$ is an expectation with respect to the conditional probability distribution $p(\mathbf{h}|\mathbf{v})$.
- $\mathbb{E}_{p(\mathbf{v},\mathbf{h})}$ is an expectation with respect to the joint probability distribution $p(\mathbf{v},\mathbf{h})$.
The expectations can be approximated using Monte Carlo integration such that:
$$ \frac{\partial \ln(p(\mathbf{v};\mathbf{\theta}))}{\partial \mathbf{\theta}} \approx - \frac{1}{N} \sum_{i = 1}^{N} \left[\frac{\partial E(\mathbf{v},\mathbf{h}_i;\mathbf{\theta})}{\partial \mathbf{\theta}} \right] + \frac{1}{M} \sum_{j=1}^{M} \left[\frac{\partial E(\mathbf{v}_j,\mathbf{h}_j;\mathbf{\theta})}{\partial \mathbf{\theta}} \right] $$
Where:
- $\mathbf{h}_i$ is sampled from $p(\mathbf{h}|\mathbf{v})$.
- $\mathbf{v}_j$ and $\mathbf{h}_j$ are sampled from $p(\mathbf{v},\mathbf{h})$.
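To make the approximation concrete, here is a minimal NumPy sketch on a toy binary model small enough that both $p(\mathbf{h}|\mathbf{v})$ and $p(\mathbf{v},\mathbf{h})$ can be enumerated exactly, so the Monte Carlo estimate can be compared against the exact expectations. The bilinear energy $E(\mathbf{v},\mathbf{h};W)=-\mathbf{v}^\top W\mathbf{h}$ is a hypothetical choice for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary model with hypothetical energy E(v, h; W) = -v @ W @ h
nv, nh = 3, 2
W = rng.normal(size=(nv, nh))
v_fixed = np.array([1.0, 0.0, 1.0])

# Enumerate all binary configurations (only feasible for tiny models)
def all_bits(n):
    return np.array([[(i >> k) & 1 for k in range(n)] for i in range(2 ** n)], dtype=float)

V, H = all_bits(nv), all_bits(nh)

def energy(v, h, W):
    return -v @ W @ h

# Joint p(v, h) proportional to exp(-E)
logits = np.array([[-energy(vv, hh, W) for hh in H] for vv in V])
p_joint = np.exp(logits)
p_joint /= p_joint.sum()

# Conditional p(h | v_fixed): the row of the joint for v_fixed, renormalized
iv = int(sum(int(v_fixed[k]) << k for k in range(nv)))
p_h_given_v = p_joint[iv] / p_joint[iv].sum()

# For this energy, dE/dW = -outer(v, h)
def dE_dW(v, h):
    return -np.outer(v, h)

# Exact value of the two expectations in the gradient formula
grad_exact = -sum(p_h_given_v[j] * dE_dW(v_fixed, H[j]) for j in range(len(H)))
grad_exact += sum(p_joint[i, j] * dE_dW(V[i], H[j])
                  for i in range(len(V)) for j in range(len(H)))

# Monte Carlo approximation: h_i ~ p(h|v_fixed), (v_j, h_j) ~ p(v, h)
N = M = 20000
hi = rng.choice(len(H), size=N, p=p_h_given_v)
ij = rng.choice(p_joint.size, size=M, p=p_joint.ravel())
vj, hj = np.divmod(ij, len(H))

grad_mc = -np.mean([dE_dW(v_fixed, H[k]) for k in hi], axis=0)
grad_mc += np.mean([dE_dW(V[a], H[b]) for a, b in zip(vj, hj)], axis=0)

print(np.max(np.abs(grad_mc - grad_exact)))  # small, shrinks like 1/sqrt(N)
```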
Since the gradient is a linear operator, the above approximation can be rearranged as:
$$ \frac{\partial \ln(p(\mathbf{v};\mathbf{\theta}))}{\partial \mathbf{\theta}} \approx \frac{\partial}{\partial \mathbf{\theta}} \left(\frac{1}{M} \sum_{j=1}^{M} \left[E(\mathbf{v}_j,\mathbf{h}_j;\mathbf{\theta}) \right] - \frac{1}{N} \sum_{i = 1}^{N} \left[E(\mathbf{v},\mathbf{h}_i;\mathbf{\theta}) \right] \right) $$
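This rearrangement can be checked numerically: with the samples held fixed, the gradient of the scalar surrogate $\frac{1}{M}\sum_j E(\mathbf{v}_j,\mathbf{h}_j;\mathbf{\theta})-\frac{1}{N}\sum_i E(\mathbf{v},\mathbf{h}_i;\mathbf{\theta})$ coincides with the Monte Carlo gradient estimate. The sketch below uses a hypothetical bilinear energy and arbitrary fixed binary arrays standing in for the true samples, and compares a finite-difference gradient of the surrogate against the direct estimator:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical energy E(v, h; W) = -v @ W @ h, for illustration only
nv, nh = 3, 2
W = rng.normal(size=(nv, nh))
v = np.array([1.0, 0.0, 1.0])

# Stand-ins for samples h_i ~ p(h|v) and (v_j, h_j) ~ p(v, h); held fixed below
h_cond = rng.integers(0, 2, size=(50, nh)).astype(float)
v_joint = rng.integers(0, 2, size=(50, nv)).astype(float)
h_joint = rng.integers(0, 2, size=(50, nh)).astype(float)

def energy(v, h, W):
    return -v @ W @ h

def surrogate(W):
    neg = np.mean([energy(a, b, W) for a, b in zip(v_joint, h_joint)])
    pos = np.mean([energy(v, b, W) for b in h_cond])
    return neg - pos

# Direct Monte Carlo gradient estimate (dE/dW = -outer(v, h) for this energy)
grad_mc = -np.mean([-np.outer(v, b) for b in h_cond], axis=0) \
          + np.mean([-np.outer(a, b) for a, b in zip(v_joint, h_joint)], axis=0)

# Central finite differences on the surrogate, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(nv):
    for j in range(nh):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_fd[i, j] = (surrogate(Wp) - surrogate(Wm)) / (2 * eps)

print(np.max(np.abs(grad_fd - grad_mc)))  # agrees up to floating-point error
```

The key point is that the samples are treated as constants with respect to $\mathbf{\theta}$; only then may the derivative be pulled outside the sums.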
Does this approximation then imply that:
$$ \ln(p(\mathbf{v};\theta)) \approx \frac{1}{M} \sum_{j=1}^{M} \left[E(\mathbf{v}_j,\mathbf{h}_j;\mathbf{\theta}) \right] - \frac{1}{N} \sum_{i = 1}^{N} \left[E(\mathbf{v},\mathbf{h}_i;\mathbf{\theta}) \right] $$
Let's assume regularity conditions strong enough that we can exchange the partial derivative $\partial/\partial\mathbf{\theta}$ with the expectations in the first equation. Note that this is a nontrivial assumption here: the distributions $p(\mathbf{h}|\mathbf{v})$ and $p(\mathbf{v},\mathbf{h})$ themselves depend on $\mathbf{\theta}$, so exchanging the two discards the terms that come from differentiating the densities. Under that assumption, we can perform a line integral along a path from $\mathbf{\theta}_0$ to $\mathbf{\theta}$ and find:
$$\begin{align} \ln p(\mathbf{v};\mathbf{\theta})&=\ln p(\mathbf{v};\mathbf{\theta}_0)+\int_{\mathbf{\theta}_0}^{\mathbf{\theta}}\frac{\partial \ln p(\mathbf{v};\mathbf{\theta})}{\partial \mathbf{\theta}}\cdot d\mathbf{\theta}\\ &=\ln p(\mathbf{v};\mathbf{\theta}_0)+\int_{\mathbf{\theta}_0}^{\mathbf{\theta}}\left\{-\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[\frac{\partial E(\mathbf{v},\mathbf{h};\mathbf{\theta})}{\partial \mathbf{\theta}} \right] + \mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[\frac{\partial E(\mathbf{v},\mathbf{h};\mathbf{\theta})}{\partial \mathbf{\theta}} \right]\right\}\cdot d\mathbf{\theta}\\ &=\ln p(\mathbf{v};\mathbf{\theta}_0)+\int_{\mathbf{\theta}_0}^{\mathbf{\theta}}\left\{-\frac{\partial}{\partial \mathbf{\theta}}\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta})\right] + \frac{\partial}{\partial \mathbf{\theta}}\mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta}) \right]\right\}\cdot d\mathbf{\theta}\\ &=-\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta})\right] + \mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta}) \right]+C\\ \end{align}$$
Where $\mathbf{\theta}_0$ is an arbitrary vector in the parameter space, and $C$ is the constant $C=\ln p(\mathbf{v};\mathbf{\theta}_0)+\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta}_0)\right] - \mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta}_0) \right]$. (Note the signs: $C$ is the boundary term at $\mathbf{\theta}_0$, so the expectations enter with signs opposite to those at $\mathbf{\theta}$.)
Now, your Monte Carlo estimators converge to their expected values with errors $O_p\left(\frac{1}{\sqrt M}\right)$ and $O_p\left(\frac{1}{\sqrt N}\right)$, where $O_p$ denotes stochastic boundedness. If we define $n=\min(M,N)$, the error associated with the sum of these estimators is bounded by $O_p\left(\frac{1}{\sqrt n}\right)$.
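The $O_p(1/\sqrt n)$ behaviour is easy to check empirically. The sketch below measures the root-mean-square error of a plain Monte Carlo estimate of $\mathbb{E}[X]=0$ for $X\sim\mathcal N(0,1)$ over repeated trials; quadrupling the sample size should roughly halve the error:

```python
import numpy as np

rng = np.random.default_rng(1)

# RMSE of the sample mean as an estimator of E[X] = 0, X ~ N(0, 1),
# measured over many independent repetitions of the Monte Carlo estimate.
def mc_rmse(n, reps=2000):
    samples = rng.normal(size=(reps, n))
    return np.sqrt(np.mean(samples.mean(axis=1) ** 2))

for n in (100, 400, 1600):
    print(n, mc_rmse(n))  # close to 1/sqrt(n): ~0.100, ~0.050, ~0.025
```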
Summing up, the Monte Carlo estimators you described will give you a good approximation of the log-likelihood, up to the additive constant $C$. If your objective is to find maximum likelihood estimates or posterior distributions, that constant may not matter, since it cancels in later calculations (for instance, it drops out of gradients and of likelihood ratios). But it can matter depending on what exactly you are trying to do.
Hope this helps!