Suppose that I have the log-likelihood (scalar) function $\ln(p(\mathbf{v};\mathbf{\theta}))$, where $\mathbf{v}$ is some random vector and $\mathbf{\theta}$ is a vector of parameters, and suppose it is given that:
$$ \frac{\partial \ln(p(\mathbf{v};\mathbf{\theta}))}{\partial \mathbf{\theta}} = -\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[\frac{\partial E(\mathbf{v},\mathbf{h};\mathbf{\theta})}{\partial \mathbf{\theta}} \right] + \mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[\frac{\partial E(\mathbf{v},\mathbf{h};\mathbf{\theta})}{\partial \mathbf{\theta}} \right] $$
Where:
- $\mathbf{h}$ is another random vector.
- $E(\mathbf{v},\mathbf{h};\mathbf{\theta})$ is some scalar function of $\mathbf{v}$, $\mathbf{h}$, and $\mathbf{\theta}$.
- $\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}$ is an expectation with respect to the conditional probability distribution $p(\mathbf{h}|\mathbf{v})$.
- $\mathbb{E}_{p(\mathbf{v},\mathbf{h})}$ is an expectation with respect to the joint probability distribution $p(\mathbf{v},\mathbf{h})$.
The expectations can be approximated using Monte Carlo integration such that:
$$ \frac{\partial \ln(p(\mathbf{v};\mathbf{\theta}))}{\partial \mathbf{\theta}} \approx - \frac{1}{N} \sum_{i = 1}^{N} \left[\frac{\partial E(\mathbf{v},\mathbf{h}_i;\mathbf{\theta})}{\partial \mathbf{\theta}} \right] + \frac{1}{M} \sum_{j=1}^{M} \left[\frac{\partial E(\mathbf{v}_j,\mathbf{h}_j;\mathbf{\theta})}{\partial \mathbf{\theta}} \right] $$
Where:
- $\mathbf{h}_i$ is sampled from $p(\mathbf{h}|\mathbf{v})$.
- $\mathbf{v}_j$ and $\mathbf{h}_j$ are sampled from $p(\mathbf{v},\mathbf{h})$.
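To make the approximation concrete, here is a minimal NumPy sketch on a toy binary model small enough that both $p(\mathbf{h}|\mathbf{v})$ and $p(\mathbf{v},\mathbf{h})$ can be enumerated exactly, so the Monte Carlo estimate can be compared against the exact expectations. The bilinear energy $E(\mathbf{v},\mathbf{h};W)=-\mathbf{v}^\top W\mathbf{h}$ is a hypothetical choice for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary model with hypothetical energy E(v, h; W) = -v @ W @ h
nv, nh = 3, 2
W = rng.normal(size=(nv, nh))
v_fixed = np.array([1.0, 0.0, 1.0])

# Enumerate all binary configurations (only feasible for tiny models)
def all_bits(n):
    return np.array([[(i >> k) & 1 for k in range(n)] for i in range(2 ** n)], dtype=float)

V, H = all_bits(nv), all_bits(nh)

def energy(v, h, W):
    return -v @ W @ h

# Joint p(v, h) proportional to exp(-E)
logits = np.array([[-energy(vv, hh, W) for hh in H] for vv in V])
p_joint = np.exp(logits)
p_joint /= p_joint.sum()

# Conditional p(h | v_fixed): the row of the joint for v_fixed, renormalized
iv = int(sum(int(v_fixed[k]) << k for k in range(nv)))
p_h_given_v = p_joint[iv] / p_joint[iv].sum()

# For this energy, dE/dW = -outer(v, h)
def dE_dW(v, h):
    return -np.outer(v, h)

# Exact value of the two expectations in the gradient formula
grad_exact = -sum(p_h_given_v[j] * dE_dW(v_fixed, H[j]) for j in range(len(H)))
grad_exact += sum(p_joint[i, j] * dE_dW(V[i], H[j])
                  for i in range(len(V)) for j in range(len(H)))

# Monte Carlo approximation: h_i ~ p(h|v_fixed), (v_j, h_j) ~ p(v, h)
N = M = 20000
hi = rng.choice(len(H), size=N, p=p_h_given_v)
ij = rng.choice(p_joint.size, size=M, p=p_joint.ravel())
vj, hj = np.divmod(ij, len(H))

grad_mc = -np.mean([dE_dW(v_fixed, H[k]) for k in hi], axis=0)
grad_mc += np.mean([dE_dW(V[a], H[b]) for a, b in zip(vj, hj)], axis=0)

print(np.max(np.abs(grad_mc - grad_exact)))  # small, shrinks like 1/sqrt(N)
```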
Since the gradient is a linear operator, the above approximation can be rearranged as:
$$ \frac{\partial \ln(p(\mathbf{v};\mathbf{\theta}))}{\partial \mathbf{\theta}} \approx \frac{\partial}{\partial \mathbf{\theta}} \left(\frac{1}{M} \sum_{j=1}^{M} \left[E(\mathbf{v}_j,\mathbf{h}_j;\mathbf{\theta}) \right] - \frac{1}{N} \sum_{i = 1}^{N} \left[E(\mathbf{v},\mathbf{h}_i;\mathbf{\theta}) \right] \right) $$
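This rearrangement can be checked numerically: with the samples held fixed, the gradient of the scalar surrogate $\frac{1}{M}\sum_j E(\mathbf{v}_j,\mathbf{h}_j;\mathbf{\theta})-\frac{1}{N}\sum_i E(\mathbf{v},\mathbf{h}_i;\mathbf{\theta})$ coincides with the Monte Carlo gradient estimate. The sketch below uses a hypothetical bilinear energy and arbitrary fixed binary arrays standing in for the true samples, and compares a finite-difference gradient of the surrogate against the direct estimator:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical energy E(v, h; W) = -v @ W @ h, for illustration only
nv, nh = 3, 2
W = rng.normal(size=(nv, nh))
v = np.array([1.0, 0.0, 1.0])

# Stand-ins for samples h_i ~ p(h|v) and (v_j, h_j) ~ p(v, h); held fixed below
h_cond = rng.integers(0, 2, size=(50, nh)).astype(float)
v_joint = rng.integers(0, 2, size=(50, nv)).astype(float)
h_joint = rng.integers(0, 2, size=(50, nh)).astype(float)

def energy(v, h, W):
    return -v @ W @ h

def surrogate(W):
    neg = np.mean([energy(a, b, W) for a, b in zip(v_joint, h_joint)])
    pos = np.mean([energy(v, b, W) for b in h_cond])
    return neg - pos

# Direct Monte Carlo gradient estimate (dE/dW = -outer(v, h) for this energy)
grad_mc = -np.mean([-np.outer(v, b) for b in h_cond], axis=0) \
          + np.mean([-np.outer(a, b) for a, b in zip(v_joint, h_joint)], axis=0)

# Central finite differences on the surrogate, entry by entry
eps = 1e-6
grad_fd = np.zeros_like(W)
for i in range(nv):
    for j in range(nh):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        grad_fd[i, j] = (surrogate(Wp) - surrogate(Wm)) / (2 * eps)

print(np.max(np.abs(grad_fd - grad_mc)))  # agrees up to floating-point error
```

The key point is that the samples are treated as constants with respect to $\mathbf{\theta}$; only then may the derivative be pulled outside the sums.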
Does this approximation then imply that:
$$ \ln(p(\mathbf{v};\theta)) \approx \frac{1}{M} \sum_{j=1}^{M} \left[E(\mathbf{v}_j,\mathbf{h}_j;\mathbf{\theta}) \right] - \frac{1}{N} \sum_{i = 1}^{N} \left[E(\mathbf{v},\mathbf{h}_i;\mathbf{\theta}) \right] $$
Let's assume regularity conditions strong enough that we can exchange the partial derivative $\partial/\partial\mathbf{\theta}$ with the expectations in the first equation. Note that this is a nontrivial assumption here: the distributions $p(\mathbf{h}|\mathbf{v})$ and $p(\mathbf{v},\mathbf{h})$ themselves depend on $\mathbf{\theta}$, so exchanging the two discards the terms that come from differentiating the densities. Under that assumption, we can perform a line integral along a path from $\mathbf{\theta}_0$ to $\mathbf{\theta}$ and find:
$$\begin{align} \ln p(\mathbf{v};\mathbf{\theta})&=\ln p(\mathbf{v};\mathbf{\theta}_0)+\int_{\mathbf{\theta}_0}^{\mathbf{\theta}}\frac{\partial \ln p(\mathbf{v};\mathbf{\theta})}{\partial \mathbf{\theta}}\cdot d\mathbf{\theta}\\ &=\ln p(\mathbf{v};\mathbf{\theta}_0)+\int_{\mathbf{\theta}_0}^{\mathbf{\theta}}\left\{-\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[\frac{\partial E(\mathbf{v},\mathbf{h};\mathbf{\theta})}{\partial \mathbf{\theta}} \right] + \mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[\frac{\partial E(\mathbf{v},\mathbf{h};\mathbf{\theta})}{\partial \mathbf{\theta}} \right]\right\}\cdot d\mathbf{\theta}\\ &=\ln p(\mathbf{v};\mathbf{\theta}_0)+\int_{\mathbf{\theta}_0}^{\mathbf{\theta}}\left\{-\frac{\partial}{\partial \mathbf{\theta}}\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta})\right] + \frac{\partial}{\partial \mathbf{\theta}}\mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta}) \right]\right\}\cdot d\mathbf{\theta}\\ &=-\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta})\right] + \mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta}) \right]+C\\ \end{align}$$
Where $\mathbf{\theta}_0$ is an arbitrary vector in the parameter space, and $C$ is the constant $C=\ln p(\mathbf{v};\mathbf{\theta}_0)+\mathbb{E}_{p(\mathbf{h}|\mathbf{v})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta}_0)\right] - \mathbb{E}_{p(\mathbf{v},\mathbf{h})}\left[E(\mathbf{v},\mathbf{h};\mathbf{\theta}_0) \right]$. (Note the signs: $C$ is the boundary term at $\mathbf{\theta}_0$, so the expectations enter with signs opposite to those at $\mathbf{\theta}$.)
Now, your Monte Carlo estimators converge to their expected values with errors $O_p\left(\frac{1}{\sqrt M}\right)$ and $O_p\left(\frac{1}{\sqrt N}\right)$, where $O_p$ denotes stochastic boundedness. If we define $n=\min(M,N)$, the error associated with the sum of these estimators is bounded by $O_p\left(\frac{1}{\sqrt n}\right)$.
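The $O_p(1/\sqrt n)$ behaviour is easy to check empirically. The sketch below measures the root-mean-square error of a plain Monte Carlo estimate of $\mathbb{E}[X]=0$ for $X\sim\mathcal N(0,1)$ over repeated trials; quadrupling the sample size should roughly halve the error:

```python
import numpy as np

rng = np.random.default_rng(1)

# RMSE of the sample mean as an estimator of E[X] = 0, X ~ N(0, 1),
# measured over many independent repetitions of the Monte Carlo estimate.
def mc_rmse(n, reps=2000):
    samples = rng.normal(size=(reps, n))
    return np.sqrt(np.mean(samples.mean(axis=1) ** 2))

for n in (100, 400, 1600):
    print(n, mc_rmse(n))  # close to 1/sqrt(n): ~0.100, ~0.050, ~0.025
```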
Summing up, the Monte Carlo estimators you described will give you a good approximation of the log-likelihood, up to the additive constant $C$. If your objective is to find maximum likelihood estimates or posterior distributions, that constant may not matter, since it cancels in later calculations (for instance, it drops out of gradients and of likelihood ratios). But it can matter depending on what exactly you are trying to do.
Hope this helps!