I was reading the following Wikipedia page https://en.wikipedia.org/wiki/Scoring_algorithm under the "Sketch of Derivation" section:
Let $Y_1,\ldots,Y_n$ be random variables, independent and identically distributed with twice differentiable p.d.f. $f(y; \theta)$, and we wish to calculate the maximum likelihood estimator (M.L.E.) $\theta^*$ of $\theta$. First, suppose we have a starting point for our algorithm $\theta_0$, and consider a Taylor expansion of the score function, $V(\theta)$, about $\theta_0$:
$$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0)$$
where
$$\mathcal{J}(\theta_0) = -\sum_{i=1}^n \left. \nabla \nabla^\top \right|_{\theta=\theta_0} \log f(Y_i; \theta)$$
is the observed information matrix at $\theta_0$.
Now, setting $\theta = \theta^*$, using that $V(\theta^*) = 0$ and rearranging gives us:
$$\theta^* \approx \theta_{0} + \mathcal{J}^{-1}(\theta_{0})V(\theta_{0}).$$
We therefore use the algorithm
$$\theta_{m+1} = \theta_{m} + \mathcal{J}^{-1}(\theta_{m})V(\theta_{m})$$
and under certain regularity conditions, it can be shown that $\theta_m \rightarrow \theta^*$.
My Question: I am trying to learn about the "regularity conditions" as well as the general proof required to prove the following statement:
And under certain regularity conditions, it can be shown that $\theta_m \rightarrow \theta^*$.
I have consulted several sources, but I could not find a clear statement of these regularity conditions, nor a mathematical proof of this convergence property.
Can someone please help me understand this?
Thanks!
- Note: I understand that $\theta^* \rightarrow \theta$, i.e. the classical consistency property of the MLE: the estimator converges to the true value as the sample size approaches infinity. What I am trying to understand is why $\theta_m$ (a numerical approximation of $\theta^*$) approaches $\theta^*$ as the number of iterations $m$ becomes large.
The pattern behind this is Newton's method (for optimization). You start with $$ \phi(\theta)=\sum_{i=1}^n\ln f(Y_i;\theta)\approx \phi(\theta_0)+\phi'(\theta_0)(\theta-\theta_0)+\frac12\phi''(\theta_0)[\theta-\theta_0,\theta-\theta_0]+O(\|\theta-\theta_0\|^3) $$ Being able to write a big-$O$ remainder there at all is already one regularity condition (it requires sufficient smoothness of $\theta\mapsto\log f(y;\theta)$).
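As a concrete sketch of the resulting iteration $\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m)V(\theta_m)$ (my own illustration, not part of the original derivation), consider an exponential model $f(y;\theta) = \theta e^{-\theta y}$, where the score and observed information have closed forms and the MLE is known to be $1/\bar{y}$:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=1000)  # simulated data, true rate theta = 1
n, s = len(y), y.sum()

# For f(y; theta) = theta * exp(-theta * y):
#   score:                V(theta) = n/theta - sum(y_i)
#   observed information: J(theta) = n / theta**2
V = lambda theta: n / theta - s
J = lambda theta: n / theta**2

theta = 0.5  # starting point theta_0, assumed close enough to the MLE
for m in range(30):
    step = V(theta) / J(theta)   # the scoring step J^{-1}(theta_m) V(theta_m)
    theta += step
    if abs(step) < 1e-12:        # stop once the step is negligible
        break

print(theta, n / s)  # the iterates converge to the closed-form MLE 1/ybar
```

Here the iterates reach the closed-form MLE in a handful of steps; with a starting point too far from $\theta^*$ (for this model, $\theta_0 > 2/\bar{y}$) the same iteration diverges, which already hints at the "sufficiently good starting point" condition below.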
At an extremum of $\phi$ its derivative or gradient is zero: $$ 0=V(\theta^*)=\nabla\phi(\theta^*)=\phi'(\theta^*)^\top,\\ V(\theta)=V(\theta_0)+V'(\theta_0)(\theta-\theta_0)+O(\|\theta-\theta_0\|^2). $$ Extracting the derivative of $V$, i.e. the Hessian matrix $J=-V'$ of $\phi$, from the bilinear form of the second derivative of $\phi$ requires a little linear-algebra bricolage: a transposition in one argument. Among other things, one has to recall what $\nabla$ stands for. Usually it is the gradient operator, $\nabla \phi(\theta)=\phi'(\theta)^\top$. That seems to be the case here too, so $\nabla\nabla^\top$ stands for the matrix of all (mixed) second-derivative operators.
Thus solving $$ 0=V(\theta_0)-J(\theta_0)(\theta-\theta_0) $$ is the same as solving $$ 0=V(\theta_0)+V'(\theta_0)(\theta-\theta_0), $$ which defines the Newton step. One level back, it is the same as finding the extremal point of the quadratic model $$ \phi(\theta_0)+\phi'(\theta_0)(\theta-\theta_0)+\frac12\phi''(\theta_0)[\theta-\theta_0,\theta-\theta_0]. $$ Thus you also inherit the convergence conditions of Newton's method as part of the regularity conditions: typically that $\phi$ is twice continuously differentiable near $\theta^*$ with a Lipschitz Hessian, that $J(\theta^*)$ is nonsingular (positive definite at a maximum), and that the starting point $\theta_0$ is sufficiently close to $\theta^*$. These may be expressed in terms of $\phi$ or of $f$, and may simplify somewhat in that form.
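To see one of these conditions at work, here is a small sketch (my own illustration, with hypothetical test functions): the Newton update converges superlinearly when the Hessian at the minimizer is nonsingular, but only linearly when it degenerates there.

```python
import numpy as np

def newton_path(grad, hess, theta0, iters):
    """Record the Newton iterates theta <- theta - hess(theta)^{-1} grad(theta)."""
    path = [theta0]
    for _ in range(iters):
        path.append(path[-1] - grad(path[-1]) / hess(path[-1]))
    return path

# Regular case: phi(t) = t**2 + cos(t), minimized at t = 0 with phi''(0) = 1 > 0.
good = newton_path(lambda t: 2*t - np.sin(t), lambda t: 2 - np.cos(t), 1.0, 6)

# Degenerate case: phi(t) = t**4, minimized at t = 0 but phi''(0) = 0,
# so the Newton step shrinks the error only by the constant factor 2/3.
bad = newton_path(lambda t: 4*t**3, lambda t: 12*t**2, 1.0, 6)

print(good[-1])  # essentially 0 after 6 steps: superlinear convergence
print(bad[-1])   # about (2/3)**6 ~ 0.088: only linear convergence
```

The second case violates the nonsingular-Hessian condition above, and the iteration still converges, but it loses the fast local convergence that makes Newton-type methods (and the scoring algorithm) attractive.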