Theoretical proof of convergence of sequential weight update procedure (Neural Networks and Machine Learning)


My question is at the bottom. (Most of the descriptive wording comes from Christopher Bishop's Neural Networks for Pattern Recognition.)

Let $w$ be the weight vector of the neural network and $E$ the error function.

According to the Robbins-Monro algorithm, the sequence $$w_{kj}^{(r+1)}=w_{kj}^{(r)}-\eta\left.\frac{\partial E}{\partial w_{kj}}\right|_{w^{(r)}}$$ will converge to a limit at which $$\frac{\partial E}{\partial w_{kj}}=0.$$
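To make the batch update concrete, here is a minimal sketch (my own toy example, not from Bishop): a hypothetical sum-of-squares error for a linear model, with the weight vector updated using the full gradient $\partial E/\partial w$ at each step.

```python
import numpy as np

# Toy setup (assumption, for illustration only): 100 patterns x_n with
# targets t_n generated from a known weight vector, and the error
# E(w) = sum_n 0.5 * (x_n . w - t_n)^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 patterns, 3 weights
t = X @ np.array([1.0, -2.0, 0.5])     # targets from a known weight vector

def grad_E(w):
    # Full gradient: dE/dw = sum_n (x_n . w - t_n) x_n over the whole training set.
    return X.T @ (X @ w - t)

w = np.zeros(3)
eta = 0.001
for r in range(1000):
    w = w - eta * grad_E(w)            # w^(r+1) = w^(r) - eta * dE/dw |_{w^(r)}

print(w)                               # approaches a point where dE/dw = 0
```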

In general the error function is given by a sum of terms, each of which is calculated using one of the patterns from the training set, so that $$E=\sum_nE^n(w).$$ In applications we update the weight vector using one pattern at a time: $$w_{kj}^{(r+1)}=w_{kj}^{(r)}-\eta\frac{\partial E^n}{\partial w_{kj}}.$$
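For contrast, here is a sketch of the sequential (per-pattern) update on the same toy setup as above (again my own illustrative assumptions): each step uses only the gradient of a single pattern's error $E^n$, with a decreasing step size of the kind the Robbins-Monro conditions require.

```python
import numpy as np

# Same toy data as the batch sketch above (assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
t = X @ np.array([1.0, -2.0, 0.5])

def grad_En(w, n):
    # Gradient of the single-pattern error E^n(w) = 0.5 * (x_n . w - t_n)^2.
    return (X[n] @ w - t[n]) * X[n]

w = np.zeros(3)
for r in range(5000):
    n = r % len(X)                     # present one pattern at a time
    eta = 1.0 / (10 + r)               # decreasing learning rate (illustrative choice)
    w = w - eta * grad_En(w, n)        # w^(r+1) = w^(r) - eta * dE^n/dw

print(w)                               # fluctuates step to step, but drifts toward dE/dw = 0
```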

My question is: why does the algorithm converge when the last formula is used? Once we use it to update $w$, the value of $w$ has changed, so I can't prove convergence simply by using $$\frac{\partial E}{\partial w_{kj}}=\sum_n \frac{\partial E^n}{\partial w_{kj}}.$$