I've noticed that some papers, e.g. in theoretical computer science and numerical mathematics, provide pseudo-algorithms for their proposed methods. Often these pseudo-algorithms have instructions like:
While not converged do:
....
end
or
Repeat:
....
Until convergence
This makes me wonder how convergence is defined in mathematical terms and how one tests whether something has converged, i.e. what are the criteria for convergence?
One common criterion I'm familiar with is that the Euclidean norm of two succeeding intermediate solutions is smaller than some $\epsilon$: $\left\lVert \theta_{t-1} - \theta_t \right\rVert_2 < \epsilon$. But are there other measures? What are the trade-offs between different measures? What if the solution we would like to obtain doesn't live in a vector space in the Euclidean sense? Think of a scenario where we would like to estimate probability densities: could we use the Kullback-Leibler divergence $D_{KL}$ of two succeeding iterations in this case?
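For concreteness, here's a minimal sketch of the kind of loop I have in mind, using the successive-iterate criterion above (the update function, tolerance, and iteration cap are illustrative, not from any particular paper):

```python
import numpy as np

def iterate_until_converged(step, theta0, eps=1e-8, max_iter=10_000):
    """Apply the update `step` until successive iterates differ by less than eps
    in Euclidean norm, or until max_iter updates have been spent."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        theta_next = step(theta)
        if np.linalg.norm(theta_next - theta) < eps:  # ||theta_{t-1} - theta_t||_2 < eps
            return theta_next
        theta = theta_next
    return theta  # fall back to the last iterate if we never got "close enough"

# Example: fixed-point iteration for x = cos(x)
result = iterate_until_converged(np.cos, [1.0])
```

Note that this only measures how much the iterates are still moving, not how far they are from the true limit, which is part of what I'm asking about.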
In mathematical terms, to say that a sequence has converged means that it has reached its limit, which in most cases simply never happens in finitely many steps. What we mean when we say that a method has converged is that, for our purposes, the sequence has come close enough to the limit. But what "close enough" means depends strongly on the situation and is quite subjective.
Some examples that you haven't mentioned yet:
If you are minimizing something, a stopping criterion could be that the norm of the gradient is smaller than some $\varepsilon$.
In the case of solving an equation, you could stop once the norm of the difference between both sides (i.e. the residual norm) is smaller than some $\varepsilon$.
When you run out of patience, i.e. after some fixed maximum number of iterations. This one is very bad but also very popular.
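A minimal sketch of the first two criteria on a toy least-squares problem (the matrix, tolerances, and step size are all illustrative assumptions): gradient descent on $f(x) = \frac{1}{2}\lVert Ax - b \rVert^2$, stopping when either the gradient norm or the residual norm drops below a threshold, with an iteration cap as the "out of patience" fallback.

```python
import numpy as np

# Toy problem: minimize f(x) = 0.5 * ||A x - b||^2 with a consistent system,
# so the exact minimizer is x = (1, 1, 1) and the minimal residual is zero.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
b = A @ np.ones(3)
x = np.zeros(3)
step = 1.0 / np.linalg.norm(A.T @ A, 2)   # step <= 1/L guarantees descent

for k in range(100_000):                  # "patience" cap
    residual = A @ x - b
    grad = A.T @ residual
    if np.linalg.norm(grad) < 1e-10:      # gradient-norm criterion
        break
    if np.linalg.norm(residual) < 1e-10:  # residual-norm criterion
        break
    x -= step * grad
```

Note that the two criteria are not equivalent: a small gradient says we are near a stationary point of $f$, while a small residual says the equation $Ax = b$ itself is nearly satisfied, which is only achievable when the system is consistent.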
In practice, finding a good stopping criterion can be very hard. In pseudo code it is easy to sweep this under the rug and pretend that you have a complete algorithm that nicely spits out what you want. In practice, this is something that you may have to tune, and sometimes it is simply dictated by how much time you have.
> What if the solution we would like to obtain doesn't live in a vector space in the Euclidean sense?
This happens quite often. Notice that in the examples above I spoke about norms. These do not have to be the Euclidean norm: you can replace it by other norms, metrics, or distance/similarity functions that suit your case. In the case of probability density functions, you could use the $L_1$ norm, for example. Using the Kullback-Leibler divergence between successive iterates does not look like a very natural idea to me, not least because it is asymmetric and not a metric.
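As a sketch of the $L_1$ idea: discretize both densities on a grid and approximate $\int |p - q|$ by a Riemann sum. Everything here (the grid, the two Gaussians standing in for successive density estimates, and the tolerance) is illustrative.

```python
import numpy as np

# Discretized L1 distance between two density estimates on a common grid.
x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p = normal_pdf(x, 0.0, 1.0)    # stand-in for the previous density estimate
q = normal_pdf(x, 0.05, 1.0)   # stand-in for the current, slightly shifted one

l1_distance = np.sum(np.abs(p - q)) * dx   # approximates the integral of |p - q|
converged = l1_distance < 0.1              # illustrative tolerance
```

The $L_1$ distance has the pleasant interpretation of being twice the total variation distance between the two distributions, so the tolerance has a direct probabilistic meaning.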