I have a general concern about convergence/asymptotic methods in analysis/topology that has been in the back of my mind for several years, since I first studied analysis. Please note that I am not trying to criticize anyone; I'm just trying to understand how asymptotic theory is applied. My concern is about using the limit of a sequence of objects as an approximation to the objects that are "far" in the sequence.
Let's consider a metric space situation with a sequence converging to a limit: $(a_{n})_{n=1}^{\infty}\rightarrow a$. Sometimes it seems like people think that this justifies us in saying (things like) that $a$ is a good approximation to $a_{10,000}$. But of course this does not quite follow from the definition of convergence. Indeed, $|a_{10,000}-a|$ can be arbitrarily large without violating the assumption of convergence. For the application in question (whatever it may be) it is conceivable that no remotely useful bounds can be placed on $a_{n}$ until (say) $n\geq10^{100}$.
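To make this concrete, here is a toy sequence of my own (not from any particular text): $a_{n}=10^{100}/n$ converges to $0$, yet $a_{10,000}=10^{96}$.

```python
# Toy illustration: a_n = 1e100 / n converges to 0 (for any eps > 0 we
# have |a_n| < eps once n > 1e100 / eps), yet "large" indices like
# n = 10,000 are still astronomically far from the limit.

def a(n):
    return 1e100 / n

print(a(10_000))  # about 1e96 -- "n is large" alone guarantees nothing
```

So "large $n$" by itself carries no information; only the $\varepsilon$–$N$ relationship does.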
It seems to me that in order for us to justifiably replace a quantity by its "approximation" in a given situation, we must determine the radius of error that is tolerable for the situation and then determine the value of $n$ (or the value of $\delta$ or whatever it may be) that guarantees that one will be within the given radius. Oftentimes (always? see below) the requisite value of $n$ (or equivalent) can be found simply by looking back into the proof of convergence. The convergence proof will have shown, for every tolerance level, the existence of an $n$ beyond which $|a_{n}-a|$ is below that tolerance. But I feel that sometimes in practice people don't go through this justification process but rather just settle for saying that: the $n$ is large so the approximation must be good.
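As a sketch of this justification process (with a made-up example of my own, and a hypothetical helper name): for $a_{n}=r^{n}$ with $0<r<1$, the standard convergence proof hands you an explicit threshold $N$ for any tolerance, which can be checked directly:

```python
import math

# For a_n = r**n with 0 < r < 1, the proof that a_n -> 0 yields an
# explicit threshold: |a_n| < eps once n > log(eps) / log(r).
# N_for_tolerance is a hypothetical helper, just for illustration.

def N_for_tolerance(r, eps):
    # +1 keeps us strictly past the threshold even when the division is exact
    return math.ceil(math.log(eps) / math.log(r)) + 1

r, eps = 0.5, 1e-6
N = N_for_tolerance(r, eps)
print(N, r**N)  # the certified index, and a value guaranteed below eps
```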
Here are a few examples where I have felt that authors/professors replace quantities by "approximations" without seeming to justify this practice in the way I have described above. It seems to be common in probability books (even rigorous ones like Durrett), when discussing something like the central limit theorem (or the law of large numbers), to say things like: if you spin the roulette wheel a thousand times then (by the central limit theorem) we can approximate the distribution by the normal distribution, and so the probability that it lands on red fewer than five times is approximately blank. As another example, people sometimes say that we can replace $\ln(1+x)$ with $x$ for $x$ near zero. I understand that the error is $o(|x|)$, but the same issue applies -- the quantifier is still "there exists a $\delta$". In order to justify the approximation, don't we need to know how big that $\delta$ is for the desired error tolerance? Another similar example: when I took a class on macroeconomics a while back, they frequently replaced nonlinear difference equations with their linear approximations, using as justification the properties of the linear approximation, without saying how much error we can tolerate for the application and without saying anything about the size of the neighborhood within which the error tolerance is met. Are these sorts of things problematic? (Maybe in these cases people have used numerical/empirical methods (e.g. Monte Carlo simulation) to get a sense of how big $n$ needs to be for a given error tolerance?)
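For the $\ln(1+x)\approx x$ example, the $\delta$ can in fact be made explicit: the Lagrange remainder gives $|\ln(1+x)-x|\leq x^{2}/(2(1-|x|)^{2})$ for $|x|<1$. A quick numerical sanity check of this bound (my own sketch, not taken from any of the books in question):

```python
import math

# Certified bound from the Lagrange remainder:
# |ln(1+x) - x| <= x**2 / (2 * (1 - |x|)**2)  for |x| < 1.
# So for an error tolerance of 1e-4, delta = 0.01 suffices.

delta, tol = 0.01, 1e-4
worst_case = delta**2 / (2 * (1 - delta)**2)
print(worst_case < tol)  # True: the guaranteed bound meets the tolerance

# Spot-check the bound against the actual error at x = -0.01:
x = -0.01
actual = abs(math.log(1 + x) - x)
print(actual <= worst_case)  # True
```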
Here's a closely related issue (which I hinted at above). In theory, one can find out from the proof of convergence how big $n$ (or equivalent) must be for a given tolerance. However, sometimes proofs of convergence of one thing rely on convergence of another thing and so on, sometimes for several steps. In such cases, it seems that the requisite values of $n$ get sort of lost in translation. In principle, they could be recovered by going back through the sequence of different proofs and piecing together the quantifiers and bounds, but this might be kind of difficult or at least tedious. (Perhaps there are also cases where convergence is established by proof by contradiction or something and so bounds cannot be "read off the proof". I think the term "nonconstructive proof" might be relevant here.)
Perhaps an example where the requisite values of $n$ "get lost in translation" is the proof of the central limit theorem that uses characteristic functions. That proof shows convergence of the characteristic functions of the distributions in question to the characteristic function of the normal distribution and then invokes the fact (called the continuity theorem) that if the characteristic functions converge, then so do the distribution functions. I'm guessing it would be difficult to recover error bounds from this proof. How is this problem of bounds "getting lost in translation" addressed?
(Perhaps the central limit theorem is just for use as a general guiding principle whereas in specific applications one would want to directly prove the convergence of the sequence of distributions in question to the normal distribution? In other words, perhaps one could attain information about the speed of convergence more easily (not to mention sharper bounds) by looking at direct proofs of the convergence of the given sequence of distributions to the normal distribution rather than by looking at the proof of the central limit theorem? I'm not sure this would totally solve the issue though--the direct proof that I've seen of the fact that the binomial distribution converges to the normal distribution relies on Stirling's formula which is another convergence result so that might make it more difficult to read off the bounds from the proof.) (Edit: The Berry-Esseen Theorem does give a bound on accuracy of the normal approximation in finite sample.)
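The Berry-Esseen bound can indeed be evaluated explicitly in finite samples. As a sketch (using Shevtsova's constant $C\leq0.4748$, and a fair coin rather than an actual roulette wheel, purely for illustration):

```python
import math

# Berry-Esseen: sup_x |F_n(x) - Phi(x)| <= C * rho / (sigma**3 * sqrt(n)),
# where sigma**2 and rho are the variance and third absolute central
# moment of one summand, and C <= 0.4748 (Shevtsova's constant).
# For n = 1000 Bernoulli(1/2) trials:

C = 0.4748
p, n = 0.5, 1000
sigma = math.sqrt(p * (1 - p))              # per-trial standard deviation
rho = p * (1 - p) * (p**2 + (1 - p)**2)     # E|X - p|**3
bound = C * rho / (sigma**3 * math.sqrt(n))
print(bound)  # about 0.015: the CDF error is at most ~1.5%, uniformly in x
```

Note that such uniform-in-$x$ bounds can still be loose in the far tails, where relative error matters more than absolute error.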
Thanks!
A couple of points about your question.
Often, in the preamble to an actual proof you will find the general strategy of the proof. Without having the books you mention at hand, some of those passages sound like they might be just that.
You are absolutely right that for any fixed $n$, the distance $d(a_n, a)$ can be arbitrarily large; however, that is usually not the strategy. Usually you fix $\varepsilon$ first and then know of the existence of some $N$ beyond which $d(a_n, a) < \varepsilon$.
"Replacing" something by its limit can generally speaking only be done with a remainder term. This can either be an $\epsilon$ or using $o(x)$-notation or something else. It is of vital importance to know exactly what you can and can not do with those "correction terms".
You should distinguish between cases where you are actually passing to the limit at some point and those where you are also interested in what happens near the limiting point. In the first case, actual bounds on the quality of the approximation are generally of less importance. In the latter case, it is actually common to strive for a better understanding of the approximation error. An example would be the various remainder terms for the Taylor approximation.
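For instance, the Lagrange form of the remainder makes the error of the degree-$n$ Taylor approximation fully explicit:

$$f(x)=\sum_{k=0}^{n}\frac{f^{(k)}(a)}{k!}(x-a)^{k}+\frac{f^{(n+1)}(\xi)}{(n+1)!}(x-a)^{n+1}$$

for some $\xi$ between $a$ and $x$, so a uniform bound on $f^{(n+1)}$ near $a$ turns the approximation into a certified error estimate.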