When a nonlinear differential equation cannot be solved exactly, a common approximation is to write the unknown function as its equilibrium value plus a small deviation from that value, substitute this back into the equation, and neglect the higher-order terms. Many physics textbooks use this method. What is the underlying theory for it?
Suppose I have a non-linear differential equation in an unknown variable Y. If the equation cannot be solved exactly, I have seen people write Y = Y0 + Y1 and substitute it back into the equation. They then neglect terms like Y0*Y0, Y1*Y1, Y1*grad(Y1), etc., as they consider them to be HIGHER ORDER. I understand that these are small quantities and hence can be neglected. But to me this whole scheme appears to come from nowhere. Is there any rigorous mathematical theory behind it?
This is only the first step in studying a non-linear equation: finding its stationary points and studying their stability. Formally, it is a Taylor expansion of the equation about a stationary point, truncated at first order in the deviation. Using small deviations is therefore justified, as we are interested here in the behavior near these stationary points.
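To make this concrete, here is a minimal sketch for a pendulum, theta'' = -sin(theta), written as a first-order system. The pendulum is my own illustration, not something from the question, and the helper names `f` and `jacobian` are hypothetical. Writing Y = Y0 + Y1 and expanding gives F(Y0 + Y1) ≈ F(Y0) + J(Y0) Y1, where F(Y0) = 0 at a stationary point, so to first order Y1' = J(Y0) Y1 and the eigenvalues of the Jacobian J decide the local behavior.

```python
import numpy as np

def f(state):
    """Pendulum as a first-order system: theta' = omega, omega' = -sin(theta)."""
    theta, omega = state
    return np.array([omega, -np.sin(theta)])

def jacobian(func, y0, h=1e-6):
    """Central-difference approximation of the Jacobian of func at y0."""
    y0 = np.asarray(y0, dtype=float)
    n = y0.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        # Column j holds the derivative of func with respect to y0[j].
        J[:, j] = (func(y0 + step) - func(y0 - step)) / (2 * h)
    return J

# Stationary points of the pendulum: hanging down (theta = 0)
# and balanced upright (theta = pi), both with zero velocity.
stationary_points = {"bottom": (0.0, 0.0), "top": (np.pi, 0.0)}

for name, y0 in stationary_points.items():
    assert np.allclose(f(y0), 0.0, atol=1e-12)  # check it really is stationary
    eigvals = np.linalg.eigvals(jacobian(f, y0))
    print(name, "eigenvalues of the linearized system:", eigvals)
```

At the upright point the eigenvalues come out close to +1 and -1, so the linearized system has a growing mode and the point is unstable. At the bottom they are close to +i and -i, i.e. small oscillations. That second case is also where some care is needed: when the real parts vanish, the first-order terms alone do not settle stability of the full non-linear equation, and the neglected higher-order terms can matter.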
Since I was asked to suggest books, I will give a few recommendations here. As I mentioned in the comments, I read them in Russian, so I cannot guarantee the availability or the quality of their English translations. I am also fairly sure that excellent, more up-to-date books are available.