I have read in many physics books on Green's functions that the Fourier transform of $$\exp(-i\epsilon t)\Theta(t)$$ is $$\frac{i}{\omega-\epsilon+i\delta},$$ where $\Theta$ is the step function and $\delta$ is a positive infinitesimal, so that $i\delta$ is an infinitesimal imaginary shift. I have always imagined that, since the Fourier integral does not converge in the usual sense, the $i\delta$ is added to $\omega$ to ensure convergence. I don't view it as a Fourier transform in the strict sense of the definition.
However, this Fourier transform is appearing more and more frequently in the literature I am currently reading. Much of it applies the convolution theorem and other identities that are only proved for the ordinary Fourier transform (with real $\omega$ and a convergent integral), and this makes me quite nervous.
Is the literature I am reading abusing the Fourier transform, or is there a broader sense in which the Fourier transform converges? I guess this has something to do with generalized functions, but I am not familiar with them.
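To make the convergence picture concrete, here is a minimal numerical sketch for a finite $\delta > 0$, assuming the convention $\hat f(\omega) = \int f(t)\,e^{i\omega t}\,dt$ (the one that reproduces the sign in the formula above). The function names, the cutoff `t_max`, and the sample values are hypothetical choices for illustration only; the check says nothing about the limit $\delta \to 0$ itself.

```python
import cmath


def truncated_transform(omega, eps, delta, t_max=80.0, steps=400_000):
    """Trapezoid-rule estimate of the integral of exp(i*(omega - eps + i*delta)*t)
    over 0 <= t <= t_max. For delta > 0 the integrand decays like exp(-delta*t),
    so cutting off at a large t_max changes the result only negligibly."""
    h = t_max / steps

    def integrand(t):
        return cmath.exp(1j * (omega - eps + 1j * delta) * t)

    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h


def closed_form(omega, eps, delta):
    """The textbook expression i / (omega - eps + i*delta)."""
    return 1j / (omega - eps + 1j * delta)


# Sample values chosen arbitrarily for the check.
numeric = truncated_transform(2.0, 1.0, 0.5)
exact = closed_form(2.0, 1.0, 0.5)
print(abs(numeric - exact))  # small: the two agree up to discretization error
```

For any fixed $\delta > 0$ the integral is an ordinary, absolutely convergent one; the delicate part is only what "letting $\delta \to 0$" means.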
Perhaps your physics books are abusing the Fourier transform, because they're physics books, but this stuff can be made rigorous. You are correct to suspect generalized functions. I only know the material at an introductory level myself, but I can give you a very brief and sketchy overview, which will hopefully show you that it's approachable: take some class of well-behaved (for example, smooth and compactly supported) functions, called test functions, give them the structure of a topological vector space, and consider the continuous linear functionals on that space. These continuous linear functionals are called distributions or generalized functions.
The reason these generalize functions is that you can take a test function $f$ and obtain a linear functional $\phi \mapsto \int f(x)\phi(x)\,dx$. Test functions therefore sit inside the space of distributions. Distributions turn out to have many of the nice properties that test functions have; for example, they can be differentiated.
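As a sketch of how this works in practice (the bracket notation $\langle T, \phi\rangle$ and the symbol $\delta_0$ for the Dirac delta are my own choices here, the latter to avoid a clash with the infinitesimal $\delta$ in the question): the derivative of a distribution is defined by moving the derivative onto the test function, which is exactly what integration by parts gives for an honest differentiable function.

```latex
\langle T_f, \phi \rangle := \int f(x)\,\phi(x)\,dx,
\qquad
\langle T', \phi \rangle := -\langle T, \phi' \rangle .

% Example: the step function \Theta is not differentiable at 0 as a function,
% but as a distribution its derivative is the Dirac delta:
\langle \Theta', \phi \rangle
  = -\int_0^\infty \phi'(x)\,dx
  = \phi(0)
  = \langle \delta_0, \phi \rangle .
```

The pairing $\langle T_f, \phi\rangle$ makes sense for any locally integrable $f$, not just for test functions, which is why the step function $\Theta$ counts as a distribution too.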
If you take 'test functions' to include more functions, the corresponding space of distributions gets smaller, because continuity becomes harder to obtain. There is a choice of test functions called the Schwartz space for which the space of distributions (the tempered distributions) is particularly nice: not only is there a well-defined Fourier transform on it (it consists of precomposing with the regular Fourier transform), but that transform is actually an automorphism. Because the distributional Fourier transform is, modulo rigor, the regular Fourier transform, results about Fourier transforms generalize nicely to the distributional setting.
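Here is a sketch of how the formula from the question fits into this picture, assuming the convention $\hat f(\omega) = \int f(t)\,e^{i\omega t}\,dt$, which is the one that matches the sign in the question. The notation $f_\delta$, $\langle\cdot,\cdot\rangle$, $\delta_0$ (Dirac delta) and $\mathcal{P}$ (principal value) is mine, not the books'.

```latex
% Fourier transform of a tempered distribution T, by duality:
\langle \hat T, \phi \rangle := \langle T, \hat\phi \rangle
\quad \text{for every Schwartz function } \phi .

% Regularize: for \delta > 0 the function below is integrable and
f_\delta(t) := e^{-i\epsilon t}\, e^{-\delta t}\, \Theta(t),
\qquad
\hat f_\delta(\omega)
  = \int_0^\infty e^{i(\omega - \epsilon + i\delta)t}\,dt
  = \frac{i}{\omega - \epsilon + i\delta} .

% As \delta \to 0^+, f_\delta \to e^{-i\epsilon t}\Theta(t) as tempered
% distributions, and the Fourier transform is continuous there, so
\widehat{e^{-i\epsilon t}\Theta(t)}
  = \lim_{\delta \to 0^+} \frac{i}{\omega - \epsilon + i\delta}
  = i\,\mathcal{P}\frac{1}{\omega - \epsilon} + \pi\,\delta_0(\omega - \epsilon) .
```

So the expression with $i\delta$ can be read as shorthand for a limit taken in the sense of distributions; the last equality is the Sokhotski–Plemelj formula. Identities such as the convolution theorem do carry over to this setting, but only under hypotheses that guarantee the products and convolutions involved are defined, so they still need checking case by case.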