Let's say we want to count solutions to $$N=n_1^2+\dots+n_4^2$$ using the circle method, so we write the number of solutions as an integral $$\int _0^1S(\alpha )^4e(-\alpha N)d\alpha \hspace {15mm}S(\alpha )=\sum _{n}e(n^2\alpha ).$$ If we do this using the "classical form" of the circle method, then we have a major arc approximation which ultimately relies on approximating a sum by an integral. Specifically, for $\alpha =a/q+\beta$, $$S(\alpha )=\sum _{r=1}^qe(ar^2/q)\sum _{\substack{n\\ n\equiv r\ (q)}}e(n^2\beta )\approx \frac {1}{q}\sum _{r=1}^qe(ar^2/q)\int e(t^2\beta )dt$$ (maybe here I should write the summation and integral upper limits as $\sqrt N$).
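As a sanity check on the counting integral, here is a minimal Python sketch. It assumes that $n$ ranges over all integers with $|n|\le\sqrt N$ (the truncation and the sign convention are assumptions, not something fixed above), and it replaces the integral by an average over $M=4N+1$ equally spaced points, which is exact here because $S(\alpha)^4e(-\alpha N)$ is then a trigonometric polynomial with frequencies in $[-N,3N]$. The function names are hypothetical.

```python
import cmath
from itertools import product
from math import isqrt

def e(x):
    """e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * cmath.pi * x)

def count_by_circle(N):
    """Evaluate int_0^1 S(alpha)^4 e(-alpha N) d(alpha) by exact sampling."""
    R = isqrt(N)          # assumed truncation: |n| <= sqrt(N)
    M = 4 * N + 1         # more sample points than the largest frequency 3N
    total = 0
    for k in range(M):
        alpha = k / M
        S = sum(e(n * n * alpha) for n in range(-R, R + 1))
        total += S**4 * e(-alpha * N)
    return round((total / M).real)

def count_brute(N):
    """Count integer 4-tuples with n1^2 + ... + n4^2 = N directly."""
    R = isqrt(N)
    return sum(1 for ns in product(range(-R, R + 1), repeat=4)
               if sum(n * n for n in ns) == N)
```

For small $N$ the two functions should agree, which is just orthogonality of the exponentials; none of the major arc analysis is involved yet.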
I'm now working through Section 20.4 of Iwaniec-Kowalski, where they discuss the Kloosterman refinement. I understand that the minor arc bounds can't be strong enough in the classical set-up, so that Kloosterman's refinement involves controlling the length of the arcs (so that we have no minor arcs at all).
In that section they don't approximate the sum by an integral, but rather use Jacobi's Inversion Formula to approximate the exponential sum (specifically Lemmas 20.10 and 20.11). If the question makes sense: how "important" is this? I thought initially it was just about controlling the precise lengths of the major arcs, which involves $a$ dependence, but is this major arc approximation also an important step? Would we fail if we tried to use the classical method of approximating the sum by an integral?
If we use the traditional exponential sum and major/minor arc circle method of Hardy and Littlewood, then we will run into an issue of having an error term growing faster than the main term. This is exactly what Kloosterman was addressing in his original 1926 paper.
It should also be noted that the original circle method of Hardy and Littlewood in the 1920s was quite different from the version of Vinogradov in the 1930s, so to explain fully what Kloosterman was actually considering, we need to go over essentially everything about the historical development of the circle method.
In the Hardy-Littlewood setup, the interval $[0,1)$ is dissected using Farey fractions of order $N$. In particular, if
$$ {a'\over q'}<{a\over q}<{a''\over q''} $$
are consecutive Farey fractions of order $N$, then $N-q<q',q''\le N$ and the Farey interval $I_{a/q}$ for $\frac aq$ is defined by
$$ I_{a/q}=\left[{\frac aq}-{1\over q(q+q')},\frac aq+{1\over q(q+q'')}\right]. $$
Therefore, for any 1-periodic integrable function $f$,
\begin{aligned} I=\int_0^1f(\alpha)\mathrm d\alpha &=\sum_{\substack{0\le a<q\le N\\(a,q)=1}}\int_{I_{a/q}}f(\alpha)\mathrm d\alpha=\sum_{\substack{0\le a<q\le N\\(a,q)=1}}\int_{-{1\over q(q+q')}}^{1\over q{(q+q'')}}f\left(\frac aq+\beta\right)\mathrm d\beta. \end{aligned}
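The Farey dissection above can be checked numerically. The following is a small sketch with hypothetical helper names (`farey`, `farey_arcs`). It treats the fractions cyclically modulo 1, so that the left neighbour of $0/1$ is taken to be the last fraction shifted by $-1$; that convention is an assumption made here for the illustration.

```python
from fractions import Fraction
from math import gcd

def farey(N):
    """Farey fractions a/q in [0, 1) of order N, in increasing order."""
    return sorted(Fraction(a, q) for q in range(1, N + 1)
                  for a in range(q) if gcd(a, q) == 1)

def farey_arcs(N):
    """Map each a/q to (q', q'', left length, right length) of its arc."""
    fr = farey(N)
    arcs = {}
    for i, x in enumerate(fr):
        q = x.denominator
        qp = fr[i - 1].denominator                 # left neighbour (cyclic)
        qpp = fr[(i + 1) % len(fr)].denominator    # right neighbour (cyclic)
        arcs[x] = (qp, qpp,
                   Fraction(1, q * (q + qp)),      # 1 / (q (q + q'))
                   Fraction(1, q * (q + qpp)))     # 1 / (q (q + q''))
    return arcs
```

With exact rational arithmetic, the arc lengths should add up to 1 and every neighbouring denominator should satisfy $N-q<q',q''\le N$, which is what makes the sum over $a/q$ a genuine decomposition of $\int_0^1$.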
In applications one typically has a decomposition of the form
$$ f\left(\frac aq+\beta\right)=\omega_{a/q}[g_q(\beta)+r_{a/q}(\beta)],\tag1 $$
where $\omega_{a/q}$ is some root of unity, $g_q$ is a main term depending only on $q$, and $r_{a/q}$ is a remainder, so one expects $I$ to be dominated by
$$ J=\sum_{\substack{0\le a<q\le N\\(a,q)=1}}\omega_{a/q}\int_{-{1\over q(q+q')}}^{1\over q{(q+q'')}}g_q(\beta)\mathrm d\beta. $$
To continue estimating, one often deforms the path of integration into a path $H_q$ that does not depend on $a$:
$$ J^*=\sum_{\substack{0\le a<q\le N\\(a,q)=1}}\omega_{a/q}\int_{H_q}g_q(\beta)\mathrm d\beta=\sum_{1\le q\le N}A_q B_q, $$
where
$$ A_q=\sum_{\substack{0\le a<q\\(a,q)=1}}\omega_{a/q} $$
is some variant of a Kloosterman sum and $B_q$ denotes the remaining integral over $H_q$. If we estimate the difference $J^*-J$ directly, then we have to apply the triangle inequality and put absolute value bars around each $\omega_{a/q}$, which throws away all cancellation in the sum over $a$ and results in a poor error bound.
To prevent this from happening, Hardy and Littlewood used (1) only when $q\le N_1$ for some $N_1<N$ and bounded $f(\alpha)$ directly when $q>N_1$. The Farey segments with $q\le N_1$ form the major arcs, while those with $N_1<q\le N$ form the minor arcs. This idea allowed them to obtain an asymptotic formula for Waring's problem.
Kloosterman, in his 1924 dissertation, applied the circle method to the number $r_s(n)$ of solutions $(x_1,x_2,\dots,x_s)$ to the Diophantine equation
$$ n=a_1x_1^2+a_2x_2^2+\dots+a_sx_s^2.\tag2 $$
He managed to obtain an asymptotic formula for $r_s(n)$ but realized that the error bound grows faster than the main term when $s\le4$. As a result, he wrote another paper in 1926.
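Instead of treating Farey segments separately as major arcs and minor arcs, Kloosterman introduced a preliminary transformation that splits each integral into a central piece whose endpoints depend only on $q$ and $N$, plus two tails that carry all the dependence on $a$: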
$$ \int_{-{1\over q(q+q')}}^{1\over q{(q+q'')}}=\int_{-{1\over q(q+N)}}^{1\over q{(q+N)}}+\int_{-{1\over q(q+q')}}^{-{1\over q(q+N)}}+\int_{1\over q(q+N)}^{1\over q(q+q'')}, $$
so we have
$$ J^*-J=R_1-R_2-R_3, $$
in which
$$ R_1=\sum_{1\le q\le N}A_q\left(\int_{H_q}-\int_{-{1\over q(q+N)}}^{1\over q(q+N)}\right)g_q(\beta)\mathrm d\beta, $$
$$ R_2=\sum_{\substack{0\le a<q\le N\\(a,q)=1}}\omega_{a/q}\int_{-{1\over q(q+q')}}^{-{1\over q(q+N)}}g_q(\beta)\mathrm d\beta, $$
$$ R_3=\sum_{\substack{0\le a<q\le N\\(a,q)=1}}\omega_{a/q}\int_{1\over q(q+N)}^{1\over q(q+q'')}g_q(\beta)\mathrm d\beta. $$
Trivially, $|A_q|\le q$, but by studying $A_q$ in detail, Kloosterman successfully obtained some improvement on the exponent of $q$, so he was able to obtain some good bounds for $R_1$.
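To get a feel for how much cancellation such sums have, here is a short sketch that evaluates the classical Kloosterman sum $S(m,n;q)=\sum_{(a,q)=1}e((ma+n\bar a)/q)$. Taking $\omega_{a/q}$ to be of this exact shape is an assumption made purely for illustration (the text above only says "some variant"), and the comparison with $2\sqrt p$ uses Weil's later bound for prime moduli, not Kloosterman's own estimate. The function name is hypothetical, and the sketch is intended for $q\ge2$ on Python 3.8 or newer.

```python
import cmath
from math import gcd, sqrt

def kloosterman(m, n, q):
    """Classical Kloosterman sum S(m, n; q) for q >= 2 (illustrative choice)."""
    total = 0
    for a in range(1, q):
        if gcd(a, q) == 1:
            a_inv = pow(a, -1, q)            # inverse of a modulo q
            phase = (m * a + n * a_inv) % q  # reduce before dividing by q
            total += cmath.exp(2j * cmath.pi * phase / q)
    return total

if __name__ == "__main__":
    for p in (7, 11, 101, 257):
        # compare |S(1,1;p)| with the trivial bound p - 1 and with 2*sqrt(p)
        print(p, abs(kloosterman(1, 1, p)), p - 1, 2 * sqrt(p))
```

For prime $p$, the printed absolute values should stay below $2\sqrt p$, far under the trivial count of terms; this is the kind of saving in the exponent of $q$ that makes $R_1$ manageable.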
Since the estimation procedures for $R_2$ and $R_3$ are similar, we only present the technical details for $R_3$:
\begin{aligned} R_3 &=\sum_{\substack{0\le a<q\le N\\(a,q)=1}}\omega_{a/q}\sum_{q+q''\le l<q+N}\int_{1\over q(l+1)}^{1\over ql}g_q(\beta)\mathrm d\beta \\ &=\sum_{1\le q\le N}\sum_{N<l<q+N}A_{q,l}\int_{1\over q(l+1)}^{1\over ql}g_q(\beta)\mathrm d\beta, \end{aligned}
in which
$$ A_{q,l}=\sum_{\substack{0<a\le q\\(a,q)=1\\q+q''\le l}}\omega_{a/q},\quad(N-q<q''\le N,aq''\equiv-1\pmod q) $$
is an incomplete Kloosterman sum that also admits bounds better than the trivial $|A_{q,l}|\le q$. In fact, the error $J-I$ can also be handled using a similar decomposition.
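The bookkeeping behind the decomposition of $R_3$ can be verified with exact arithmetic. The sketch below (hypothetical function names, Farey fractions again taken cyclically modulo 1 as an assumption) checks three facts for every $a/q$ of order $N$: that $N-q<q''\le N$, that $aq''\equiv-1\pmod q$, and that the pieces $\left[\frac1{q(l+1)},\frac1{ql}\right]$ for $q+q''\le l<q+N$ have total length $\frac1{q(q+q'')}-\frac1{q(q+N)}$.

```python
from fractions import Fraction
from math import gcd

def farey(N):
    """Farey fractions a/q in [0, 1) of order N, in increasing order."""
    return sorted(Fraction(a, q) for q in range(1, N + 1)
                  for a in range(q) if gcd(a, q) == 1)

def check_right_tails(N):
    """Return True if the three facts used for R_3 hold for every a/q."""
    fr = farey(N)
    for i, x in enumerate(fr):
        a, q = x.numerator, x.denominator
        qpp = fr[(i + 1) % len(fr)].denominator   # q'' of the right neighbour
        if not (N - q < qpp <= N and (a * qpp + 1) % q == 0):
            return False
        # lengths of the pieces [1/(q(l+1)), 1/(ql)] telescope
        pieces = sum(Fraction(1, q * l) - Fraction(1, q * (l + 1))
                     for l in range(q + qpp, q + N))
        if pieces != Fraction(1, q * (q + qpp)) - Fraction(1, q * (q + N)):
            return False
    return True
```

The congruence is what turns the condition $q+q''\le l$ into a restriction on $\bar a$ modulo $q$, which is why $A_{q,l}$ is an incomplete sum rather than an arbitrary subsum.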