Theorem (change of variable): Suppose $\phi$ is a strictly increasing continuous function that maps an interval $[A,B]$ onto $[a,b]$. Suppose $\alpha$ is monotonically increasing on $[a,b]$ and $f$ is integrable with respect to $\alpha$ on $[a,b]$. Define $\beta$ and $g$ on $[A,B]$ by $\beta(y)=\alpha(\phi(y))$, $g(y)=f(\phi(y))$. Then $g$ is integrable with respect to $\beta$ and $\int_A^B g \, d\beta = \int_a^b f \, d\alpha$.
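For a concrete instance (an illustrative choice, not part of the theorem): take $[A,B] = [a,b] = [0,1]$, $\phi(y) = y^2$, $\alpha(x) = x$ and $f(x) = x$. Then $\beta(y) = y^2$ and $g(y) = y^2$. Since $\beta$ has the continuous derivative $\beta'(y) = 2y$, the Stieltjes integral reduces to an ordinary Riemann integral (Theorem 6.17 in Rudin, if I have the numbering right): \begin{equation*} \int_0^1 g \, d\beta = \int_0^1 y^2 \cdot 2y \, dy = \frac{1}{2} = \int_0^1 x \, dx = \int_0^1 f \, d\alpha. \end{equation*} In this smooth case the theorem is just the familiar substitution $x = \phi(y)$.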
Proof: To each partition $P=\{x_0,\dots,x_n\}$ of $[a,b]$ corresponds a partition $Q=\{y_0,\dots,y_n\}$ of $[A,B]$ such that $x_i=\phi(y_i)$. All partitions of $[A,B]$ are obtained this way. Since the values taken by $f$ on $[x_{i-1},x_i]$ are exactly the same as those taken by $g$ on $[y_{i-1},y_i]$, we see that $U(Q,g,\beta)=U(P,f,\alpha)$ and $L(Q,g,\beta)=L(P,f,\alpha)$. Since $f$ is integrable with respect to $\alpha$, $P$ can be chosen so that both $U(P,f,\alpha)$ and $L(P,f,\alpha)$ are close to $\int f \, d\alpha$. Hence $g$ is integrable with respect to $\beta$, and the proof is complete.
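Here is a rough numerical check of the key step $U(Q,g,\beta)=U(P,f,\alpha)$, as a Python sketch. The choices $\phi(y)=y^3+y$ (mapping $[0,1]$ onto $[0,2]$), $\alpha(x)=x+\lfloor x\rfloor$ and $f(x)=\sin 3x$ are hypothetical examples, and the sup/inf on each subinterval are only estimated by sampling, so the two sums agree up to a small sampling error rather than exactly.

```python
import math

def phi(y):
    # hypothetical strictly increasing continuous map of [0, 1] onto [0, 2]
    return y ** 3 + y

def alpha(x):
    # hypothetical increasing integrator on [0, 2] (jumps at x = 1 and x = 2)
    return x + math.floor(x)

def f(x):
    return math.sin(3 * x)  # continuous, hence integrable w.r.t. alpha

def beta(y):
    return alpha(phi(y))

def g(y):
    return f(phi(y))

def upper_lower(func, integrator, partition, samples=200):
    """Approximate U and L by sampling func on each subinterval.

    Sampling only estimates sup/inf, so the returned sums are approximations.
    """
    upper = lower = 0.0
    for left, right in zip(partition, partition[1:]):
        values = [func(left + (right - left) * k / samples) for k in range(samples + 1)]
        d = integrator(right) - integrator(left)  # >= 0, integrator is increasing
        upper += max(values) * d
        lower += min(values) * d
    return upper, lower

n = 10
Q = [i / n for i in range(n + 1)]  # partition of [A, B] = [0, 1]
P = [phi(y) for y in Q]            # corresponding partition x_i = phi(y_i) of [a, b] = [0, 2]

U_g, L_g = upper_lower(g, beta, Q)
U_f, L_f = upper_lower(f, alpha, P)
print(U_g, U_f)
print(L_g, L_f)
```

Only $Q$ is chosen by hand; $P$ is its image under $\phi$, and since $\beta(y_i) = \alpha(x_i)$ the increments of $\beta$ and $\alpha$ match term by term.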
The proof of this theorem is confusing me. Can someone give me the big picture of this proof?
In particular, the statement "$P$ can be chosen so that both $U(P,f,\alpha)$ and $L(P,f,\alpha)$ are close to $\int f \, d\alpha$" is not clear to me.
We can choose partition $P$ such that the upper and lower Riemann-Stieltjes sums $U(P, f, \alpha)$ and $L(P, f, \alpha)$ are arbitrarily close to $\int f d \alpha$ because $f$ is assumed to be integrable. Recall the upper and lower integrals are defined as \begin{equation*} \overline{\int} f d \alpha = \inf_P U(P, f, \alpha), \qquad \underline{\int} f d\alpha = \sup_P L(P, f, \alpha) \end{equation*} where $f$ is integrable by definition if and only if the upper and lower integrals agree. By definition of the supremum and infimum, we can find $P_1$ and $P_2$ so that $U(P_1, f, \alpha)$ is close (say, $\epsilon$ close) to the upper integral and $L(P_2, f, \alpha)$ is close to the lower integral.
If I recall correctly, Rudin proves earlier in the chapter on integration that the Riemann-Stieltjes sums play nicely with refinements, i.e. \begin{equation*} L(P_i, f, \alpha) \leq L(P_1 \cup P_2, f, \alpha) \leq U(P_1 \cup P_2, f, \alpha) \leq U(P_i, f, \alpha) \end{equation*} for $i = 1, 2$. So the single partition $P = P_1 \cup P_2$ makes the upper and lower sums simultaneously good approximations of the upper and lower integrals. Since $f$ is integrable, both of those integrals equal $\int f \, d\alpha$, so we can always choose $P$ so that the upper and lower sums w.r.t. $P$ are close to the integral of $f$.
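The refinement inequality can be checked exactly with rational arithmetic. This is a minimal sketch under hypothetical choices: $f(x) = x^2$ and $\alpha(x) = x + [x \geq 1/2]$ (a unit jump at $1/2$) on $[0,1]$. Because this $f$ is increasing, its sup and inf on each subinterval sit at the endpoints, so the sums below are exact rather than sampled. For these choices $\int_0^1 f \, d\alpha = \frac{1}{3} + f(\tfrac{1}{2}) = \frac{7}{12}$, and a fine partition containing $1/2$ squeezes both sums around that value.

```python
from fractions import Fraction

def f(x):
    return x * x  # increasing on [0, 1]

def alpha(x):
    # hypothetical integrator: x plus a unit jump at x = 1/2, increasing on [0, 1]
    return x + (1 if x >= Fraction(1, 2) else 0)

def upper_lower(partition):
    """Exact U(P, f, alpha) and L(P, f, alpha) for the increasing f above.

    Since f is increasing, sup and inf on [x_{i-1}, x_i] are f(x_i) and f(x_{i-1}).
    """
    upper = lower = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        d_alpha = alpha(right) - alpha(left)
        upper += f(right) * d_alpha
        lower += f(left) * d_alpha
    return upper, lower

P1 = [Fraction(k, 4) for k in range(5)]
P2 = [Fraction(k, 3) for k in range(4)]
P = sorted(set(P1) | set(P2))  # common refinement P1 ∪ P2

U1, L1 = upper_lower(P1)
U2, L2 = upper_lower(P2)
U, L = upper_lower(P)

# refinement inequality: L(P_i) <= L(P1 ∪ P2) <= U(P1 ∪ P2) <= U(P_i)
for Ui, Li in [(U1, L1), (U2, L2)]:
    assert Li <= L <= U <= Ui

# a fine partition containing 1/2 squeezes the sums around 7/12
fine = [Fraction(k, 100) for k in range(101)]
U_fine, L_fine = upper_lower(fine)
print(float(L_fine), float(U_fine))
```

Neither $P_1$ nor $P_2$ alone need be the better partition for both sums; taking the union is what lets one partition serve both.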
From here it is pretty straightforward, though if you enjoy nauseating detail, here it is:
Fix $\epsilon > 0$. Choose a partition $Q_1$ of $[A, B]$ with $L(Q_1, g, \beta) \geq \underline{\int} g \, d\beta - \epsilon$, and a partition $P_2$ of $[a, b]$ with $L(P_2, f, \alpha) \geq \int f \, d\alpha - \epsilon$. Since $\phi$ is a strictly increasing bijection of $[A,B]$ onto $[a,b]$, the correspondence $Q \mapsto \phi(Q)$ between partitions of $[A,B]$ and partitions of $[a,b]$ is one-to-one and preserves refinement. Let $Q = Q_1 \cup \phi^{-1}(P_2)$ and $P = \phi(Q)$; then $Q$ refines $Q_1$, $P$ refines $P_2$, and $L(Q, g, \beta) = L(P, f, \alpha)$. Then \begin{equation*} \int f d\alpha - \epsilon \leq L(P, f, \alpha) = L(Q, g, \beta) \leq \underline{\int} g d \beta \leq L(Q, g, \beta) + \epsilon = L(P, f, \alpha) + \epsilon \leq \int f d \alpha + \epsilon. \end{equation*} The first inequality holds because $P$ refines $P_2$, so $L(P, f, \alpha) \geq L(P_2, f, \alpha)$, and the third because $Q$ refines $Q_1$, so $L(Q, g, \beta) \geq L(Q_1, g, \beta)$. The second and last inequalities hold because a lower integral is the supremum of the lower sums, and $\underline{\int} f \, d\alpha = \int f \, d\alpha$ since $f$ is integrable. Argue similarly for the upper sums and upper integral. Since $\epsilon$ was arbitrary, $\underline{\int} g \, d\beta = \int f \, d\alpha = \overline{\int} g \, d\beta$, so $g$ is integrable with respect to $\beta$ and the two integrals are equal.