I am trying to solve the following problem:
I believe I have done part (a). For $g_1$ and $g_3$ (the first and third transformations shown in the image), it is easy to see that the image $g_i(U)$, $i = 1, 3$, remains a closed rectangle, so not much additional work is needed.
For $g_2$, here is my "proof" (assuming it works): if $U = \times_{i=1}^n [a_i,b_i]$ and $j<k$, then the integral we want to evaluate becomes:
$$\int_{g_2(U)} 1 = \int_{a_1}^{b_1} \cdots \int_{x_j+a_k}^{x_j+b_k} \cdots \int_{a_n}^{b_n} 1 \, dx_n \cdots dx_1 = |U|$$ by Fubini's theorem (proven earlier in the book), because the integral over $x_k$ equals $b_k - a_k$ whatever the value of $x_j$. Since the determinant of $g_2$ is one, we are done.
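As a sanity check of the $\mathbb{R}^2$ case, here is a minimal Python sketch. It assumes $g_2$ acts as $(x, y) \mapsto (x, x + y)$, which is what the limits $x_j + a_k$ and $x_j + b_k$ above suggest. The function names and the sample rectangle are illustrative only. It compares the area of a rectangle with the area of its sheared image using the shoelace formula.

```python
from fractions import Fraction


def shear(point):
    """Assumed 2D form of g_2: add the first coordinate to the second."""
    x, y = point
    return (x, y + x)


def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order."""
    twice_area = 0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2


def rectangle_vertices(a1, b1, a2, b2):
    """Corners of [a1, b1] x [a2, b2] in counterclockwise order."""
    return [(a1, a2), (b1, a2), (b1, b2), (a1, b2)]


# Sample rectangle [1, 4] x [-2, 3]; exact arithmetic avoids rounding issues.
a1, b1, a2, b2 = Fraction(1), Fraction(4), Fraction(-2), Fraction(3)
rect = rectangle_vertices(a1, b1, a2, b2)
image = [shear(p) for p in rect]

area_rect = shoelace_area(rect)
area_image = shoelace_area(image)
print(area_rect, area_image)
```

This only checks one rectangle in the plane; it illustrates the claim $\int_{g_2(U)} 1 = |U|$ but does not replace the Fubini argument.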
Alternatively, one can see that this holds in $\mathbb{R}^2$ and $\mathbb{R}^3$ geometrically, as the transformation turns rectangles into parallelograms, and I suppose this argument can be generalized to $\mathbb{R}^n$, but I was keen on sticking to using an integral to prove all this.
My question is, how can we proceed to do (b)?
If $g_2$ didn't change our rectangles into parallelograms (that is, if $g_2(U)$ were a rectangle rather than a parallelogram), we could iteratively apply (a) and the multiplicativity of the determinant to solve our problem. But since $g_2$ changes the shape of our rectangle, we need to try something else. (We can't just use Spivak's hint and repeatedly apply (a) in this case!)
It would be greatly appreciated if someone could elucidate how (b) could be solved. I have thought about using the following line of reasoning:
$g_2$ gives us a parallelogram from a rectangle, which can be decomposed into a closed rectangle and two congruent triangles (which together "form" a rectangle). Now we have two rectangles, and we can keep applying (a). But this isn't precise. If someone could either help make this precise or provide a full solution themselves, that would be fantastic. If the solution could use only things proven in Spivak (before partitions of unity, chapter 3), that would be even better. So far he has proved the inverse function, implicit function, and rank theorems, provided a criterion for Riemann integrability of functions over closed rectangles and arbitrary Jordan measurable sets, and provided a definition of a Jordan measurable set and the notions of almost everywhere in the form of "content" 0 and "measure" 0 (which turns out to coincide with Lebesgue measure zero).
Thank you for reading.

A serious hint, not a solution.
This is a serious hint, one that can be reduced to saying "You should read the hint given in the problem itself; Spivak is helping you out here!"
You know from the hint that $g$ can be written $$ g = h_1 \circ h_2 \circ h_3 \circ \cdots \circ h_k $$ where each $h_i$ is a linear transformation of the form given.
You know that if $g = p \circ q$, and all are linear transformations on $\Bbb R^n$, then $\det g = (\det p) \cdot (\det q)$, right?
You know that for each $h_i$, you have the volume of $h_i(U)$ is $|\det h_i|$ times the volume of $U$.
What do you get when you combine these three facts to attempt to compute the volume of $g(U)$ for some rectangle $U$?
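To experiment with the second fact, here is a minimal Python sketch. It assumes the three elementary types are a coordinate scaling, a shear $e_j \mapsto e_j + e_k$, and a coordinate swap; the helper names `scale`, `shear`, `swap` and the particular factors are hypothetical choices for illustration. It builds $g$ as a product of such matrices and compares $\det g$ with the product of the individual determinants.

```python
from fractions import Fraction
from functools import reduce


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def scale(n, j, a):
    """Multiply coordinate j by a."""
    m = identity(n)
    m[j][j] = Fraction(a)
    return m


def shear(n, j, k):
    """Send e_j to e_j + e_k, i.e. add coordinate j to coordinate k (j != k)."""
    m = identity(n)
    m[k][j] = Fraction(1)
    return m


def swap(n, j, k):
    """Exchange coordinates j and k."""
    m = identity(n)
    m[j], m[k] = m[k], m[j]
    return m


def matmul(p, q):
    n = len(p)
    return [[sum(p[i][t] * q[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = Fraction(0)
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


# g = h_1 h_2 h_3 h_4 h_5 with illustrative elementary factors in R^3.
factors = [scale(3, 0, 2), shear(3, 0, 2), swap(3, 1, 2),
           scale(3, 1, -3), shear(3, 1, 0)]
g = reduce(matmul, factors)
product_of_dets = reduce(lambda x, y: x * y, (det(h) for h in factors))
print(det(g), product_of_dets)
```

The sketch checks only the multiplicativity of the determinant for one example; relating that number to the volume of $g(U)$ is the step left to the reader.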
Post-comment addition
As OP points out, I really set about answering the wrong question. The real question was, in essence, "Why is it that when $h$ and $g$ both transform a rectangle to a parallelogram, that $h \circ g$ does as well?"
First of all, I want to concentrate on parallelograms with one vertex at the origin. Suppose that $E$ is a parallelogram with one vertex at $P$. Then $E' = E-P$ (i.e., subtract $P$ from every point in $E$ to get a new set $E'$) is a parallelogram with one vertex at the origin. And if $T$ is a linear transformation, and $Q \in E$, we can write $$ T(Q) = T(Q-P) + T(P) $$ by linearity. So \begin{align} \{T(Q) \mid Q \in E \} &= \{T(Q-P) + T(P) \mid Q \in E \} \\ &= T(P) + \{T(Q-P) \mid Q \in E \} \\ &= T(P) + \{T(S) \mid S \in E' \} \end{align} In short, we can understand how $T$ transforms the set $E$ by looking at how it transforms $E'$, and then adding a constant vector $T(P)$ to the result. Thus I've reduced the problem of showing that "$T$ takes parallelograms to parallelograms" to the problem of showing that "$T$ takes parallelograms at the origin to parallelograms at the origin," where "at the origin" means "with one vertex being the origin."
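The reduction to the origin can be checked on a concrete example. The following is a minimal Python sketch; the matrix `T`, the vertex `P`, and the parallelogram `E` are made-up sample data, and only the vertices are compared, not the whole set.

```python
def apply(matrix, vector):
    """Matrix-vector product for a matrix given as a list of rows."""
    return tuple(sum(r * x for r, x in zip(row, vector)) for row in matrix)


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


T = [[1, 2], [3, 1]]                      # sample linear transformation
P = (2, 1)                                # chosen vertex of E
E = [(2, 1), (5, 2), (6, 4), (3, 3)]      # vertices P, P+u, P+u+v, P+v

E_shifted = [sub(Q, P) for Q in E]        # E' = E - P, has a vertex at 0

direct = [apply(T, Q) for Q in E]
via_origin = [add(apply(T, S), apply(T, P)) for S in E_shifted]
print(direct == via_origin)
```

Both lists agree, which is just the identity $T(Q) = T(Q-P) + T(P)$ applied vertex by vertex.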
The next problem is to figure out what exactly is a parallelogram at the origin. By looking at the 2D and 3D cases, we can observe that in $\Bbb R^n$, there will be $n$ edges of the parallelogram leaving the origin (or any other vertex for that matter). Let's call the vectors from the origin to the remote ends of those edges $v_1, \ldots, v_n$, OK?
Now you have to think a little, and realize that once you have, for some parallelogram $S$ (just to give it a name) those $n$ vectors, the set of all points in $S$ is exactly the set of linear combinations $$ c_1v_1 + c_2 v_2 + \ldots + c_n v_n $$ where $0 \le c_i \le 1$ for all $i = 1, \ldots, n$. In fact, let's give that set a name. I'm going to define $$ Y(v_1, v_2, \ldots, v_n) = \{c_1v_1 + c_2 v_2 + \ldots + c_n v_n \mid 0 \le c_i \le 1, i = 1, \ldots, n\}. $$
I'm going to hope that you believe those last two claims (namely that $n$ edges meet at the origin, and that linear combinations of those, with $0$-to-$1$ coefficients, constitute the entire parallelogram), but if you don't, by all means tell me. There's one last bit: I also claim that not only can every parallelogram at the origin be described this way, but every set described this way is also a parallelogram. (Interesting things happen when the vectors are linearly dependent, so this isn't completely trivial. As a first step, it requires a really clear definition of "parallelogram"!)
Now what I have to do is show you that when I linearly transform one such set, I get another. Then I'll have shown that linear transformations take parallelograms at the origin to parallelograms at the origin (and hence that they take parallelograms anywhere to other parallelograms), so that you really can do part "b" of the problem just as Spivak asks.
So: let's look at a linear transformation $T$, applied to the set $Y(v_1, \ldots, v_n)$. We get \begin{align} &T(Y(v_1, \ldots, v_n)) \\ &= T\left( \{c_1v_1 + c_2 v_2 + \ldots + c_n v_n \mid 0 \le c_i \le 1, i = 1, \ldots, n\}\right)\\ &= \{T(c_1v_1 + c_2 v_2 + \ldots + c_n v_n) \mid 0 \le c_i \le 1, i = 1, \ldots, n\}\\ &= \{T(c_1v_1) + T(c_2 v_2) + \ldots + T(c_n v_n) \mid 0 \le c_i \le 1, i = 1, \ldots, n\} & \text{by linearity}\\ &= \{c_1T(v_1) + c_2T( v_2) + \ldots + c_nT( v_n) \mid 0 \le c_i \le 1, i = 1, \ldots, n\} & \text{by linearity again}\\ &= \{c_1 w_1 + c_2 w_2 + \ldots + c_n w_n \mid 0 \le c_i \le 1, i = 1, \ldots, n\} & \text{where $w_i = T(v_i)$ for each $i$}\\ &= Y(w_1, \ldots, w_n). \end{align} So the parallelogram generated by the $v$ vectors has become a parallelogram generated by the $w$ vectors, where each $w_i$ is just $T(v_i)$.
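For readers who like to see this numerically, here is a minimal Python sketch of the same computation. The matrix `T`, the generating vectors `v`, and the coefficient grid are arbitrary sample choices, not part of the problem. It checks, for a grid of coefficients in $[0,1]$, that $T(c_1 v_1 + \ldots + c_n v_n)$ equals $c_1 w_1 + \ldots + c_n w_n$ with $w_i = T(v_i)$.

```python
from fractions import Fraction
from itertools import product


def apply(matrix, vector):
    """Matrix-vector product for a matrix given as a list of rows."""
    return tuple(sum(r * x for r, x in zip(row, vector)) for row in matrix)


def combine(coeffs, vectors):
    """Return c_1 v_1 + ... + c_n v_n."""
    dim = len(vectors[0])
    return tuple(sum(c * vec[i] for c, vec in zip(coeffs, vectors))
                 for i in range(dim))


T = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]     # sample linear transformation
v = [(1, 0, 0), (1, 2, 0), (0, 1, 1)]     # generators of Y(v_1, v_2, v_3)
w = [apply(T, vi) for vi in v]            # w_i = T(v_i)

# Coefficients 0, 1/4, 1/2, 3/4, 1 in each slot: 125 sample points of Y.
grid = [Fraction(k, 4) for k in range(5)]
mismatches = 0
for coeffs in product(grid, repeat=3):
    if apply(T, combine(coeffs, v)) != combine(coeffs, w):
        mismatches += 1
print(mismatches)
```

A finite grid cannot prove the set equality, of course; the linearity argument above does that. The sketch only makes the bookkeeping with the $w_i$ concrete.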
And I think that's the end of the story.