I'm trying to understand the part of Statistical Inference (Casella, Berger) on expressing the joint pdf under non-bijective transformations.
More specifically, what is the intuition behind (4.3.6)? I understand the case of one-to-one transformations, but here I cannot understand why the joint pdf becomes a summation.
Thanks in advance!

Let's take a relatively simple example, say $(X,Y)$ with a uniform distribution on $[-3,7]^2$, so with density $f_{X,Y}(x,y)=\frac1{100}$ on that support, and look for the distribution of $(U,V)$ where $U=|X|$ and $V=|Y|$. This simple example avoids worrying about transforming the density with the Jacobian (each inverse here has a Jacobian of absolute value $1$) and concentrates on the addition issue.
Clearly the support for $(U,V)$ is $[0,7]^2$. You might intuitively expect $(1,1)$ to be more likely to occur than $(6,6)$ as there are more ways the former might happen. If so, you would be correct, and the calculation is as follows.
Looking for inverses, $0 \lt U \le 3 \implies X=U$ or $X=-U$ while $3 \lt U \le 7 \implies X=U$, and similarly for $V$ and $Y$. So finding a joint density function for $(U,V)$ may involve adding up four, two or one densities
Thus
for $0 \lt u \le 3$ and $0 \lt v \le 3$, we have four densities to add up, covering the inverses $(x,y)=(u,v)$, $(x,y)=(-u,v)$, $(x,y)=(u,-v)$ and $(x,y)=(-u,-v)$, and making $f_{U,V}(u,v)=\frac{4}{100}$
for $0 \lt u \le 3$ and $3 \lt v \le 7$, we have two densities to add up, covering the inverses $(x,y)=(u,v)$ and $(x,y)=(-u,v)$, and making $f_{U,V}(u,v)=\frac{2}{100}$
for $3 \lt u \le 7$ and $0 \lt v \le 3$, we have two densities to add up, covering the inverses $(x,y)=(u,v)$ and $(x,y)=(u,-v)$, and making $f_{U,V}(u,v)=\frac{2}{100}$
for $3 \lt u \le 7$ and $3 \lt v \le 7$, we have only one density, from the single inverse $(x,y)=(u,v)$, making $f_{U,V}(u,v)=\frac{1}{100}$
Just as a check that nothing is missing, the total probability described by this density on this support would be $\int_u\int_v f_{U,V}(u,v) \,dv \, du= 9\times\frac{4}{100} + 12\times\frac{2}{100} + 12\times\frac{2}{100} + 16\times\frac{1}{100} =1$ as you would hope
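If you would rather see this numerically, here is a minimal simulation sketch of the same example. The function names `exact_density` and `simulated_density` are made up for illustration; the first counts the inverses that land in $[-3,7]^2$, the second estimates the average density over a rectangle by sampling $(X,Y)$ and taking absolute values.

```python
import random


def exact_density(u, v):
    """Density of (U, V) = (|X|, |Y|) at a point with u > 0 and v > 0.

    Counts the sign-flipped inverses (x, y) that fall inside [-3, 7]^2;
    each one contributes f_{X,Y} = 1/100 (the Jacobian factor is 1).
    """
    inverses = [(x, y) for x in (u, -u) for y in (v, -v)
                if -3 <= x <= 7 and -3 <= y <= 7]
    return len(inverses) / 100


def simulated_density(u_lo, u_hi, v_lo, v_hi, n=200_000, seed=0):
    """Monte Carlo estimate of the average density on (u_lo, u_hi] x (v_lo, v_hi]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Draw (X, Y) uniformly on [-3, 7]^2 and transform to (U, V).
        u = abs(rng.uniform(-3, 7))
        v = abs(rng.uniform(-3, 7))
        if u_lo < u <= u_hi and v_lo < v <= v_hi:
            hits += 1
    area = (u_hi - u_lo) * (v_hi - v_lo)
    # Probability of the rectangle divided by its area.
    return hits / n / area


if __name__ == "__main__":
    # The four regions from the case analysis, with one interior point each.
    regions = [((0, 3, 0, 3), (1, 1)), ((0, 3, 3, 7), (1, 5)),
               ((3, 7, 0, 3), (5, 1)), ((3, 7, 3, 7), (6, 6))]
    for box, point in regions:
        print(box, exact_density(*point), round(simulated_density(*box), 4))
```

The simulated values should sit close to $\frac{4}{100}$, $\frac{2}{100}$, $\frac{2}{100}$ and $\frac{1}{100}$ respectively, up to sampling noise.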
Intuitively, where there are multiple inverses, each one can contribute to the final density for $(U,V)$, with each adding to the final result
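Written as a single formula for this example (a sketch in my own notation, where the signs $s,t$ are introduced only for illustration and each inverse $(x,y)=(su,tv)$ has a Jacobian of absolute value $1$):

$$f_{U,V}(u,v)=\sum_{s\in\{-1,1\}}\ \sum_{t\in\{-1,1\}} f_{X,Y}(su,\,tv)\cdot 1, \qquad u>0,\ v>0,$$

with $f_{X,Y}(x,y)=\frac1{100}$ on $[-3,7]^2$ and $0$ elsewhere. Terms whose inverse falls outside the support vanish, which is why the sum has four, two or one non-zero terms. For instance, $f_{U,V}(1,5)=f_{X,Y}(1,5)+f_{X,Y}(-1,5)+f_{X,Y}(1,-5)+f_{X,Y}(-1,-5)=\frac1{100}+\frac1{100}+0+0=\frac2{100}$.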