My textbook, Rosen's Discrete Mathematics and its Applications, 8th Edition, offers the following proof of a theorem regarding independent random variables:
If $X$ and $Y$ are independent random variables on a sample space $S$, then $E(XY)=E(X)E(Y)$.
Proof: To prove this formula, we use the key observation that the event $XY=r$ is the disjoint union of the events $X=r_1$ and $Y=r_2$ over all $r_1\in X(S)$ and $r_2 \in Y(S)$ with $r=r_1r_2$. We have $$\begin{align}E(XY) &=\sum_{r\in XY(S)}r\cdot p(XY=r) && \text{by the definition of $E(XY)$}\\ &=\sum_{r_1\in X(S),r_2\in Y(S)}r_1r_2\cdot p(X=r_1\text{ and }Y=r_2) && \text{expressing $XY=r$ as a disjoint union}\\&=\sum_{r_1\in X(S)}\sum_{r_2\in Y(S)}r_1r_2\cdot p(X=r_1\text{ and }Y=r_2) && \text{using a double sum to order the terms}\\&=\sum_{r_1\in X(S)}\sum_{r_2\in Y(S)}r_1r_2\cdot p(X=r_1)\cdot p(Y=r_2) && \text{by the independence of $X$ and $Y$}\\&=\sum_{r_1\in X(S)}\bigl( r_1\cdot p(X=r_1)\cdot \sum_{r_2\in Y(S)}r_2\cdot p(Y=r_2)\bigr) && \text{by factoring out $r_1\cdot p(X=r_1)$}\\&=\sum_{r_1\in X(S)} r_1\cdot p(X=r_1)\cdot E(Y) && \text{by the definition of $E(Y)$}\\&=E(Y)\bigl(\sum_{r_1\in X(S)} r_1\cdot p(X=r_1)\bigr) && \text{by factoring out $E(Y)$}\\&=E(Y)E(X) && \text{by the definition of $E(X)$}\\[10pt]&=E(X)E(Y) && \text{by the commutative property of multiplication.}\end{align}$$
I'm a little confused as to what is meant by "the disjoint union of the events $X=r_1$ and $Y=r_2$ over all $r_1\in X(S)$ and $r_2 \in Y(S)$ with $r=r_1r_2$". First of all, I'm pretty sure that $r_1\in X(S)$ is a mild abuse of notation, as the random variable $X$ is a function, not a set. I can rationalize the second step of the proof just given my intuitive understanding of the conditions that must be met in order for the event $XY=r$ to have occurred ($X$ must equal some $r_1$ at the same time that $Y$ equals an $r_2$ such that $r_1r_2=r$). Can somebody help me relate the concept of a "disjoint union" to this step of the proof? From what I can find online, a disjoint union of two sets is basically just their union, but with their elements reframed as ordered pairs so that each element can be traced back to its original set once the two sets have been mashed together (https://en.wikipedia.org/wiki/Disjoint_union). If my understanding is accurate, I'm not sure why Rosen invokes the concept of a disjoint union in constructing this proof.
Fix a value $r$ of $XY$ and let $P=\{\langle r_1,r_2\rangle:r_1\in X(S),\ r_2\in Y(S),\ r_1r_2=r\}$. For each $\langle r_1,r_2\rangle\in P$, the combined event $X=r_1$ and $Y=r_2$ is one way to get the event $XY=r$. These combined events are mutually exclusive and exhaust the possibilities for getting $XY=r$, so
$$p(XY=r)=\sum_{\langle r_1,r_2\rangle\in P}p(X=r_1\text{ and }Y=r_2)\;.$$
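That’s all that Rosen means when he says that the event $XY=r$ is the disjoint union of these combined events: they are mutually exclusive and exhaust all possibilities for getting the event $XY=r$. In other words, "disjoint union" here just means an ordinary union of pairwise disjoint sets of outcomes, not the tagged ordered-pair construction described in the Wikipedia article. Disjointness is what allows the probabilities of the pieces to be added.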
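If it helps to see this concretely, here is a small sketch that checks the claim by brute force. The setup is my own choice for illustration and is not from Rosen: $S$ is taken to be the 36 equally likely outcomes of rolling two fair dice, $X$ and $Y$ are the two faces, and $r=4$. The helper names `event` and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed example (not from the textbook): two fair dice, uniform probability.
S = list(product(range(1, 7), repeat=2))
p = {s: Fraction(1, 36) for s in S}

def X(s):
    return s[0]  # value of the first die

def Y(s):
    return s[1]  # value of the second die

def event(pred):
    """The event (a set of outcomes in S) on which pred holds."""
    return frozenset(s for s in S if pred(s))

def prob(ev):
    """Probability of an event: the sum of its outcome probabilities."""
    return sum((p[s] for s in ev), Fraction(0))

r = 4
event_XY = event(lambda s: X(s) * Y(s) == r)

# P = all pairs (r1, r2) with r1 in X(S), r2 in Y(S), and r1 * r2 == r.
X_range = {X(s) for s in S}
Y_range = {Y(s) for s in S}
P = [(r1, r2) for r1 in X_range for r2 in Y_range if r1 * r2 == r]

# One combined event "X = r1 and Y = r2" for each pair in P.
pieces = [
    event(lambda s, r1=r1, r2=r2: X(s) == r1 and Y(s) == r2)
    for r1, r2 in P
]

# Mutually exclusive: no outcome lies in two different pieces.
pairwise_disjoint = all(
    a.isdisjoint(b) for i, a in enumerate(pieces) for b in pieces[i + 1:]
)
# Exhaustive: the union of the pieces is exactly the event XY = r.
covers = frozenset().union(*pieces) == event_XY
# Because of the two facts above, the probabilities add up.
sum_of_pieces = sum((prob(e) for e in pieces), Fraction(0))

print(sorted(P), pairwise_disjoint, covers, sum_of_pieces == prob(event_XY))
```

Here the pairs in `P` are $\langle 1,4\rangle$, $\langle 2,2\rangle$ and $\langle 4,1\rangle$, and the three combined events are disjoint sets of outcomes whose union is the event $XY=4$, which is why their probabilities sum to $p(XY=4)$.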