For any two discrete random variables $X,Y:\Omega \to \mathbb{R}$ in a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, linearity of expectation tells us that: $$\mathbb{E}(X+Y) := \sum_{t \in (X+Y)(\Omega)}t\cdot\mathbb{P}(X+Y = t) = \mathbb{E}X + \mathbb{E}Y$$
Here I use $(X+Y)(\Omega)$ to denote the image of $X + Y$ and $X + Y = t$ to denote the event $\{\omega \in \Omega\,\mid\, X(\omega) + Y(\omega) = t\}$. In university lecture notes, textbooks, and online forums (like this one) you see a lot of proofs of this fact that begin something like this: $$\mathbb{E}(X + Y) := \sum_{x \in X(\Omega)} \sum_{y \in Y(\Omega)} (x + y)\cdot\mathbb{P}(X = x, Y = y)$$ How in any way does this coincide with the definition of expectation that I gave above? I realize that there is an equivalent definition (namely, $\mathbb{E}X = \sum_{\omega \in \Omega} X(\omega) \mathbb{P}(\{\omega\})$) that trivializes the proof. The way I prefer to prove this fact is to first prove equivalence between the two definitions and then use the second definition to prove linearity. However, I find that the above method of proof is very common in other areas of probability theory aside from just linearity of expectation, so I'd like to better understand it.
To me this feels a lot like the Law of the Unconscious Statistician; the first step of the proof is actually a large jump in reasoning, yet at first glance it appears intuitively true (though it is currently very unintuitive to me). How do I go about proving equivalence between the two sums? Is the same true in other scenarios? For example: $$\mathbb{E}(XY) := \sum_{t \in (XY)(\Omega)} t \cdot \mathbb{P}(XY = t) \stackrel{??}{=} \sum_{x \in X(\Omega)} \sum_{y \in Y(\Omega)} xy\cdot\mathbb{P}(X = x, Y = y)$$
Does the above hold? Help is appreciated, thanks in advance.
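For what it's worth, here is a quick numerical sanity check on a small finite space (the space, weights, and random variables below are just toy choices of mine, not anything canonical). Both the sum and the product versions of the double-sum formula agree with the image-based definition here:

```python
import itertools
from fractions import Fraction

# Toy finite probability space: Omega = {0,1,2,3} with non-uniform weights (hypothetical example).
omega = [0, 1, 2, 3]
P = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}

X = lambda w: w % 2        # some random variables on Omega
Y = lambda w: w * w

def expectation_of(f):
    """E[f] via the image of f: sum over t in f(Omega) of t * P(f = t)."""
    image = {f(w) for w in omega}
    return sum(t * sum(P[w] for w in omega if f(w) == t) for t in image)

def double_sum(op):
    """The double-sum expression: sum over (x, y) of op(x, y) * P(X = x, Y = y)."""
    xs = {X(w) for w in omega}
    ys = {Y(w) for w in omega}
    return sum(op(x, y) * sum(P[w] for w in omega if X(w) == x and Y(w) == y)
               for x, y in itertools.product(xs, ys))

lhs_sum = expectation_of(lambda w: X(w) + Y(w))
rhs_sum = double_sum(lambda x, y: x + y)
lhs_prod = expectation_of(lambda w: X(w) * Y(w))
rhs_prod = double_sum(lambda x, y: x * y)
print(lhs_sum == rhs_sum, lhs_prod == rhs_prod)  # True True
```

Of course, one example proves nothing; I would still like to see why the two sums coincide in general.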
$\newcommand{\R}{\mathbb R}$ $\newcommand{\E}{\mathbb E}$
It will be useful to be acquainted with the notion of push-forward of a (probability) measure. I will be stating things for finite spaces but everything can be done in full generality.
Let $\Omega_1$ and $\Omega_2$ be two finite sets and $\mu$ be a probability measure on $\Omega_1$. Let $f:\Omega_1\to \Omega_2$ be any map. We define the push-forward of $\mu$ under $f$ to be the probability measure $\nu$ given by $$ \nu(B_2) = \mu(f^{-1}(B_2)) $$ for all subsets $B_2$ of $\Omega_2$. The push-forward of $\mu$ by $f$ will be denoted by $f_*\mu$. Note that the push-forward behaves well under composition. More precisely, if $f:\Omega_1\to \Omega_2$ and $g:\Omega_2\to \Omega_3$ are two maps, and $\mu$ is a probability measure on $\Omega_1$, then $(g\circ f)_*\mu = g_*(f_*\mu)$.
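On a finite set, the push-forward is just a regrouping of point masses, so it is easy to compute explicitly. Here is a small sketch (the measure and maps are toy choices of mine) that also checks the composition law $(g\circ f)_*\mu = g_*(f_*\mu)$:

```python
from fractions import Fraction

def pushforward(mu, f):
    """Push-forward f_*mu of a finite measure mu (dict: point -> mass) under f.

    Each point of mass mu(w) lands on f(w); masses hitting the same
    target point add up, which is exactly nu(B) = mu(f^{-1}(B)) on atoms.
    """
    nu = {}
    for point, mass in mu.items():
        nu[f(point)] = nu.get(f(point), 0) + mass
    return nu

# A toy uniform measure on Omega_1 = {1, 2, 3, 4} (hypothetical example).
mu = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(1, 4)}
f = lambda n: n % 2                      # Omega_1 -> Omega_2 = {0, 1}
g = lambda b: "even" if b == 0 else "odd"  # Omega_2 -> Omega_3

# Functoriality: (g o f)_* mu == g_* (f_* mu).
lhs = pushforward(mu, lambda n: g(f(n)))
rhs = pushforward(pushforward(mu, f), g)
print(lhs == rhs)  # True
```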
It is an easy verification that if $Y:\Omega_2\to \R$ is a map, and $f:\Omega_1\to \Omega_2$ is any map and $\mu$ is a probability measure on $\Omega_1$, then $$ \E_\mu[Y\circ f] = \E_{f_*\mu}[Y] $$
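This change-of-variables identity is also easy to test concretely. A minimal sketch, with a made-up three-point space and maps of my own choosing:

```python
from fractions import Fraction

def pushforward(mu, f):
    """Push-forward f_*mu of a finite measure mu (dict: point -> mass) under f."""
    nu = {}
    for point, mass in mu.items():
        nu[f(point)] = nu.get(f(point), 0) + mass
    return nu

def expectation(mu, Y):
    """E_mu[Y] = sum over points of Y(point) * mu(point)."""
    return sum(Y(p) * m for p, m in mu.items())

# A toy space and map (hypothetical example).
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
f = {"a": 0, "b": 1, "c": 1}.get      # f: Omega_1 -> Omega_2 = {0, 1}
Y = lambda t: 10 * t + 1              # Y: Omega_2 -> R

lhs = expectation(mu, lambda p: Y(f(p)))   # E_mu[Y o f]
rhs = expectation(pushforward(mu, f), Y)   # E_{f_* mu}[Y]
print(lhs == rhs)  # True
```

The verification behind this is just regrouping the sum defining $\E_\mu[Y\circ f]$ by the fibers of $f$.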
With this in hand, suppose we have a probability space $(\Omega, \mu)$ and two random variables $X, Y:\Omega\to \R$. Define a map $Z:\Omega\to \R\times \R$ as $Z(\omega)=(X(\omega), Y(\omega))$ and let $a:\R\times \R\to \R$ be the addition map, that is, $a(x, y)=x+y$. Now $\E_\mu[X+Y]= \E_\mu[a\circ Z] = \E_{Z_*\mu}[a]$. But $$ \E_{Z_*\mu}[a] = \sum_{(x, y)\in \text{image}(Z)} a(x, y) (Z_*\mu)(x, y) = \sum_{(x, y)\in \text{image}(Z)} (x+y)\mu(X=x, Y=y) $$ Note that one can replace $\text{image}(Z)$ above by anything that contains $\text{image}(Z)$, since $(Z_*\mu)(x, y)=0$ whenever $(x, y)$ lies outside the image of $Z$; in particular, one can write $\text{image}(X)\times \text{image}(Y)$ in place of $\text{image}(Z)$. So we have $$ \E_{\mu}[X+Y] = \sum_{(x, y)\in \text{image}(X)\times \text{image}(Y)} (x+y)\mu(X=x, Y=y) $$
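The whole chain of equalities can be traced numerically. A sketch with a toy space of my own (note that enlarging the index set from $\text{image}(Z)$ to $\text{image}(X)\times\text{image}(Y)$ only adds terms of mass zero):

```python
from fractions import Fraction
from itertools import product

# A toy finite probability space (hypothetical example).
mu = {w: Fraction(1, 5) for w in range(5)}
X = lambda w: w - 2
Y = lambda w: w * w

# Z_* mu: push mu forward under Z(w) = (X(w), Y(w)).
Z_mu = {}
for w, m in mu.items():
    key = (X(w), Y(w))
    Z_mu[key] = Z_mu.get(key, 0) + m

# E_{Z_* mu}[a], summed over image(Z) ...
E_sum = sum((x + y) * m for (x, y), m in Z_mu.items())

# ... and summed over the larger set image(X) x image(Y): the extra pairs carry mass 0.
xs = {X(w) for w in mu}
ys = {Y(w) for w in mu}
E_big = sum((x + y) * Z_mu.get((x, y), 0) for x, y in product(xs, ys))

E_X = sum(X(w) * m for w, m in mu.items())
E_Y = sum(Y(w) * m for w, m in mu.items())
print(E_sum == E_big == E_X + E_Y)  # True
```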
One can play the same game to obtain a similar expression for $\E_\mu[XY]$. Just replace the addition map $a$ by the multiplication map $(x, y)\mapsto xy:\R\times \R\to \R$.
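For concreteness, the same game with the multiplication map, again on a small made-up space:

```python
from fractions import Fraction

# A toy finite probability space (hypothetical example).
mu = {w: Fraction(1, 4) for w in range(4)}
X = lambda w: w % 2
Y = lambda w: w + 1

# Push mu forward under Z = (X, Y), then take the expectation of the multiplication map.
Z_mu = {}
for w, m in mu.items():
    key = (X(w), Y(w))
    Z_mu[key] = Z_mu.get(key, 0) + m

E_mult = sum(x * y * m for (x, y), m in Z_mu.items())   # E_{Z_* mu}[(x, y) -> xy]
E_direct = sum(X(w) * Y(w) * m for w, m in mu.items())  # E_mu[XY]
print(E_mult == E_direct)  # True
```

So the product identity from the question holds by exactly the same argument.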