I was reading Tao's note on probability theory https://terrytao.wordpress.com/2015/10/03/275a-notes-1-integration-and-expectation/comment-page-1/#comment-681077, where the following definition appears:
(Change of variables formula) Let ${X}$ be a random variable taking values in a measurable space ${R = (R, {\mathcal B})}$. Let ${f: R \rightarrow {\bf R}}$ or ${f: R \rightarrow {\bf C}}$ be a measurable scalar function (giving ${{\bf R}}$ or ${{\bf C}}$ the Borel ${\sigma}$-algebra of course) such that either ${f \geq 0}$, or that ${{\bf E} |f(X)| < \infty}$. Then
$\displaystyle {\bf E} f(X) = \int_R f(x)\ d\mu_X(x)$.
Here $\mu_X$ denotes the law of $X$, i.e. the push-forward of the probability measure ${{\bf P}}$ on the sample space ${\Omega}$ by the model ${X_\Omega: \Omega \rightarrow R}$ of ${X}$ on that sample space.
Then the note proceeds to conclude that if ${X}$ is a scalar random variable that takes on at most countably many values ${x_1,x_2,\dots}$, the change of variables formula tells us that
$\displaystyle {\bf E} X = \sum_i x_i {\bf P}(X=x_i)$
if ${X}$ is unsigned or absolutely integrable. This seems intuitively natural enough, but I'm a bit confused about how we directly obtain $\displaystyle {\bf E} X = \int_{\bf R} x\ d\mu_X(x) = \sum_i x_i {\bf P}(X=x_i)$ from the definition.
Let $\{x_i\}_{i\in I}$ be the image of the random variable $X$. We can decompose the sample space as $\Omega=\bigcup\limits_{i\in I}\Omega_i$, where $\Omega_i:=\{\omega\in\Omega\,:\, X(\omega)=x_i\}$; these sets are pairwise disjoint. Since the $\Omega_i$ partition $\Omega$, we can write \begin{align} \sum_{i\in I}\chi_{\Omega_i}=1, \end{align} where the RHS denotes the constant function $1$ on $\Omega$. So far I have used only basic set theory, no probability or measure theory.
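As a quick sanity check of this partition identity, here is a minimal sketch with a hypothetical finite sample space $\Omega=\{0,\dots,5\}$ and a made-up random variable $X(\omega)=\omega \bmod 3$ (both are illustrative assumptions, not from the original discussion):

```python
# Hypothetical finite sample space and random variable (for illustration only)
omega = list(range(6))                     # Ω = {0, 1, 2, 3, 4, 5}
X = lambda w: w % 3                        # X takes the values 0, 1, 2

# The image {x_i} of X, and the indicator χ_{Ω_i}(ω) of Ω_i = {ω : X(ω) = x_i}
image = sorted({X(w) for w in omega})
chi = lambda x, w: 1 if X(w) == x else 0

# Each ω lies in exactly one Ω_i, so the indicators sum to the constant 1
assert all(sum(chi(x, w) for x in image) == 1 for w in omega)
```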
Now we come to expectations. Suppose the index set $I$ is at most countable (i.e., the range of $X$ is at most countable). Then, for an absolutely integrable random variable, we have \begin{align} \Bbb{E}(X)&:=\int_{\Omega}X\,dP\\ &=\int_{\Omega}\sum_{i\in I}\chi_{\Omega_i}X\,dP\tag{since $\sum_{i\in I}\chi_{\Omega_i}=1$}\\ &=\sum_{i\in I}\int_{\Omega}\chi_{\Omega_i}X\,dP\tag{$*$}\\ &=\sum_{i\in I}\int_{\Omega}\chi_{\Omega_i}\cdot x_i\,dP\tag{since $X=x_i$ on $\Omega_i$}\\ &=\sum_{i\in I}x_i\cdot \int_{\Omega}\chi_{\Omega_i}\,dP\\ &=\sum_{i\in I}x_i\cdot P(\Omega_i), \end{align} and this is exactly what we wanted to show. Note that in $(*)$, we had to swap the summation with the integral; this is justified by the monotone convergence theorem when $X$ is non-negative, and by the dominated convergence theorem (or Fubini's theorem) when $X$ is absolutely integrable. But keep in mind that to apply these theorems here, we need $X$ to be non-negative or absolutely integrable, and $I$ to be at most countable (essentially, this traces back to the fact that measures are only countably additive).
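The whole computation can be checked numerically on a small example. Below is a sketch, again assuming a hypothetical uniform probability measure on $\Omega=\{0,\dots,5\}$ and $X(\omega)=\omega \bmod 3$ (both my own choices for illustration); exact rational arithmetic avoids floating-point noise:

```python
from fractions import Fraction

# Hypothetical finite sample space with uniform probability measure P
omega = list(range(6))                          # Ω = {0, 1, 2, 3, 4, 5}
P = {w: Fraction(1, 6) for w in omega}          # P({ω}) = 1/6
X = lambda w: w % 3                             # X takes the values 0, 1, 2

# Left-hand side: E[X] = ∫_Ω X dP, computed pointwise over Ω
lhs = sum(X(w) * P[w] for w in omega)

# Right-hand side: Σ_i x_i · P(Ω_i), where Ω_i = {ω : X(ω) = x_i}
image = sorted({X(w) for w in omega})
rhs = sum(x * sum(P[w] for w in omega if X(w) == x) for x in image)

print(lhs, rhs)  # both equal 1
```

On a finite $\Omega$ the swap in $(*)$ is just a finite rearrangement, so the two sides agree exactly; the convergence theorems are only needed when $I$ is countably infinite.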
If you want to argue using the language of the push-forward measure, we can do that as well. Simply note that $\mu_X(\Bbb{R}\setminus\{x_i\}_{i\in I})=P(X^{-1}(\Bbb{R}\setminus\text{image}(X)))=P(\emptyset)=0$. Thus, \begin{align} \Bbb{E}(X)=\int_{\Bbb{R}}x\,d\mu_X(x)=\int_{\{x_i\}_{i\in I}}x\,d\mu_X(x)=\sum_{i\in I}\int_{\{x_i\}}x\,d\mu_X(x)=\sum_{i\in I}x_i\mu_X(\{x_i\})=\sum_{i\in I}x_i\cdot P(\Omega_i). \end{align} Once again, the third equality is where the at-most countability of the image of $X$ and the integrability/non-negativity of $X$ come into play.
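The push-forward route can be sketched in code as well, using the same hypothetical finite $\Omega$, uniform $P$, and $X(\omega)=\omega \bmod 3$ as assumptions: we build $\mu_X$ on singletons via $\mu_X(\{x\})=P(X^{-1}(\{x\}))$ and integrate against it.

```python
from fractions import Fraction

# Hypothetical finite sample space with uniform P (illustrative assumption)
omega = list(range(6))
P = {w: Fraction(1, 6) for w in omega}
X = lambda w: w % 3

# Push-forward measure μ_X(A) = P(X⁻¹(A)), stored here on singletons {x}
mu_X = {}
for w in omega:
    mu_X[X(w)] = mu_X.get(X(w), Fraction(0)) + P[w]

# μ_X vanishes off the image of X, so ∫_R x dμ_X collapses to a sum over the image
E_pushforward = sum(x * m for x, m in mu_X.items())

# Agrees with the direct computation Σ_i x_i · P(Ω_i) on the sample space
E_direct = sum(X(w) * P[w] for w in omega)
print(E_pushforward == E_direct)  # True
```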