In the change of variables formula (Theorem 1.6.9), he writes:
"Let $X$ be a random element of $(S,\mathcal{S})$ with distribution $\mu$, i.e., $\mu(A) = P(X\in A)$. If $f$ is a measurable function from $(S,\mathcal{S})$ to $(\mathbb{R},\mathcal{R})$ so that $f\ge 0$ or $E|f(X)| < \infty$, then $Ef(X) = \int_S f(y)\mu(dy)$."
Some context: The default probability space is denoted $(\Omega,\mathcal{F},P)$ ($\Omega$ sample space, $\mathcal{F}$ $\sigma$-algebra, $P$ probability measure), and $\mathcal{R}$ is the Borel $\sigma$-algebra on $\mathbb{R}$.
My basic question is: What does he mean by $\int_S f(y)\mu(dy)$? I assume he must mean: take the integral over $S$ of $f : S\rightarrow\mathbb{R}$ with respect to the measure $\mu$ on $(S,\mathcal{S})$. However, in previous sections he has also written $\int_S f(y)d\mu$ to denote the same thing. Why does he use $\mu(dy)$ here rather than $d\mu$? What does $dy$ even mean when $y\in S$? In previous chapters he has only used $dy$ or $dx$ in the case where $y,x\in\mathbb{R}$. Is there any subtle distinction that I'm missing?
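To make my reading concrete: if it is correct, then I would expect all of the following to denote one and the same integral with respect to $\mu$, with $y$ serving only as a dummy variable ranging over $S$. (The middle form $d\mu(y)$ is not taken from the book; I include it only for comparison.)

$$\int_S f\,d\mu \;=\; \int_S f(y)\,d\mu(y) \;=\; \int_S f(y)\,\mu(dy).$$

As a toy example of my own, assume $S=\{s_1,\dots,s_n\}$ is finite, $\mathcal{S}$ is its power set, and $p_i=\mu(\{s_i\})=P(X=s_i)$. Then each expression above would reduce to

$$\sum_{i=1}^n f(s_i)\,p_i \;=\; Ef(X),$$

which is what the theorem asserts in this special case.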