I was reading Lemma 1.22 in Kallenberg's Foundations of Modern Probability. It reads as follows:
Lemma 1.22 (substitution). Fix a measure space $(\Omega, \mathcal{A}, \mu)$, a measurable space $(S, \mathcal{S})$, and two measurable mappings $f: \Omega \rightarrow S$ and $g: S \rightarrow \mathbb{R}$. Then $$ \mu(g \circ f)=\left(\mu \circ f^{-1}\right) g\quad\quad(4) $$ whenever either side exists. (Thus, if one side exists, then so does the other and the two are equal.)
Proof: If $g$ is an indicator function, then (4) reduces to the definition of $\mu \circ f^{-1}$. From here on we may extend by linearity and monotone convergence to any measurable function $g \geq 0$. For general $g$ it follows that $\mu|g \circ f|=\left(\mu \circ f^{-1}\right)|g|$, and so the integrals in (4) exist at the same time. When they do, we get (4) by taking differences on both sides.
I understood that this holds when $g$ is an indicator function. But how do you extend to nonnegative measurable functions? Also in the last sentence, what is the meaning of taking differences?
If $g$ is a non-negative simple function, then it is a finite linear combination of indicator functions with non-negative coefficients; now apply linearity of the integral together with the above result for indicator functions.
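Spelled out: writing $g = \sum_{k=1}^n c_k \mathbf{1}_{A_k}$ with $c_k \geq 0$ and $A_k \in \mathcal{S}$, and noting that $\mathbf{1}_{A_k} \circ f = \mathbf{1}_{f^{-1}A_k}$, the simple-function step reads

$$\mu(g \circ f) = \sum_{k=1}^n c_k\, \mu\left(f^{-1} A_k\right) = \sum_{k=1}^n c_k \left(\mu \circ f^{-1}\right)\left(A_k\right) = \left(\mu \circ f^{-1}\right) g.$$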
If $g$ is a non-negative measurable function, you may approximate it by an *increasing* sequence of non-negative simple functions $s_n \uparrow g$ pointwise. Then $s_n \circ f \uparrow g \circ f$ pointwise as well, so the monotone convergence theorem applies to both sides at once.
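Applying monotone convergence on both sides, together with the simple-function case for each $s_n$, gives

$$\mu(g \circ f) = \lim_{n} \mu\left(s_n \circ f\right) = \lim_{n} \left(\mu \circ f^{-1}\right) s_n = \left(\mu \circ f^{-1}\right) g.$$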
If $g$ is a general measurable function, write $g = g^+ - g^-$, where $g^+, g^- \geq 0$ are the positive and negative parts of $g$, respectively. Since $|g| = g^+ + g^-$, the non-negative case gives $\mu|g \circ f| = \left(\mu \circ f^{-1}\right)|g|$, so the two sides of (4) exist simultaneously (neither is of the indeterminate form $\infty - \infty$ unless both are). "Taking differences" then just means subtracting the identities already proved for $g^+$ and $g^-$, using $(g \circ f)^{\pm} = g^{\pm} \circ f$: $$\mu(g \circ f) = \mu\left(g^+ \circ f\right) - \mu\left(g^- \circ f\right) = \left(\mu \circ f^{-1}\right) g^+ - \left(\mu \circ f^{-1}\right) g^- = \left(\mu \circ f^{-1}\right) g.$$
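As a sanity check (not part of the proof), the identity (4) can be verified numerically on a finite measure space, where both integrals are finite sums. The spaces, maps, and weights below are made up purely for illustration:

```python
from collections import defaultdict

# Hypothetical finite measure space: Omega = {0,...,5} with arbitrary weights.
omega = [0, 1, 2, 3, 4, 5]
mu = {w: 0.5 + 0.1 * w for w in omega}  # a finite measure on Omega

# Measurable maps: f : Omega -> S with S = {"a", "b", "c"}, and g : S -> R.
f = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c"}
g = {"a": 2.0, "b": -1.0, "c": 3.5}

# Left side of (4): the integral of g∘f with respect to mu.
lhs = sum(g[f[w]] * mu[w] for w in omega)

# Pushforward measure mu∘f^{-1} on S: the mass of s is mu(f^{-1}{s}).
pushforward = defaultdict(float)
for w in omega:
    pushforward[f[w]] += mu[w]

# Right side of (4): the integral of g with respect to mu∘f^{-1}.
rhs = sum(g[s] * m for s, m in pushforward.items())

print(abs(lhs - rhs) < 1e-12)  # the two integrals agree
```

Both sides group the same products $g(f(\omega))\,\mu(\{\omega\})$, once point by point and once fiber by fiber, which is exactly what the indicator-function case of the proof captures.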