It's been over 25 years since my last course in probability, so this may be obvious or elementary. However, I've unexpectedly had to think deeply about this material for a research project, and I cannot resolve the following issue.
For the sake of simplicity/common ground, let's assume Riemann integrals always suffice, there are no convergence issues, and my random variables take values over all of $\mathbb{R}$. All integrals below are $\int_\mathbb{R} \ dx $ whether or not the domain is explicitly written.
Given a r.v. $X$ with density function $f(x)$, we define $E(X) = \int x f(x) \ dx$. Linearity is then proven, $E(aX+b)=aE(X)+b$, and this is completely sensible. In deriving the alternative formula for the variance, we bump into $E(X^2)$. In that derivation, we define $$ E(X^2) = \int x^2 f(x) \ dx $$ and proceed. This is fine, but here's my issue.
$X^2$ is itself a random variable, so we could ask for its expected value in its own right (in reference to "itself," not $X$). To be clearer, we could set $Y=X^2$ and ask for $E(Y)$. This requires knowing the density function of $Y$, which to me is not clear at all and is non-trivial to get your hands on. So, setting $Y=X^2$ with density $h(y)$, why is it true that
$$
\int y h(y) \ dy = \int x^2 f(x) \ dx
$$
so that $E(Y) = E(X^2)$? Why do these two very different interpretations agree? I do not think this is as simple as a $u$-substitution.
For example, I can see that this works out fine if $X$ is standard normal. There, $E(X)=0$ and $Y=X^2$ is $\chi^2$ distributed with $1$ degree of freedom. Since $\sigma_X^2=1$ it is clear that $E(X^2)=1$. Chasing the calculations I also see that $E(Y)=E(\chi_1^2)=1$, so they are in fact in agreement. I can even follow the derivations of the $\chi_1^2$ distribution in terms of the $\Gamma$-function and see the connection to the standard normal, but I see no reason for this to play out as nicely no matter the density of $X$. It also seems to get worse when considering $E(X^\alpha)$ in general.
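For what it's worth, here is a quick Monte Carlo sanity check of the standard normal case (my own illustration, using NumPy's samplers for the normal and $\chi^2_1$ distributions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# E(X^2) estimated from samples of a standard normal X
x = rng.standard_normal(n)
e_x2 = (x**2).mean()

# E(Y) estimated from *independent* samples of Y ~ chi^2 with 1 d.o.f.,
# which is the distribution of X^2 when X is standard normal
y = rng.chisquare(df=1, size=n)
e_y = y.mean()

print(e_x2, e_y)  # both should be close to 1
```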
Are these two viewpoints in potential disagreement, or is there a piece of theory that says that there is no ambiguity?
Pretty sure this can be reconciled with a bit of measure theory (the general result you want is sometimes called the law of the unconscious statistician), but here's a concrete verification that the two integrals agree in this particular case.
Let $F$ be the cdf for $X$, so that the probability that $a \le X \le b$ is $F(b)-F(a)$. Then $f=F'$. Now can we use $F$ to work out the cdf for $Y$?
Yes we can! The cdf of $Y$, say $H$, is $H(y) = P(Y\le y)=P(X^2\le y)$. Thus if $y\le 0$, $H(y)=0$. However, if $y > 0$, then $P(X^2 \le y) = P(-\sqrt{y}\le X \le \sqrt{y})=F(\sqrt{y})-F(-\sqrt{y})$.
Taking the derivative, we see that $$h(y) = \begin{cases} 0 & y \le 0 \\ \frac{f(\sqrt{y})+f(-\sqrt{y})}{2\sqrt{y}} & y > 0\end{cases}.$$
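As a sanity check on this formula: when $X$ is standard normal, $h$ should reduce to the known $\chi^2_1$ density $e^{-y/2}/\sqrt{2\pi y}$. A short numerical comparison (my own check, plain `math` only):

```python
import math

def f(x):
    """Standard normal pdf."""
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def h(y):
    """Density of Y = X^2 from the case formula above, for y > 0."""
    s = math.sqrt(y)
    return (f(s) + f(-s)) / (2 * s)

def chi2_1_pdf(y):
    """Known chi-squared density with 1 degree of freedom."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y)

for y in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(y, h(y), chi2_1_pdf(y))  # the two columns agree
```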
Then $$\int y h(y) \,dy = \int_0^\infty y(f(\sqrt{y})+f(-\sqrt{y}))\frac{1}{2\sqrt{y}}\,dy.$$ Letting $u=\sqrt{y}$, we have $du=\frac{1}{2\sqrt{y}}\,dy$ so $$\int y h(y) \,dy = \int_0^\infty u^2 (f(u)+f(-u))\,du=\int_{-\infty}^\infty u^2 f(u)\,du=E[X^2]$$
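The same equality can be checked numerically for a density that is *not* symmetric, so nothing cancels by accident. Here is a crude midpoint Riemann sum check (my own illustration) with $X\sim N(0.5,\,1)$, where $E(X^2)=\mu^2+\sigma^2=1.25$:

```python
import math

MU, SIGMA = 0.5, 1.0  # an asymmetric example: X ~ N(0.5, 1)

def f(x):
    """Pdf of X ~ N(MU, SIGMA^2)."""
    return math.exp(-((x - MU) / SIGMA)**2 / 2) / (SIGMA * math.sqrt(2 * math.pi))

def h(y):
    """Density of Y = X^2 from the case formula derived above."""
    if y <= 0:
        return 0.0
    s = math.sqrt(y)
    return (f(s) + f(-s)) / (2 * s)

def riemann(g, a, b, n=500_000):
    """Midpoint Riemann sum for the integral of g over [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

lhs = riemann(lambda y: y * h(y), 0.0, 60.0)       # int y h(y) dy
rhs = riemann(lambda x: x**2 * f(x), -10.0, 10.0)  # int x^2 f(x) dx
print(lhs, rhs)  # both approximately E(X^2) = 1.25
```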
And here is the abstract approach.
Let $(A,\Omega,\mu)$ be a probability space, with sample space $A$, $\sigma$-algebra $\Omega$, and probability measure $\mu$. A random variable on $A$ is a measurable function $A\to \Bbb{R}$; say $X$ is given by the measurable function $f:A\to \Bbb{R}$. The random variable $Y=X^2$ on $A$ is then simply the measurable function $a\mapsto f(a)^2$, the composite of $x\mapsto x^2$ with $f$. However, let's be a little more general: let $g:\Bbb{R}\to \Bbb{R}$ be any continuous function, and consider the random variable $Y=g(X)$, which is the function $g\circ f : A\to \Bbb{R}$.
Now we can take the pushforward of $\mu$ along $f$ to get a measure $f_*\mu$ on $\Bbb{R}$ defined by $f_*\mu(E) = \mu(f^{-1}(E))$. If $f_*\mu$ is absolutely continuous with respect to Lebesgue measure, then its Radon-Nikodym derivative is the pdf of $X$, but let's not think about distribution functions for now.
We now have a new measure space, $(\Bbb{R},\mathcal{B},f_*\mu)$. Since we obtained this space by pushing $\mu$ forward along $f$, the identity function on this new space determines the same random variable $X$: the probability that $X$ ends up in some set $B\subseteq\Bbb{R}$ was originally defined to be $\mu(f^{-1}(B))$, and this equals $f_*\mu(\mathrm{id}^{-1}(B))$.
Similarly, $Y=g(X)$ will be described on this new space by simply the function $g:\Bbb{R}\to\Bbb{R}$.
The expected value of $Y$ is then $$\int g(x)\,f_*\mu(dx).$$ However, we can also push $f_*\mu$ forward along $g$ to get $g_*f_*\mu$, which expresses the expected value of $Y$ as $$\int y\, g_*f_*\mu(dy).$$
Thus, rephrased in abstract language, the statement that you want is that these two values are equal, i.e. $$\int g(x)\,f_*\mu(dx)=\int y\,g_*f_*\mu(dy).$$
This is, however, just a special case of change of variables, which says that $$\int j \circ k \,d\nu = \int j \,d k_*\nu,$$ with $j=\textrm{id}$, $k=g$, and $\nu=f_*\mu$.
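To see this change-of-variables identity in the simplest possible setting, here is a toy computation (my own illustration) on a finite sample space, where a pushforward is just a regrouping of point masses:

```python
from collections import defaultdict

# A finite probability space: points of A with their masses mu({a})
mu = {'a1': 0.2, 'a2': 0.3, 'a3': 0.5}

f = {'a1': -1.0, 'a2': 2.0, 'a3': -1.0}  # the random variable X
g = lambda x: x**2                        # so Y = g(X) = X^2

def pushforward(measure, func):
    """Pushforward of a discrete measure along func: regroup the masses."""
    out = defaultdict(float)
    for point, mass in measure.items():
        out[func(point)] += mass
    return dict(out)

f_mu = pushforward(mu, lambda a: f[a])  # f_* mu, a measure on R
gf_mu = pushforward(f_mu, g)            # g_* f_* mu

lhs = sum(g(x) * mass for x, mass in f_mu.items())  # int g d(f_* mu)
rhs = sum(y * mass for y, mass in gf_mu.items())    # int y d(g_* f_* mu)
print(lhs, rhs)  # equal: 0.2*1 + 0.3*4 + 0.5*1 = 1.9
```

The two sums range over the same point masses, just labelled before and after applying $g$, which is exactly why the identity holds.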