In statistics and machine learning, we often see expressions like the following (used, for example, in [2], written by very important statisticians):
$$\mathbb{E}_{q(x)} \left[ \log p(x) \right] \tag{0} \label{0} $$
which is apparently supposed to mean
$$\mathbb{E}_{q(x)} \left[ \log p(X) \right] \tag{1} \label{1}$$
where $X$ is some random variable, because expectations take random variables as inputs, and the lower-case letter in $\log p(x)$ inside the expectation (\ref{0}) suggests that $\log p(x)$ is not a random variable. By contrast, $\log p(X)$ is more descriptive and suggestive: it indicates a random variable that is the composition of $\log$, $p$ and $X$.
Now, the expectation (\ref{1}) is with respect to the p.d.f. $q$, so we can write it as follows
$$\mathbb{E}_{q(x)} \left[ \log p(X) \right] = \int q(x) \left( \log p(x) \right) dx$$
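As a numerical sanity check of this identity, here is a small sketch. The specific densities are illustrative assumptions only: $q$ is taken to be the $\mathcal{N}(0,1)$ density and $p$ the $\mathcal{N}(1,2^2)$ density, so the expectation can be estimated both by Monte Carlo (sampling $X \sim q$) and by discretizing the integral $\int q(x) \log p(x)\, dx$:

```python
import numpy as np

# Illustrative (assumed) choices: q = density of N(0,1), p = density of N(1, 2^2).
def q(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def p(x):
    return np.exp(-(x - 1)**2 / 8) / np.sqrt(8 * np.pi)

rng = np.random.default_rng(0)

# Left-hand side: E_{q(x)}[log p(X)] by Monte Carlo, with X ~ q.
X = rng.normal(0.0, 1.0, size=1_000_000)
mc = np.mean(np.log(p(X)))

# Right-hand side: the integral  int q(x) log p(x) dx,  via a Riemann sum.
xs = np.linspace(-10, 10, 200_001)
integral = np.sum(q(xs) * np.log(p(xs))) * (xs[1] - xs[0])

# The two estimates agree to Monte Carlo accuracy.
assert abs(mc - integral) < 1e-2
```

For these Gaussian choices the exact value is $-\tfrac14 - \tfrac12 \log(8\pi)$, since $\mathbb{E}[(X-1)^2] = 2$ when $X \sim \mathcal{N}(0,1)$.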
Inside the integral, $x$ is a dummy variable, i.e. it's not a random variable or a realization of a random variable.
However, I don't understand what the relationship is between
$\log p(x)$ inside the integral $\int q(x) \left( \log p(x) \right) dx$, and
the random variable $\log p(X)$ inside the expectation $\mathbb{E}_{q(x)} \left[ \log p(X) \right]$.
Does the random variable $\log p(X)$ have pdf $\log p(x)$? What about $X$? Does it have pdf $q$ or $\log p(x)$, or maybe $p$ (if it's a pdf)?
The answer to the question Can we really compose random variables and probability density functions? (which I asked) says that we can compose random variables and pdfs, but when exactly can we do it?
In short, the identity $$\mathbb{E}(\log f_X(X))=\int_\mathbb{R} \log (f_X(x)) f_X(x)\, dx$$ is just an application of LOTUS together with strict adherence to the convention of upper-case letters for random variables and lower-case letters for the values they take on (a convention not every author follows equally).
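To see the identity in action numerically, here is a quick sketch for the assumed concrete case $X \sim \mathcal{N}(0,1)$, where both sides equal $-\tfrac12(1+\log 2\pi)$ (the negative differential entropy of the standard normal):

```python
import numpy as np

# Check  E[log f_X(X)] = int log(f_X(x)) f_X(x) dx  for the assumed case X ~ N(0,1).
def f_X(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(0)
X = rng.normal(size=1_000_000)
lhs = np.mean(np.log(f_X(X)))                      # LOTUS, by Monte Carlo

xs = np.linspace(-10, 10, 200_001)                 # Riemann sum of the integral
rhs = np.sum(np.log(f_X(xs)) * f_X(xs)) * (xs[1] - xs[0])

exact = -(1 + np.log(2 * np.pi)) / 2               # closed form for N(0,1)
assert abs(lhs - rhs) < 1e-2
assert abs(rhs - exact) < 1e-6
```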
Suppose $X$ is a continuous RV with PDF $f_X(x)$. A standard, though not always applicable, way to find the PDF of a transformation $Y=h(X)$ of a random variable $X$, for some Borel function $h:\mathbb{R}\to\mathbb{R}$, is the CDF transformation method (often called the Jacobian-transformation technique, or something similar). Provided $h$ is nice enough (invertible, increasing, and with differentiable inverse), $$f_Y(y)=f_X(h^{-1}(y))(h^{-1}(y))'$$ This follows from $$F_Y(y):=\mathbb{P}(Y\leq y)=\mathbb{P}(h(X)\leq y)$$ $$=\mathbb{P}(X\leq h^{-1}(y))=F_X(h^{-1}(y)),$$ and then the chain rule. (Here we have made the minor assumption that inverting $h$ does not change the direction of the inequality; in general the derivative appears inside an absolute value.) Depending on the specific choice of $h$, the computation of $f_Y(y)$ may be easy or difficult.

In the case of entropy computations, we have $$h(x)=\log f_X(x),$$ so that if $f_X$ is invertible, we have $$h^{-1}(y)=f^{-1}_X(e^y),$$ from which we get $$f_Y(y)=e^y (f^{-1}_X(e^y))'$$ where the rest of the computation depends on the nature of $f_X$. A more general (and, in my opinion, better and more systematic) method for finding PDFs of transformations is outlined in this answer. For a more general discussion see this wikipedia page in addition to the LOTUS page. Fortunately, thanks to LOTUS it is not always necessary to know $f_Y(y)$ when $Y=h(X)$ in order to compute $\mathbb{E}(Y)=\mathbb{E}(h(X))$, as explained below.
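The transformation formula can be checked numerically. The choice below is an illustrative assumption: $h(x)=e^x$ with $X \sim \mathcal{N}(0,1)$, so $h$ is increasing, $h^{-1}(y)=\log y$, and $Y=e^X$ is lognormal with $f_Y(y)=f_X(\log y)/y$:

```python
import numpy as np

# Density of the assumed X ~ N(0,1).
def f_X(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# f_Y(y) = f_X(h^{-1}(y)) * (h^{-1}(y))'  with  h^{-1}(y) = log y,
# whose derivative is 1/y (positive, so no absolute value needed).
def f_Y_formula(y):
    return f_X(np.log(y)) / y

rng = np.random.default_rng(2)
Y = np.exp(rng.normal(size=1_000_000))   # samples of Y = h(X) = e^X

# Compare the formula to an empirical histogram density at a few points.
for y in [0.5, 1.0, 2.0]:
    half_width = 0.01
    empirical = np.mean((Y > y - half_width) & (Y <= y + half_width)) / (2 * half_width)
    assert abs(empirical - f_Y_formula(y)) < 0.05
```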
For a general overview:
The following references section 6.12 in D. Williams' Probability with Martingales. In measure-theoretic terms, given a probability triple $(\Omega, \mathscr{F}, \mathbb{P})$, a mapping $X:\Omega\to \mathbb{R}$ is a random variable if it is a measurable function on the sample space, and the expectation (if it exists) is defined by $$\mathbb{E}(X):=\int_\Omega X(\omega)\, \mathbb{P}(d\omega)$$ (there are many variations of this notation). Of course, we almost never use this definition for computations.
Instead, if $h:\mathbb{R}\to\mathbb{R}$ is Borel, and we write $\Lambda_X(B):=\mathbb{P}(X\in B)$ for the law of $X$, where $B$ is a Borel subset of the reals, then $Y=h(X)$ is in $\mathcal{L}^1(\Omega, \mathscr{F}, \mathbb{P})$ if and only if $h\in \mathcal{L}^1(\mathbb{R}, \mathscr{B}, \Lambda_X)$, and then $$\mathbb{E}(h(X))=\int_{\mathbb{R}} h(x) \Lambda_X(dx),$$ which is essentially LOTUS. When $X$ possesses a density, the measure satisfies $\Lambda_X(dx)=f_X(x)\,dx$ (here $dx$ is really an abuse of notation for $\text{Leb}(dx)$). The proof is in the referenced text and can be outlined as follows: verify the claim for indicator functions $h=\mathbb{1}_B$, use linearity to extend it to simple functions, then use the MCT for non-negative Borel $h$, and linearity once more for arbitrary Borel $h:\mathbb{R}\to\mathbb{R}$.
Toy Example
I only have time to do a simple example: let $X$ have density $f_X(x)=2x \mathbb{1}_{0<x<1}$ and $Y=\log (f_X(X))$. Then the inverse on $y \in (0,2)$ of $f_X$ is $f_X^{-1}(y)=y/2,$ and by the above formula, $f_Y(y)=\frac 12 e^{2y} \mathbb{1}_{-\infty <y<\log 2}$. So we get $$\mathbb{E}(Y)=\int_{-\infty}^{\log 2} \frac y2 e^{2y} dy =\log 2 - \frac 12=\int_0^1 \log(2x) 2x dx=\mathbb{E}(\log f_X(X)).$$
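The toy example is easy to verify by simulation. Since $F_X(x)=x^2$ on $(0,1)$, inverse-CDF sampling gives $X=\sqrt{U}$ for $U$ uniform on $(0,1)$; a sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# X has density f_X(x) = 2x on (0,1): F_X(x) = x^2, so X = sqrt(U) with U ~ Unif(0,1).
U = rng.uniform(size=1_000_000)
X = np.sqrt(U)
Y = np.log(2 * X)            # Y = log f_X(X)

# Monte Carlo estimate of E(Y) vs. the closed form log 2 - 1/2.
assert abs(Y.mean() - (np.log(2) - 0.5)) < 1e-2

# Check the derived density f_Y(y) = (1/2) e^{2y} on (-inf, log 2) via its CDF:
# F_Y(y) = (1/4) e^{2y}, which at y = 0 gives P(Y <= 0) = 1/4.
assert abs(np.mean(Y <= 0.0) - 0.25) < 5e-3
```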
Sorry for the length; hopefully this is not too rambling (I tried to provide a general answer as well as some specific responses; if you think I should edit it down, feel free to say so). Of course, please let me know if you have any questions, comments, or corrections.