Suppose that $P \sim f(p\mid \alpha, \beta)$ and that $Q\sim f(q\mid \gamma, \delta)$. Now suppose that the likelihood function for $P,Q$ as a function of data $x$ is given as:
$$ L(p,q\mid x) $$
I am wondering why the likelihood of the hyperparameters $\alpha, \beta, \gamma,\delta$ can be written as:
$$ L(\alpha, \beta, \gamma,\delta\mid x) = \iint L(p,q\mid x)f(p\mid\alpha, \beta)f(q\mid\gamma, \delta)dpdq $$
I saw this statement in a paper, where the authors said that it amounts to taking an expectation. It seems to me that it is just marginalizing out $p$ and $q$. However, I am wondering what the explicit form of $L(p,q\mid x)$ is here.
Is it true that:
$$ L(p,q\mid x) = p(x\mid p,q) = \frac{p(x, p, q)}{f(p)f(q)} $$
and hence we have:
$$ \iint L(p,q\mid x)f(p\mid\alpha, \beta)f(q\mid \gamma, \delta) \, dp \, dq = \iint p(x, p, q)\,dp\,dq \text{ ?} $$
This doesn't make sense to me, as I thought that $L(p,q\mid x)$ should be viewed as a function of $p,q$ for fixed $x$. Additionally, why does $f(p)$ no longer carry the conditioning on the hyperparameters, as in $f(p\mid \alpha, \beta)$?
Basically we can construct a Bayesian network (a directed acyclic graph, DAG) of the hyperparameters' influence on the parameters and their influence on the random variable: $$\begin{array}{ccccccccc} \alpha & & \beta & & & & \gamma & & \delta\\ & \searrow & \downarrow &&&& \downarrow &\swarrow \\ && p &&&& q\\ &&& \searrow && \swarrow\\ &&&& x\end{array}$$
From this we can see that:
$$\begin{align}\mathcal L(\alpha,\beta,\gamma,\delta\mid x) ~&=~ f(x\mid \alpha,\beta,\gamma,\delta) \tag 1 \\ &=~ \iint f(x, p,q\mid \alpha,\beta,\gamma,\delta)\operatorname d (p,q) \tag 2 \\ &=~ \iint f(x \mid p,q, \alpha,\beta,\gamma,\delta)\,f(p,q\mid \alpha,\beta,\gamma,\delta) \operatorname d (p,q) \tag 3 \\ &=~ \iint f(x\mid p,q)\,f(p,q\mid \alpha,\beta,\gamma,\delta)\operatorname d (p,q) \tag 4 \\ &=~ \iint f(x\mid p,q)\,f(p\mid \alpha,\beta)\,f(q\mid \gamma,\delta)\operatorname d (p,q) \tag 5 \\ &=~ \iint \mathcal L(p,q\mid x)\,f(p\mid \alpha,\beta)\,f(q\mid \gamma,\delta)\operatorname d (p,q) \tag 6\end{align}$$
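(1, 6) by the definition of a likelihood function: it is the density of the data $x$ given the quantities being conditioned on, read as a function of those quantities for fixed $x$. In particular $\mathcal L(p,q\mid x) = f(x\mid p,q)$.
(2) by the Law of Total Probability, marginalizing over $p$ and $q$.
(3) by the definition of conditional probability (the product rule).
(4) The variable $x$ and the hyperparameters $\{\alpha,\beta,\gamma,\delta\}$ are conditionally independent given the parameters $\{p,q\}$, so $f(x\mid p,q,\alpha,\beta,\gamma,\delta)=f(x\mid p,q)$.
(5) The subgraph formed by nodes $\{\alpha, \beta, p\}$ is independent of that formed by nodes $\{\gamma,\delta, q\}$, so $f(p,q\mid \alpha,\beta,\gamma,\delta)=f(p\mid \alpha,\beta)\,f(q\mid \gamma,\delta)$.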
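As a numerical sanity check of steps (1) to (6), here is a minimal Python sketch under assumptions that are not in the question: $p\sim\text{Beta}(\alpha,\beta)$, $q\sim\text{Beta}(\gamma,\delta)$, and the data $x$ consist of two binomial counts, one with success probability $p$ and one with $q$. The hyperparameter values, the data, and all function names are made up for illustration. Under these assumptions the left-hand side $f(x\mid\alpha,\beta,\gamma,\delta)$ has a closed form (a product of two beta-binomial probabilities), so it can be compared with the double integral in (6). The integral is also the expectation of $\mathcal L(P,Q\mid x)$ when $P,Q$ are drawn from their priors, which is presumably what the paper means by "taking the expectation", so a Monte Carlo average is included as well.

```python
import math
import random


def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(t, a, b):
    """Density of a Beta(a, b) distribution at t in (0, 1)."""
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
                    - log_beta_fn(a, b))


def binom_pmf(k, n, t):
    """Probability of k successes in n trials with success probability t."""
    return math.comb(n, k) * t**k * (1 - t)**(n - k)


def likelihood(p, q, x):
    """L(p, q | x) = f(x | p, q) for the assumed two-binomial data model."""
    (k1, n1), (k2, n2) = x
    return binom_pmf(k1, n1, p) * binom_pmf(k2, n2, q)


def betabinom_pmf(k, n, a, b):
    """Beta-binomial probability: a binomial with its probability integrated out."""
    return math.comb(n, k) * math.exp(log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b))


def marginal_by_integration(x, a, b, c, d, m=200):
    """Right-hand side of (6), approximated by a midpoint rule on an m-by-m grid."""
    h = 1.0 / m
    grid = [(i + 0.5) * h for i in range(m)]
    q_weights = [beta_pdf(q, c, d) for q in grid]
    total = 0.0
    for p in grid:
        p_weight = beta_pdf(p, a, b)
        for q, q_weight in zip(grid, q_weights):
            total += likelihood(p, q, x) * p_weight * q_weight
    return total * h * h


def marginal_by_expectation(x, a, b, c, d, n_samples=20000, seed=0):
    """Same quantity as the prior expectation E[L(P, Q | x)], via Monte Carlo."""
    rng = random.Random(seed)
    draws = (likelihood(rng.betavariate(a, b), rng.betavariate(c, d), x)
             for _ in range(n_samples))
    return sum(draws) / n_samples


# Illustrative (made-up) hyperparameters and data: (successes, trials) pairs.
alpha, beta_, gamma, delta = 2.0, 3.0, 4.0, 2.0
x = ((3, 10), (5, 8))

closed_form = (betabinom_pmf(3, 10, alpha, beta_)
               * betabinom_pmf(5, 8, gamma, delta))
by_integral = marginal_by_integration(x, alpha, beta_, gamma, delta)
by_expectation = marginal_by_expectation(x, alpha, beta_, gamma, delta)

print("closed form f(x | hyperparameters):", closed_form)
print("double integral, line (6):         ", by_integral)
print("Monte Carlo prior expectation:     ", by_expectation)
```

The first two numbers should agree to several digits, and the Monte Carlo value should agree up to sampling noise. Note that `likelihood` never uses the hyperparameters, which is exactly step (4): once $p$ and $q$ are given, $\alpha,\beta,\gamma,\delta$ carry no further information about $x$.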