I am trying to understand the Bayesian Information Criterion (BIC) using this article. At page 2 the following equality is given:
$$P(y|M_i) = \int f(y|\theta_i)g_i(\theta_i)\,d\theta_i$$
with
- $y$ : observed data $y_1, \dots, y_n$
- $M_i$ : a candidate model
- $P(y|M_i)$ : marginal likelihood of the model $M_i$ given the data
- $\theta_i$ : vector of parameters in the model $M_i$
- $g_i(\theta_i)$ : the prior density of the parameters $\theta_i$
- $f(y|\theta_i)$ : the density of the data given the parameters $\theta_i$
I do not understand the presented equality. I have tried to understand it for a couple of hours, but I was not even able to figure out the exact (conceptual) meaning of the product $f(y|\theta_i)g_i(\theta_i)$, let alone its integration. I know I am supposed to give info on the things I have tried, but since this is not a practical problem I do not know what else to say.
Thanks in advance.
The integral is a definite integral over the whole parameter space of $\theta_i$, i.e. over every value the parameter vector can take. This is not specified in the paper, but it is specified in other books that treat the same topic.

The integrand $f(y|\theta_i)g_i(\theta_i)$ is the joint density of the data and the parameters under model $M_i$: the probability of $y$ given a particular parameter value $\theta_i$, weighted by our prior belief $g_i(\theta_i)$ in that value. Integrating $\int f(y|\theta_i)g_i(\theta_i)\,d\theta_i$ over $\theta_i$ then averages the likelihood over all possible parameter values. So basically we define $P(y|M_i)$ as the probability of the data under the model as a whole, with the uncertainty about the parameters averaged out. Which is indeed a very Bayesian approach.
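To make the averaging concrete, here is a small numerical sketch (my own toy example, not from the article): a Bernoulli model where $\theta$ is the success probability, the data $y$ are $k$ successes in $n$ trials, and the prior $g(\theta)$ is uniform on $[0,1]$. The marginal likelihood $P(y|M) = \int f(y|\theta)g(\theta)\,d\theta$ is approximated with a Riemann sum and compared against its known closed form, the Beta function $B(k+1,\,n-k+1)$.

```python
import math

# Toy model (hypothetical example): k successes in n Bernoulli trials.
n, k = 10, 7

def likelihood(theta):
    # f(y | theta): probability of the observed sequence given theta
    return theta**k * (1 - theta)**(n - k)

def prior(theta):
    # g(theta): uniform prior density on [0, 1]
    return 1.0

# Marginal likelihood P(y | M) = integral of f(y|theta) * g(theta) d(theta),
# approximated with a midpoint Riemann sum over the parameter space [0, 1].
N = 100_000
marginal = sum(
    likelihood((i + 0.5) / N) * prior((i + 0.5) / N) for i in range(N)
) / N

# Closed form for a uniform prior: B(k+1, n-k+1) = k! (n-k)! / (n+1)!
exact = math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)

print(marginal, exact)  # the two agree to many decimal places
```

Note that the sum runs over values of $\theta$, not over values of $y$: the data are held fixed, and it is the parameter that gets integrated out.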