Does the parameter change during data generation in Bayesian Inference?


Let's assume that we have the following graphical model:

[Figure: graphical model in which $p$ is the parent of $x_1, x_2, x_3, x_4$]

This graph encodes the joint distribution $P(p,x_1,x_2,x_3,x_4) = P(p)\prod_{i=1}^{4}P(x_i|p)$. In Bayesian inference, if we know $x_1,x_2,x_3$, then the posterior predictive distribution for $x_4$ is given by:

$$ P(x_4|x_3,x_2,x_1) = \int_{p}P(x_4|p)P(p|x_3,x_2,x_1)dp$$
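As a concrete instance of this integral, assume (purely for illustration, since the model above does not fix any particular distributions) a Beta prior $p \sim \mathrm{Beta}(a,b)$ and binary observations $x_i \mid p \sim \mathrm{Bernoulli}(p)$. Writing $s = x_1 + x_2 + x_3$, the posterior is $P(p|x_3,x_2,x_1) = \mathrm{Beta}(p;\, a+s,\, b+3-s)$, and since $P(x_4=1|p) = p$ the integral reduces to the posterior mean of $p$:

$$ P(x_4=1|x_3,x_2,x_1) = \int_{0}^{1} p \,\mathrm{Beta}(p;\, a+s,\, b+3-s)\,dp = \frac{a+s}{a+b+3} $$

Note that a single $p$ appears throughout: the integral averages over our uncertainty about that one value.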

I used to interpret a model like the one above as follows: some probabilistic system picks a $p$ from $P(p)$. This $p$ then generates the data as $x_1\sim P(x_1|p)$, $x_2\sim P(x_2|p)$, $x_3\sim P(x_3|p)$. We do not know the true value of $p$, so we integrate over all its possible values, weighted by its posterior distribution given the data, to obtain the posterior predictive distribution of $x_4$.

In this interpretation, once $p$ is picked from the prior, its value stays the same throughout the data generation. Our lack of knowledge about it is what leads us to integrate over all its possible values in order to infer $x_4$. This can be considered a generative model.
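The interpretation described above can be sketched as a small simulation. This is only an illustration under assumptions that are not part of the model as stated: a Beta(a, b) prior on $p$, Bernoulli observations, and a hypothetical helper name `simulate_and_predict`. It draws $p$ once, generates three observations from that same $p$, and then computes the predictive probability for $x_4$ both by averaging over posterior draws and in closed form.

```python
import random


def simulate_and_predict(a=2.0, b=2.0, n_obs=3, n_mc=50_000, seed=0):
    """Draw p once, generate data from it, then predict the next observation.

    Assumptions (illustrative only): p ~ Beta(a, b), x_i | p ~ Bernoulli(p).
    """
    rng = random.Random(seed)

    # Step 1: the "system" picks a single p from the prior.
    p_true = rng.betavariate(a, b)

    # Step 2: that same p generates every observation.
    xs = [1 if rng.random() < p_true else 0 for _ in range(n_obs)]

    # Step 3: p_true is hidden from the analyst, so P(x_next = 1 | p) = p is
    # averaged over posterior draws of p. With a Beta prior the posterior is
    # Beta(a + successes, b + failures).
    s = sum(xs)
    post_a, post_b = a + s, b + (n_obs - s)
    mc_estimate = sum(rng.betavariate(post_a, post_b) for _ in range(n_mc)) / n_mc

    # Closed form of the same integral: the posterior mean of p.
    exact = post_a / (post_a + post_b)
    return p_true, xs, mc_estimate, exact


if __name__ == "__main__":
    p_true, xs, mc_estimate, exact = simulate_and_predict()
    print("hidden p:", round(p_true, 3), "data:", xs)
    print("P(x4 = 1 | data): Monte Carlo", round(mc_estimate, 3), "exact", round(exact, 3))
```

The posterior draws in step 3 are not new values that $p$ takes during data generation; they are candidate values for the one hidden $p$, weighted by how plausible the data make them.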

My question is: is my interpretation correct? I am asking because I have seen some sources on the web which imply that $p$ changes while the $x_i$ are being generated. If that were the case, shouldn't each of these different values be named as a different random variable, like $p_1$ for $x_1$, $p_2$ for $x_2$, etc.?

This has greatly confused me. I appreciate any comments.


There are 2 best solutions below

4
On BEST ANSWER

In general, yes, your interpretation is correct. Bayesian models support a generative interpretation, unlike classical frequentist models (a grey area is mixed-effects modeling).

0
On

I am posting my opinion about this, because I had the same question. I looked into page 83 of "Statistical Rethinking: A Bayesian Course with Examples in R and Stan," by Richard McElreath (1st Edition), and found the following code:

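    library(rethinking)  # provides dens(), used below to plot the density
    # Draw 10,000 values of each parameter from its prior
    sample_mu <- rnorm(1e4, 178, 20)
    sample_sigma <- runif(1e4, 0, 50)
    # rnorm() is vectorised over mean and sd: the i-th height uses the i-th (mu, sigma) pair
    prior_h <- rnorm(1e4, sample_mu, sample_sigma)
    dens(prior_h)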

This code simulates 10,000 heights, each drawn from a normal distribution with mean mu and standard deviation sigma, where mu has a Normal(178, 20) prior and sigma has a Uniform(0, 50) prior.

The way prior_h is sampled indicates that, for each simulated height, a brand-new mu and sigma are drawn from the prior.

So, to answer your question: I think that yes, each draw from the prior should be labeled with a subscript $i=1,2,3,\dots$, but I guess this is omitted by convention.
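One way to probe this reading is to simulate both procedures and compare them. The following is a minimal sketch in Python (not the book's code, and the function names are hypothetical), reusing the priors from the snippet: Normal(178, 20) for mu and Uniform(0, 50) for sigma. It generates pairs of heights either with a fresh (mu, sigma) for every height or with one (mu, sigma) shared by both heights in a pair.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def draw_params(rng):
    # Priors copied from the R snippet: mu ~ Normal(178, 20), sigma ~ Uniform(0, 50).
    return rng.gauss(178, 20), rng.uniform(0, 50)


def pairs_fresh_params(n, rng):
    """Each height gets its own (mu, sigma) draw, as in the vectorised rnorm() call."""
    pairs = []
    for _ in range(n):
        mu1, sigma1 = draw_params(rng)
        mu2, sigma2 = draw_params(rng)
        pairs.append((rng.gauss(mu1, sigma1), rng.gauss(mu2, sigma2)))
    return pairs


def pairs_shared_params(n, rng):
    """One (mu, sigma) draw is shared by both heights in a pair."""
    pairs = []
    for _ in range(n):
        mu, sigma = draw_params(rng)
        pairs.append((rng.gauss(mu, sigma), rng.gauss(mu, sigma)))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    for name, make in [("fresh", pairs_fresh_params), ("shared", pairs_shared_params)]:
        first, second = zip(*make(20_000, rng))
        print(name, "mean:", round(sum(first) / len(first), 1),
              "corr:", round(pearson(first, second), 2))
```

For a single height, the two procedures give the same marginal distribution, which is what dens(prior_h) plots. They differ only in the joint behaviour of several heights: with fresh parameters the heights are independent, while with shared parameters they are correlated, with a theoretical correlation of $400 / (400 + 2500/3) \approx 0.32$ under these priors. So the snippet on its own may not settle whether the parameter changes between observations within one data set.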