Bayesian Statistics - Basic question about prior


I am trying to get an understanding of Bayesian statistics. My intuition tells me that, in the expression for the posterior,

$$p(\vartheta|x) = \frac{p(x|\vartheta)p(\vartheta)}{\int_\Theta p(x|\theta)p(\theta) d\theta}$$

the term $p(\vartheta)$ is the marginal distribution of the joint distribution $p(\vartheta,x)$. It is obtained by $$p(\vartheta)=\int_X p(\vartheta|x)p_X(x)dx$$ where $p_X(x)$ should be the marginal distribution of the observable data. Does that make sense?
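As a sanity check of that identity, here is a minimal sketch with a hypothetical discrete example (the numbers are made up): marginalizing the joint $p(\vartheta,x)$ over $x$, or equivalently averaging the posterior against $p_X(x)$, recovers the prior.

```python
import numpy as np

# Hypothetical discrete example: theta takes 3 values, x takes 2 values.
# Joint p(theta, x) = p(x | theta) * p(theta).
prior = np.array([0.5, 0.3, 0.2])          # p(theta)
lik = np.array([[0.9, 0.1],                # p(x | theta); rows sum to 1
                [0.5, 0.5],
                [0.2, 0.8]])
joint = lik * prior[:, None]               # p(theta, x)

p_x = joint.sum(axis=0)                    # marginal p(x)
posterior = joint / p_x                    # p(theta | x), by Bayes' rule

# Averaging the posterior against p(x) recovers the prior:
recovered = (posterior * p_x).sum(axis=1)
print(np.allclose(recovered, prior))       # True
```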

Up to this point it makes sense with this example: when offering somebody car insurance without knowing the person's style of driving (determined by $\vartheta \in \Theta$) to feed into some statistical model, we can still use the nation's car-crash statistics as our prior, which is a pdf on $\Theta$. That would be the marginal distribution of the "driving styles" across the population.

Maybe I am just oversimplifying here, because my resources did not mention this.

There are 2 answers below.

BEST ANSWER

In the Bayesian way of thinking, the prior distribution has no dependence on the data, so trying to get $p\left(\vartheta\right)$ by integrating over $x$ is an incorrect way to think about it. The distribution $p\left(\vartheta\right)$ exists first -- it represents what you believe about the $\vartheta$ parameter (i.e., which values it is more or less likely to have) before having seen any data. Then, you observe $x$ and update your beliefs to the conditional distribution $p\left(\vartheta\,|\,x\right)$.

But from which distribution do you observe $x$? This distribution will be different depending on what $\vartheta$ is. So, it only makes sense to talk about $p\left(x\right)$ as an expectation over different $\theta$ values, as in the denominator of the Bayesian update.
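The point that $p(x)$ only makes sense as an expectation of the likelihood over $\theta$ can be sketched numerically. Below is a minimal grid approximation with hypothetical numbers of my own (a coin's bias $\theta$ with a Beta(2, 2) prior and 7 heads in 10 flips), showing the denominator of the Bayesian update as a prior-weighted average.

```python
import numpy as np
from math import comb

# Hypothetical setup (not from the thread): theta is a coin's bias
# with a Beta(2, 2) prior; we observe x = 7 heads in n = 10 flips.
n, x = 10, 7
theta = np.linspace(0.0, 1.0, 10001)
d = theta[1] - theta[0]

prior = 6.0 * theta * (1.0 - theta)                      # Beta(2,2) density
lik = comb(n, x) * theta**x * (1.0 - theta)**(n - x)     # p(x | theta)

# p(x) = integral of p(x | theta) p(theta) dtheta:
# the expectation of the likelihood under the prior (Riemann sum here).
p_x = np.sum(lik * prior) * d

posterior = lik * prior / p_x     # p(theta | x); integrates to ~1
```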

ANSWER

I suggest you change your point of view in order to get into the Bayesian way of thinking.

Think of the posterior in the following way:

$$\pi(\theta|\mathbf{x})\propto \pi(\theta)\times p(\mathbf{x}|\theta)$$

where the prior is a distribution that includes all the information you have about your parameter, multiplied (corrected) by the likelihood, which includes all the information given by the data.

Since $\pi(\theta)\times p(\mathbf{x}|\theta)$ may not be a distribution (its integral can be $\ne 1$), you have to normalize it by multiplying this product by a constant.
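A minimal sketch of that normalization step, with a made-up discrete parameter (my own numbers): the product prior $\times$ likelihood does not sum to 1, and dividing by its total supplies the constant.

```python
import numpy as np

# Hypothetical discrete parameter: theta in {0.2, 0.5, 0.8}, uniform prior.
theta = np.array([0.2, 0.5, 0.8])
prior = np.array([1/3, 1/3, 1/3])

# Likelihood of 3 successes in 4 Bernoulli trials for each theta.
lik = theta**3 * (1.0 - theta)

unnormalized = prior * lik                # proportional to the posterior
print(unnormalized.sum())                 # not 1: needs normalization

posterior = unnormalized / unnormalized.sum()
print(posterior)                          # now sums to 1
```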

I think this basic way of looking at it could be useful for you.