I have an understanding question about Bayes' theorem: in
$$p(z|x) = \frac{p(x|z)p(z)}{p(x)},$$
the term $p(z)$ is usually interpreted as the prior probability distribution of a hypothesis $z$ before observing any data $x$.
However, if we write $p(z)$ as the marginal
$$p(z) = \int p(z, x)\, dx = \int p(z|x)p(x)\, dx = \mathbb{E}_{x\sim p(x)}\left[p(z|x)\right],$$
then the term $p(z)$ seems to contain the knowledge about all data $x$.
Therefore, does the prior really represent the hypothesis with no data, or with all data?
Are we no smarter with all the data than we are with no data?
Or is it a question of perspective?
How should I understand the prior correctly?
Thank you!
The prior should be interpreted as representing the uncertainty in the hypothesis before observing any data.
The marginalization computation that you've written should not be viewed as "using information from more data," but rather as averaging over all possible outcomes of what the data $x$ could be. This is less informative than if you have a particular instance of the data $x$, which gives you extra information about $z$.
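This distinction is easy to check numerically. The sketch below uses a made-up discrete model (hypothetical probabilities, binary $z$ and $x$) to show that averaging the posterior $p(z|x)$ over $p(x)$ recovers the prior exactly, while conditioning on a *particular* observation $x$ gives a distribution that differs from the prior:

```python
import numpy as np

# Hypothetical discrete model: z in {0, 1}, x in {0, 1}.
prior = np.array([0.3, 0.7])           # p(z), the prior
lik = np.array([[0.9, 0.1],            # p(x|z=0) for x = 0, 1
                [0.2, 0.8]])           # p(x|z=1) for x = 0, 1

# Marginal p(x) = sum_z p(x|z) p(z)
p_x = prior @ lik                      # shape (2,)

# Posterior p(z|x) = p(x|z) p(z) / p(x), shape (z, x)
post = (lik * prior[:, None]) / p_x[None, :]

# Averaging the posterior over all possible data recovers the prior:
recovered = post @ p_x                 # E_{x~p(x)}[p(z|x)]
print(recovered)                       # -> [0.3 0.7], equal to the prior

# But conditioning on one actual observation (say x = 1) is informative:
print(post[:, 1])                      # differs from [0.3 0.7]
```

In other words, the marginalization identity holds because the expectation "washes out" whatever each individual $x$ would have told you about $z$; only a concrete observation moves you away from the prior.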