After the data $y$ have been observed, we can predict an unknown observable $y_{new}$ from the same process: $$\begin{aligned} f(y_{new}|y) & = \int f(y_{new},\theta|y)d\theta\\ & =\int f(y_{new}|\theta,y)f(\theta|y)d\theta\\ & =\int f(y_{new}|\theta)f(\theta|y)d\theta \end{aligned}$$ My questions are:
Does the last equality hold because, if we know the parameter $\theta$, then we know the distribution of $y$ and hence of $y_{new}$, so the data $y$ carry no additional information about $y_{new}$?
Can we say that $f(y_{new}|y)$ is an average of the conditional predictions $f(y_{new}|\theta)$ over the posterior distribution of $\theta$?
Thanks~
Not necessarily. You should be explicit about the assumptions that you're making. For example, if the process is a discrete Markov chain with your transition probability matrix inside $ \theta $, clearly the distribution of the next state depends on the previous state. Here it seems like you're implicitly assuming that the distribution of $ y_{new} $ is conditionally independent of previous values given $ \theta $.
In a Bayesian statistics context, $ f(y_{new} \mid y) $ is sometimes known as the posterior predictive distribution, and is defined in pretty much the way you have described.
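To make the "averaging over the posterior" reading concrete, here is a minimal Monte Carlo sketch for a conjugate Beta-Bernoulli model (the prior and the data counts are made up for illustration): draw $\theta$ from the posterior, then draw $y_{new}$ from the likelihood given that $\theta$, and compare the resulting frequency with the closed-form posterior predictive probability.

```python
import random

# Illustrative setup: prior theta ~ Beta(1, 1); observed data y:
# 7 successes in 10 Bernoulli trials. By conjugacy the posterior
# is Beta(1 + 7, 1 + 3).
a_post, b_post = 1 + 7, 1 + 3

random.seed(0)
n_draws = 200_000

# Monte Carlo version of f(y_new | y) = ∫ f(y_new | theta) f(theta | y) dtheta:
# for each draw, sample theta from the posterior, then y_new given theta.
successes = 0
for _ in range(n_draws):
    theta = random.betavariate(a_post, b_post)   # theta ~ f(theta | y)
    y_new = 1 if random.random() < theta else 0  # y_new ~ f(y_new | theta)
    successes += y_new

mc_estimate = successes / n_draws

# Closed form for this model: P(y_new = 1 | y) = E[theta | y]
exact = a_post / (a_post + b_post)

print(f"Monte Carlo: {mc_estimate:.3f}, exact: {exact:.3f}")
```

The two numbers agree up to Monte Carlo error, which is exactly the integral above being computed by simulation: each posterior draw of $\theta$ contributes its own conditional prediction, and the mixture of those predictions is $f(y_{new} \mid y)$.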