In Murphy's book *Machine Learning: A Probabilistic Perspective*, on page 77 he writes the following:
$p(\theta|D', D'') \propto p(D''|\theta)p(\theta|D')$
where $D'$ and $D''$ are data sets and $\theta$ is the parameter. I have an understanding of the chain rule of probability, but I am stumped as to how this expression is derived.

How was this result obtained? If someone could explain, I would be very grateful.
Book link: https://www.cs.ubc.ca/~murphyk/MLbook/
The chain rule for probabilities states that $$p(\theta , D' , D'') = p(D''|\theta, D')\,p(\theta \: |D')\,p(D').$$ Now, in a Bayesian setup we usually assume that the data are conditionally independent given the parameter $\theta$, which means that $p(D',D''|\theta) = p(D'| \theta)\,p(D'' | \theta)$, and this in turn implies that $p(D'' | \theta , D') = p(D'' | \theta)$.
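To spell out why conditional independence gives that last implication, apply the definition of conditional probability and then substitute the factorization:

$$p(D'' \mid \theta, D') = \frac{p(D', D'' \mid \theta)}{p(D' \mid \theta)} = \frac{p(D' \mid \theta)\,p(D'' \mid \theta)}{p(D' \mid \theta)} = p(D'' \mid \theta).$$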
Now we have \begin{align*} p(\theta | D',D'') &\propto p(\theta \: , D', D'') \\ &=p(D'' | \theta)p(\theta | D')p(D') \\ &\propto p(D''|\theta)p(\theta|D') \end{align*} where both proportionalities hold because we dropped factors, $p(D',D'')$ and $p(D')$ respectively, that do not depend on $\theta$.
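To make the result concrete, here is a minimal numerical sketch using a Beta-Bernoulli model (my illustrative choice, not from the book passage). With a conjugate Beta prior, the posterior after $D'$ is again a Beta distribution, so treating it as the prior when updating on $D''$ gives exactly the same answer as updating once on all the data, which is what $p(\theta|D', D'') \propto p(D''|\theta)\,p(\theta|D')$ says:

```python
# Sequential Bayesian updating with a Beta-Bernoulli model.
# Beta(a, b) prior on theta; each observation is 0 or 1.
# Posterior after observing h heads and t tails is Beta(a + h, b + t).

def beta_update(a, b, data):
    """Update Beta(a, b) prior with a list of 0/1 Bernoulli observations."""
    heads = sum(data)
    tails = len(data) - heads
    return a + heads, b + tails

D1 = [1, 0, 1, 1]   # D'  (hypothetical data)
D2 = [0, 0, 1]      # D'' (hypothetical data)

prior = (1, 1)  # uniform Beta(1, 1) prior

# Sequential: posterior after D' becomes the prior for D'',
# i.e. p(theta | D', D'') ∝ p(D'' | theta) p(theta | D')
post_after_D1 = beta_update(*prior, D1)
post_sequential = beta_update(*post_after_D1, D2)

# Batch: update once on the pooled data D' ∪ D''
post_batch = beta_update(*prior, D1 + D2)

print(post_sequential)  # (5, 4)
print(post_batch)       # (5, 4) -- identical to the sequential result
```

The two posteriors coincide precisely because the observations are assumed conditionally independent given $\theta$; without that assumption the sequential update would need $p(D''|\theta, D')$ rather than $p(D''|\theta)$.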