I am reading Chapter 2 of the ISLP book, specifically the part on the Bayes classifier. The Bayes classifier is defined as the classifier that assigns each observation to its most likely class, given its predictor values.
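In symbols, as I understand the definition, a test observation with predictor value $x_0$ is assigned to the class $j$ whose conditional probability is largest. Here $\hat{y}_0$ is just my notation for the resulting prediction:

$$ \hat{y}_0 = \arg\max_{j} \Pr(Y = j \mid X = x_0) $$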
As I understand it, to calculate the error rate we average the following indicator over all choices of data set used to train the Bayes classifier: $$ I(y_0 \neq \hat{y}_0) $$ So we go through all possible training data sets and, for each one, calculate $\Pr(Y = j \mid X = x_0)$ to decide how to classify $x_0$.
Now, my question is: how do we calculate $\Pr(Y = j \mid X = x_0)$ for a given data set? To make this more concrete, let's say $D_1 = [(x_0, y_0)]$ and $D_2 = [(x_0, y_1)]$, where $y_1 \neq y_0$, and $D_3 = [(x_1, y)]$, where $x_1 \neq x_0$.
What would the conditional probability $\Pr(Y = j \mid X = x_0)$ be if the data set is $D_1$, $D_2$, or $D_3$, respectively?