Likelihood calculation for Naive Bayes classifier


I am reading the "Generative models for discrete data" chapter in Kevin P. Murphy's book (Machine Learning: A Probabilistic Perspective). For calculating the MLE of naive Bayes (p. 83), the equation used is

$p(X_i, y_i \mid \theta) = p(y_i \mid \pi) \prod_j p(x_{ij} \mid \theta_j)$, where $(X_i, y_i)$ is one data sample and $\theta_j$ is the unknown parameter associated with the $j$-th feature.
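To make the book's factorization concrete, here is a minimal numeric sketch (assuming Bernoulli features and made-up parameter values, which are not from the book): the joint likelihood of one sample is the class prior $p(y_i \mid \pi)$ times a product of per-feature terms $p(x_{ij} \mid \theta_j)$.

```python
import numpy as np

# Assumed illustrative parameters (not from the book):
pi = np.array([0.6, 0.4])             # class prior p(y | pi), 2 classes
theta = np.array([[0.1, 0.8, 0.5],    # theta[c, j] = p(x_j = 1 | y = c)
                  [0.7, 0.2, 0.9]])

def joint_likelihood(x, y):
    """p(x, y | theta) = p(y | pi) * prod_j p(x_j | theta_{yj})
    for a binary feature vector x and class label y."""
    # Bernoulli likelihood of each feature given the class
    feature_terms = theta[y] ** x * (1 - theta[y]) ** (1 - x)
    return pi[y] * np.prod(feature_terms)

x_i = np.array([1, 0, 1])
print(joint_likelihood(x_i, 0))  # 0.6 * 0.1 * 0.2 * 0.5
```

The point of the sketch is only that the first factor depends on $\pi$ alone, while each remaining factor depends on one feature's parameter $\theta_j$.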

From the usual derivation, I instead got $p(X_i, y_i \mid \theta) = p(y_i \mid X_i, \theta) \prod_j p(x_{ij} \mid \theta_j)$.

How can $p(y_i \mid X_i, \theta)$ turn into $p(y_i \mid \pi)$? What kind of plug-in approximation would lead to this term?

I cannot see where the $p(y_i \mid \pi)$ term in the book's equation comes from; can someone explain how it got there?