Bayes' Theorem states that:
$ P\left(y \mid x_1, \cdots, x_n\right)=\frac{P\left(x_1, \cdots, x_n \mid y\right) \cdot P(y)}{P\left(x_1, \cdots, x_n\right)} $
In the Naive Bayes classifier we assume the following: $ P\left(x_i \mid y, x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n\right)=P\left(x_i \mid y\right) $
The previous formula allows us to write the following: $ P\left(y \mid x_1, \cdots, x_n\right)=\frac{\prod_{i=1}^n P\left(x_i \mid y\right) \cdot P(y)}{P\left(x_1, \cdots, x_n\right)} $
However, I do not understand the last step. How do we get from the second formula to the third? Could someone help me understand this?
Note that, by the chain rule of probability (applied with everything conditioned on $y$), $P(x_1, x_2 \mid y) = P(x_1 \mid y, x_2) P(x_2 \mid y)$. Generalizing to $n$ terms, we have \begin{align} &P(x_1, \ldots, x_n \mid y) \\ &= P(x_1 \mid y, x_2, \ldots, x_n) P(x_2 \mid y, x_3, \ldots, x_n) \cdots P(x_{n-1} \mid y, x_n) P(x_n \mid y) \\ &= P(x_1 \mid y) P(x_2 \mid y) \cdots P(x_{n-1} \mid y) P(x_n \mid y). \end{align}
The second equality uses the Naive Bayes assumption: $x_1, \ldots, x_n$ are conditionally independent given $y$. Substituting this product into the numerator of Bayes' theorem gives the final formula.
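To make the factorization concrete, here is a small numerical sketch for $n = 2$ binary features. The probability tables are made-up illustration values, not from the question; the code computes the posterior $P(y \mid x_1, x_2)$ by multiplying the per-feature likelihoods $P(x_i \mid y)$ with the prior and normalizing by the evidence.

```python
# Toy Naive Bayes with two binary features x1, x2 assumed
# conditionally independent given a binary class y.
# All probability values below are invented for illustration.

p_y = {0: 0.6, 1: 0.4}           # prior P(y)
p_x1_given_y = {0: 0.7, 1: 0.2}  # P(x1 = 1 | y)
p_x2_given_y = {0: 0.1, 1: 0.9}  # P(x2 = 1 | y)

def likelihood(x1, x2, y):
    """P(x1, x2 | y) as a product, using the Naive Bayes assumption."""
    p1 = p_x1_given_y[y] if x1 else 1 - p_x1_given_y[y]
    p2 = p_x2_given_y[y] if x2 else 1 - p_x2_given_y[y]
    return p1 * p2

def posterior(x1, x2):
    """P(y | x1, x2) via Bayes' theorem; the evidence P(x1, x2)
    is obtained by summing the joint over all values of y."""
    joint = {y: likelihood(x1, x2, y) * p_y[y] for y in p_y}
    evidence = sum(joint.values())
    return {y: joint[y] / evidence for y in joint}

print(posterior(1, 1))  # posteriors over y, summing to 1
```

Note that the evidence $P(x_1, \ldots, x_n)$ in the denominator is the same for every $y$, which is why classifiers usually just pick the $y$ maximizing the numerator.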