I'm reading about the Naive Bayes classifier and note that we make the conditional independence assumption. But isn't independence the general assumption that is always made when dealing with machine learning algorithms?
Suppose we have a supervised binary classification setup, with a dataset $\mathcal{D} = \{(x_1,t_1), \dots, (x_n,t_n)\}$ where $x_i \in \mathbb{R}^D$ and $t_i \in \{0,1\} \ \ \forall i = 1, \dots, n$.
I've read everywhere that we always assume the data are iid (independent and identically distributed, which would mean $p((x_i, t_i),(x_j,t_j)) = p((x_i,t_i))\,p((x_j,t_j))$, right?). At this point it seems reasonable to model the labels with a Bernoulli distribution. Let $p(\mathcal{D}|\theta)$ be the likelihood function: then we want to find
$$\hat{\theta} = \arg\max_{\theta} p(\mathcal{D}\mid\theta)$$
where $p(\mathcal{D}|\theta) = p((x_1,t_1), \dots ,(x_n,t_n)|\theta) = \prod_{i=1}^{n} p((x_i,t_i)|\theta)$. Here we seem to need an independence hypothesis in order to factorize like this. So do we use the naive Bayes hypothesis in every situation? I'm having trouble distinguishing the two.
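To make the iid factorization concrete: under iid data the log-likelihood becomes a sum of per-example terms, which is what makes maximization tractable. A minimal sketch for the Bernoulli case (the sample below is hypothetical, just to illustrate):

```python
import math

# Hypothetical iid sample of binary labels t_i ~ Bernoulli(theta)
t = [1, 0, 1, 1, 0, 1, 1, 0]

def log_likelihood(theta, data):
    # Because the joint p(t_1, ..., t_n | theta) factorizes under iid,
    # the log-likelihood is just a sum of per-example log-probabilities.
    return sum(math.log(theta if ti == 1 else 1 - theta) for ti in data)

# For a Bernoulli, the MLE has a closed form: the sample mean.
theta_hat = sum(t) / len(t)

# A coarse grid search over theta agrees with the closed form,
# confirming the factorized likelihood peaks at the sample mean.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda th: log_likelihood(th, t))
```

Note that only the iid assumption across *examples* is used here; no assumption about the features inside each $x_i$ is needed.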
In Naive Bayes, we assume that all features in $x$ are mutually independent, conditional on the category $C_k$.
The features could be height and weight, and the category $C$ could be whether your BMI exceeds a certain threshold.
Here, given $C$, we assume that height and weight are independent. The conditional independence assumption concerns the features within each example; it is separate from the iid assumption across examples that you used to factorize the likelihood.
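A small sketch of this feature-wise factorization, using a Gaussian Naive Bayes on the height/weight example (all numbers below are made up for illustration):

```python
import math

# Hypothetical toy dataset: (height_cm, weight_kg, class)
# class 1 = "BMI above threshold", class 0 = "BMI below threshold"
data = [
    (160, 80, 1), (165, 90, 1), (158, 85, 1),
    (180, 70, 0), (175, 65, 0), (185, 75, 0),
]

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(rows):
    # Per class, estimate a 1-D Gaussian for EACH feature separately.
    # This is the "naive" step: we never model the joint
    # p(height, weight | C), only p(height | C) and p(weight | C).
    params = {}
    for c in (0, 1):
        cls = [(h, w) for h, w, t in rows if t == c]
        prior = len(cls) / len(rows)
        stats = []
        for j in range(2):
            vals = [r[j] for r in cls]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals)
            stats.append((mu, var))
        params[c] = (prior, stats)
    return params

def predict(params, x):
    # p(C | x) ∝ p(C) * prod_j p(x_j | C)  -- the conditional
    # independence assumption is exactly this product over features.
    def score(c):
        prior, stats = params[c]
        s = math.log(prior)
        for xj, (mu, var) in zip(x, stats):
            s += math.log(gaussian_pdf(xj, mu, var))
        return s
    return max((0, 1), key=score)

params = fit(data)
```

Note the two independence assumptions appear in different places: the per-example sum inside `fit` relies on iid examples, while the per-feature product inside `predict` is the Naive Bayes assumption.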