Naive Bayes classifier big O complexity

2.1k Views Asked by At

I am trying to learn about The Naive Bayes Classifier as defined by the following:

naive_bayes_definition

where $\textbf{x} \in \{1, ... , K\}^{D}$

$K$ is the number of different values a feature can have

$D$ is the number of features in an input vector $\textbf{x}$

I actually have two questions:

First, could someone please explain this formula? It is still very fuzzy. For instance, the book says: "The model is called “naive” since we do not expect the features to be independent, or even conditional on the class label". Why is that the case? I understand what the class label is in this case (y=c) but I don't see why the book would make such an assertion.

And second, the book also says that the model has $O(C \times D)$ parameters. I know what Big O notation is, but what does it mean in this case?

Thanks in advance