Is my derivation of the maximum likelihood estimate for Naive Bayes correct?


I think I have gotten the wrong formula in the following derivation, but I don't know where. Here is my explanation:

For a sentiment analysis task, suppose we have classes denoted by $c$ and features indexed by $i$.

We can represent the conditional probability of each class as: $$P(c | w_i) = \frac{P(w_i|c) \cdot P(c)}{P(w_i)}$$ where $w_i$ represents feature $i$ and $c$ is the class. Then, empirically, we can estimate $$P(w_i|c) = \frac{n_{ci}}{n_c}$$ $$P(w_i) = \frac{n_{i}}{n}$$ The prior for each class is then given by: $$P(c) = \frac{n_c}{n}$$ where:

$n$ is the total number of features in all classes.

$n_{ci}$ is the count of feature $i$ in class $c$.

$n_c$ is the total number of features for the class, and

$n_i$ is the total count of feature $i$ across all classes.

To actually compute the conditional probability numerically:

$$P(c | w_i) = \frac{P(w_i|c) \cdot P(c)}{P(w_i)} = \frac{n_{ci}}{n_c} \cdot \frac{n_c}{n}\cdot \frac{n}{n_i} = \frac{n_{ci}}{n_i}$$ But when I checked, I found that the formula should be $$\frac{n_{ci}}{n_c}$$ instead. What am I missing? Also, can I really estimate $P(w_i|c)$ so cursorily as $\frac{n_{ci}}{n_c}$?

Furthermore, if the derivation really is this direct, why don't programs compute this formula directly? Because of this discrepancy, I don't think my derivation is correct.
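
To make the counts concrete, here is a minimal Python sketch that checks the algebra in the derivation numerically. The classes, words, and counts are all made up for illustration, and the helper names (`likelihood`, `prior`, `evidence`, `posterior`) are my own, not from any library. It uses exact fractions so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Hypothetical counts n_ci: counts[c][w] = occurrences of word w in class c.
counts = {
    "pos": {"good": 3, "bad": 1, "film": 2},
    "neg": {"good": 1, "bad": 4, "film": 1, "boring": 2},
}

# n_c: total count of all features in class c.
n_c = {c: sum(words.values()) for c, words in counts.items()}
# n: total count of all features over all classes.
n = sum(n_c.values())
# n_i: total count of feature i over all classes.
vocab = sorted({w for words in counts.values() for w in words})
n_i = {w: sum(counts[c].get(w, 0) for c in counts) for w in vocab}


def likelihood(w, c):
    """Empirical P(w_i | c) = n_ci / n_c."""
    return Fraction(counts[c].get(w, 0), n_c[c])


def prior(c):
    """Empirical P(c) = n_c / n."""
    return Fraction(n_c[c], n)


def evidence(w):
    """Empirical P(w_i) = n_i / n."""
    return Fraction(n_i[w], n)


def posterior(c, w):
    """P(c | w_i) via Bayes' rule, using the three estimates above."""
    return likelihood(w, c) * prior(c) / evidence(w)


# Bayes' rule with these estimates collapses to n_ci / n_i for every pair.
for c in counts:
    for w in vocab:
        assert posterior(c, w) == Fraction(counts[c].get(w, 0), n_i[w])

print("P(pos | good) =", posterior("pos", "good"))   # n_ci / n_i
print("P(good | pos) =", likelihood("good", "pos"))  # n_ci / n_c
```

With these made-up counts, $\frac{n_{ci}}{n_i}$ reproduces the single-word posterior $P(c | w_i)$, while $\frac{n_{ci}}{n_c}$ is the likelihood $P(w_i|c)$, and the two values differ. The sketch only checks the algebra for one feature at a time; it does not combine several features the way a full Naive Bayes classifier would.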