Proof of Naive Bayes Derivation


I was working through a derivation of the Naive Bayes maximum-likelihood estimates and got stuck at the last step.

The setup is as follows: given a dataset $\left\{ (x^{(i)},y^{(i)}) \right\}$ for $i=1,\cdots,m$, the label $y$ takes values in the classes $C_1, \cdots, C_k$ with probabilities $\phi_1, \cdots, \phi_k$, and $x \in \mathbb{R}^n$.

The Naive Bayes assumption applies, and hence the log likelihood is: $$ \ell(\Theta) = \sum_{i=1}^{m} \sum_{j=1}^{n} \log P(x_j^{(i)}\mid y^{(i)}) + \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)}=C_j\right\} \log \phi_j $$
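As a sanity check, here is a small Python sketch of this log-likelihood for the Bernoulli-feature case. The data, `phi`, and `theta` values are made up for illustration; the parameterization $P(x_j = 1 \mid y = C_c) = \theta_{c,j}$ is one common instantiation, not the only one:

```python
import numpy as np

# Hypothetical toy data: m = 4 examples, n = 2 binary features, k = 2 classes.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])   # shape (m, n)
y = np.array([1, 1, 2, 2])                       # labels in {1, ..., k}
phi = np.array([0.5, 0.5])                       # class priors phi_1, ..., phi_k
theta = np.array([[0.8, 0.3],                    # theta[c-1, j] = P(x_j = 1 | y = c)
                  [0.2, 0.7]])

def log_likelihood(X, y, phi, theta):
    ll = 0.0
    for x_i, y_i in zip(X, y):
        p = theta[y_i - 1]                       # per-feature P(x_j = 1 | y_i)
        # First term: sum_j log P(x_j | y), factorized by the NB assumption.
        ll += np.sum(x_i * np.log(p) + (1 - x_i) * np.log(1 - p))
        # Second term: the indicator picks out log phi_{y_i}.
        ll += np.log(phi[y_i - 1])
    return ll

print(log_likelihood(X, y, phi, theta))
```

The double sum over the indicator in the second term collapses to a single `log phi[y_i - 1]` per example, since each example belongs to exactly one class.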

To find the MLE estimate of $\phi_j$, we keep only the second term and form the Lagrangian: $$ \mathcal{L} = \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)}=C_j\right\} \log \phi_j -\lambda\left(\sum_{j=1}^{k} \phi_j -1\right) $$

and the first-order condition is $$ \frac{\partial \mathcal{L}}{\partial \phi_j} = \sum_{i=1}^{m}\frac{1 \left\{y^{(i)}=C_j\right\}}{\phi_j} - \lambda = 0 $$ and of course, $$ \sum_{j=1}^{k} \phi_j =1 $$

My question is, how do we progress from here? How do we solve for the variables?

In particular, how do I prove that $$ \lambda = \sum_{i=1}^m \sum_{j=1}^k 1 \left\{y^{(i)}=C_j\right\} $$

I have stared at this for a while and hope that somebody can shed some light on this.

There is 1 answer below.


Well, you have already done 90% of the work.

Once you rearrange the first-order condition, you have

$$ \phi_j = \frac{1}{\lambda}\sum_{i=1}^m 1 \left\{y^{(i)}=C_j\right\} $$

Substitute that into the constraint $\sum_{j=1}^{k} \phi_j = 1$. Since each example belongs to exactly one class, the double sum $\sum_{j=1}^{k} \sum_{i=1}^{m} 1\left\{y^{(i)}=C_j\right\}$ collapses to $m$, so $\lambda = m$. Plug that back into the equation above to get your result: $\phi_j$ is just the fraction of examples in class $C_j$.
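A quick numeric sketch of this closed form, on made-up labels: the MLE $\phi_j$ is the class frequency, it satisfies the constraint, and it scores at least as well as any other distribution on the term being maximized.

```python
import numpy as np

# Hypothetical toy labels; classes are 1..k.
y = np.array([1, 1, 2, 3, 3, 3])
m, k = len(y), 3

# Closed form: phi_j = (1/m) * sum_i 1{y_i = C_j}, i.e. class frequencies.
counts = np.array([(y == c).sum() for c in range(1, k + 1)])
phi_mle = counts / m
assert np.isclose(phi_mle.sum(), 1.0)        # the constraint holds

def objective(phi):
    # The second term of the log-likelihood: sum_i log phi_{y_i}.
    return sum(np.log(phi[c - 1]) for c in y)

# The MLE should beat random valid distributions on this objective.
rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(k))            # random point on the simplex
    assert objective(phi_mle) >= objective(p)

print(phi_mle)
```

The Dirichlet draws are just a spot check, not a proof; the Lagrangian argument above is what establishes the maximum.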