I was working through a proof of the Naive Bayes parameter estimates and got stuck at the last step.
The setup is as follows: Given a dataset $\left\{ (x^{(i)},y^{(i)}) \right\}_{i=1}^{m}$, each $y^{(i)}$ takes one of the values $1,\cdots,k$ (class $C_j$ with probability $\phi_j$), and $x \in \mathbb{R}^n$.
The Naive Bayes assumptions apply, and hence the log likelihood is: $$ \ell(\Theta) = \sum_{i=1}^{m} \sum_{j=1}^{n} \log P(x_j^{(i)} \mid y^{(i)}) + \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)}=C_j\right\} \log \phi_j $$ (note the first double sum runs over the $n$ features, the second over the $k$ classes).
To find the MLE estimate of $\phi_j$, we only take the second term and form the Lagrangian with the simplex constraint: $$ \mathbb{L} = \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)}=C_j\right\} \log \phi_j -\lambda\Big(\sum_{j=1}^{k} \phi_j -1\Big) $$
and the first-order condition is $$ \frac{\partial \mathbb{L}}{\partial \phi_j} = \frac{\sum_{i=1}^{m} 1\left\{y^{(i)}=C_j\right\}}{\phi_j} - \lambda = 0 $$ and of course, $$ \sum_{j=1}^{k} \phi_j =1 $$
My question is, how do we progress from here? How do we solve for the variables?
In particular, how do I prove that $$ \lambda = \sum_{i=1}^m \sum_{j=1}^k 1 \left\{y^{(i)}=C_j\right\} \;=\; m $$
I have stared at this for a while and hope that somebody can shed some light on this.
Well, you have already done 90% of the work.
Once you have the equation
$$ \phi_j = \frac{\sum_{i=1}^m 1 \left\{y^{(i)}=C_j\right\}}{\lambda} $$
substitute it into the constraint $\sum_{j=1}^{k} \phi_j = 1$:
$$ \sum_{j=1}^{k} \phi_j = \frac{1}{\lambda}\sum_{j=1}^{k}\sum_{i=1}^{m} 1 \left\{y^{(i)}=C_j\right\} = \frac{m}{\lambda} = 1, $$
since every example belongs to exactly one class. Hence $\lambda = m$. Plug that back into the equation above to get $\phi_j = \frac{1}{m}\sum_{i=1}^m 1 \left\{y^{(i)}=C_j\right\}$, i.e. the fraction of training examples in class $C_j$.
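If it helps to see it numerically, here is a minimal sketch (the toy labels are made up) that computes the closed-form estimate $\phi_j = \frac{1}{m}\sum_i 1\{y^{(i)}=C_j\}$ and checks both the simplex constraint and the first-order condition, which should give the same $\lambda = m$ for every class:

```python
# Numeric sanity check of the closed-form MLE for the class priors.
from collections import Counter

# Hypothetical toy labels: m = 10 examples over k = 3 classes.
labels = [1, 2, 2, 3, 1, 1, 2, 3, 3, 3]
m = len(labels)

counts = Counter(labels)                  # sum_i 1{y^(i) = C_j} for each class j
phi = {j: counts[j] / m for j in counts}  # phi_j = count_j / m

# The estimates form a valid probability distribution over classes ...
assert abs(sum(phi.values()) - 1.0) < 1e-12

# ... and the first-order condition count_j / phi_j = lambda holds with lambda = m.
for j in counts:
    assert abs(counts[j] / phi[j] - m) < 1e-9

print(phi)
```

Any perturbation of these $\phi_j$ that stays on the simplex can only decrease $\sum_j \text{count}_j \log \phi_j$, which is what the Lagrangian argument establishes.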