Naive Bayes Classifier - Derivation with Lagrange Multiplier


I am working through this link to better understand the MLE derivation for Naive Bayes: https://mattshomepage.com/articles/2016/Jun/26/multinomial_nb/

In particular, I am confused by this part:

$L=\sum_{i=1}^{N}\sum_{j=1}^P f_{ij}\log(\theta_j)+\lambda\left(1-\sum_{j=1}^P\theta_j\right)$

When taking the derivative:

$\frac{\partial L}{\partial {\theta_k}} = \sum_{i=1}^N \frac{f_{ik}}{\theta_k} - \lambda =0$

Why does $f_{ij}$ become $f_{ik}$ after taking the derivative? (Something similar happens with $\log(\theta_j)$: the index $j$ becomes $k$, giving $\theta_k$ in the denominator.)

Can someone help explain why this is the case?

Thanks

Accepted answer:

$$\frac{\partial L}{\partial \theta_k} = \frac{\partial}{\partial \theta_k}\left(\sum_{i=1}^{N}\sum_{j=1}^P f_{ij}\log(\theta_j)+\lambda\left(1-\sum_{j=1}^P\theta_j\right)\right) = \sum_{i=1}^{N}\sum_{j=1}^P\frac{\partial\, f_{ij}\log(\theta_j)}{\partial \theta_k}-\lambda\sum_{j=1}^P\frac{\partial \theta_j}{\partial \theta_k}$$

As $f_{ij}$ does not depend on $\theta_k$ (the counts are constants), and neither does $\theta_j$ for $j \neq k$, we have

$$\frac{\partial\, f_{ij}\log(\theta_j)}{\partial \theta_k} = \begin{cases} 0 & \quad j \neq k \\ \frac{f_{ik}}{\theta_k} & \quad j = k \end{cases}$$

$$\frac{\partial \theta_j}{\partial {\theta_k}} = \begin{cases} 0 & \quad j \neq k \\ 1 & \quad j = k \end{cases}$$
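This case analysis can be checked numerically. The sketch below (my own illustration, using made-up counts $f_{ij}$ for one fixed $i$ and an arbitrary $\theta$) uses a central finite difference to confirm that $\partial\,(f_{ij}\log\theta_j)/\partial\theta_k$ equals $f_{ij}/\theta_k$ when $j = k$ and $0$ otherwise:

```python
import math

# Illustrative values (not from the post): counts f_ij for one fixed
# document i, and a candidate theta on the probability simplex.
f_i = [2.0, 5.0, 3.0]
theta = [0.2, 0.5, 0.3]
eps = 1e-6
k = 1  # differentiate with respect to theta_2 (0-based index 1)

for j in range(3):
    # The j-th term of the log-likelihood, viewed as a function of theta_k only.
    def term(t_k):
        th = theta.copy()
        th[k] = t_k
        return f_i[j] * math.log(th[j])

    # Central finite-difference approximation of d(term)/d(theta_k).
    grad = (term(theta[k] + eps) - term(theta[k] - eps)) / (2 * eps)
    expected = f_i[j] / theta[k] if j == k else 0.0
    assert abs(grad - expected) < 1e-4
```

Only the $j = k$ term contributes a nonzero derivative, which is exactly why the double sum collapses to a single sum over $i$.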

If you insert these into the formula above, you indeed get

$$\frac{\partial L}{\partial {\theta_k}} = \sum_{i=1}^N \frac{f_{ik}}{\theta_k} - \lambda$$
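For completeness (this step is not spelled out in the quoted answer): setting the derivative to zero and using the constraint $\sum_{j=1}^P \theta_j = 1$ recovers the familiar closed form. From $\sum_{i=1}^N f_{ik}/\theta_k = \lambda$ we get $\theta_k = \frac{1}{\lambda}\sum_{i=1}^N f_{ik}$; summing over $k$,

$$1 = \sum_{k=1}^P \theta_k = \frac{1}{\lambda}\sum_{k=1}^P\sum_{i=1}^N f_{ik} \quad\Longrightarrow\quad \lambda = \sum_{i=1}^N\sum_{j=1}^P f_{ij},$$

so the maximum-likelihood estimate is

$$\theta_k = \frac{\sum_{i=1}^N f_{ik}}{\sum_{i=1}^N\sum_{j=1}^P f_{ij}},$$

i.e. the relative frequency of feature $k$ among all counts.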