I am working through this link to better understand the derivation of the MLE for Naive Bayes: https://mattshomepage.com/articles/2016/Jun/26/multinomial_nb/
In particular, I am confused by this part:
$L=\sum_{i=1}^{N}\sum_{j=1}^Pf_{ij}\log(\theta_j)+\lambda(1-\sum_{j=1}^P\theta_j)$
When taking the derivative:
$\frac{\partial L}{\partial {\theta_k}} = \sum_{i=1}^N \frac{f_{ik}}{\theta_k} - \lambda =0$
Why does the index on $f_{ij}$ change to $f_{ik}$ after taking the derivative? (Something similar happens with $\log(\theta_j)$, where the argument becomes $\theta_k$.)
Can someone help explain why this is the case?
Thanks
$$\frac{\partial L}{\partial {\theta_k}} = \frac{\partial}{\partial {\theta_k}} \left(\sum_{i=1}^{N}\sum_{j=1}^Pf_{ij}\log(\theta_j)+\lambda(1-\sum_{j=1}^P\theta_j)\right) = \sum_{i=1}^{N}\sum_{j=1}^P\frac{\partial f_{ij}\log(\theta_j)}{\partial {\theta_k}}-\lambda\sum_{j=1}^P\frac{\partial \theta_j}{\partial {\theta_k}}$$
Since $f_{ij}$ does not depend on $\theta_k$, and neither does $\theta_j$ for $j \neq k$, we have
$$\frac{\partial f_{ij}\log(\theta_j)}{\partial {\theta_k}} = \begin{cases} 0 & \quad j \neq k \\ \frac{f_{ik}}{\theta_k} & \quad j = k \end{cases}$$
$$\frac{\partial \theta_j}{\partial {\theta_k}} = \begin{cases} 0 & \quad j \neq k \\ 1 & \quad j = k \end{cases}$$
Substituting these into the formula above, only the $j = k$ terms survive, and you indeed get
$$\frac{\partial L}{\partial {\theta_k}} = \sum_{i=1}^N \frac{f_{ik}}{\theta_k} - \lambda$$
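If it helps to see this numerically: setting the gradient to zero and using the constraint $\sum_j \theta_j = 1$ gives $\lambda = \sum_i \sum_j f_{ij}$ and hence $\theta_k = \sum_i f_{ik} / \sum_i \sum_j f_{ij}$. Below is a small sketch (with a made-up count matrix `f`) that checks the analytic gradient $\sum_i f_{ik}/\theta_k - \lambda$ against a finite-difference estimate, and confirms it vanishes at that closed-form solution:

```python
import numpy as np

# Hypothetical count matrix f (N=3 documents, P=4 features), just for illustration.
f = np.array([[2.0, 0.0, 1.0, 3.0],
              [1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 4.0, 2.0]])
N, P = f.shape

# Closed-form MLE from setting the gradient to zero and enforcing sum(theta) = 1:
# theta_k = sum_i f_ik / sum_ij f_ij, with lambda = sum_ij f_ij.
theta = f.sum(axis=0) / f.sum()
lam = f.sum()

def L(theta, lam):
    # Lagrangian: L = sum_ij f_ij log(theta_j) + lambda * (1 - sum_j theta_j)
    return (f * np.log(theta)).sum() + lam * (1.0 - theta.sum())

# Check dL/dtheta_k = sum_i f_ik / theta_k - lambda via central differences.
eps = 1e-6
for k in range(P):
    e = np.zeros(P)
    e[k] = eps
    numeric = (L(theta + e, lam) - L(theta - e, lam)) / (2 * eps)
    analytic = f[:, k].sum() / theta[k] - lam
    assert abs(numeric - analytic) < 1e-4   # formulas agree
    assert abs(analytic) < 1e-8             # gradient is zero at the MLE
```

Note that only the $k$-th column of `f` enters each partial derivative, which is exactly the collapse of the double sum discussed above.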