I would like to differentiate the following expression with respect to $H_{kj}$, where $W_{ik}$ and $H_{kj}$ are the entries of matrices $W$ and $H$:
$0 = \sum_{ij}\log\left(\frac{\sum_{k}W_{ik}H_{kj}}{\text{const.}}\right)$
After applying the chain rule to the log, I get:
$0 = \sum_{ij}(\frac{const.}{\sum_{k}W_{ik}H_{kj}})(\frac{\sum_{k}W_{ik}}{const.})$
For the term $\frac{\sum_{k}W_{ik}}{\text{const.}}$, should the $\sum_{k}$ remain?
What confuses me is this: since $H_{kj}$ has already been differentiated, $W_{ik}$ no longer takes part in a matrix multiplication, so the sum may not be needed anymore. On the other hand, the final result is set equal to zero, and if $W_{ik}$ is not summed, the result is a vector rather than a scalar. I am not sure which is right. Could someone explain the reasoning?
Omit the summation symbols and use straight matrix notation.
Let $\,\beta = \tfrac{1}{{\rm const.}},\,$ then the differential and gradient of the function can be calculated as $$\eqalign{ \phi &= 1:\log(\beta WH) \cr d\phi &= 1:d\log(\beta WH) = 1:\frac{\beta W\,dH}{\beta WH} = W^T\Big(\frac{1}{WH}\Big):dH \cr \frac{\partial\phi}{\partial H} &= W^T\Big(\frac{1}{WH}\Big) \cr }$$ where $W\in{\mathbb R}^{m\times n},\,$ $H\in{\mathbb R}^{n\times p},\,$ $\phi\in{\mathbb R},\,$ and $1\in{\mathbb R}^{m\times p}$ is the matrix of all ones.
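Further, $\log(X)$ and $\frac{1}{X}$ are taken to be element-wise operations (as is the fraction in $d\phi$), and a colon has been used to denote the trace/Frobenius product, i.e. $$\eqalign{A:B = {\rm tr}(A^TB)}$$ Moving factors across the colon in $d\phi$ uses $A:\frac{B}{C} = \frac{A}{C}:B$ (element-wise division) and $A:BC = B^TA:C$, both of which follow from this definition; note that $\beta$ cancels.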
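Written entry by entry, the same gradient reads $$\frac{\partial\phi}{\partial H_{kj}} = \sum_i \frac{W_{ik}}{\sum_l W_{il}H_{lj}},$$ so the index $k$ on $W_{ik}$ is fixed (it is the one being differentiated), the sum over $i$ survives in the numerator, and the sum over $k$ (renamed $l$) survives only in the denominator. As a sanity check, here is a minimal NumPy sketch comparing the matrix form, this index form, and central finite differences. The sizes `m, n, p`, the random positive entries, and the value of `beta` are assumptions chosen for illustration, not taken from the question.

```python
import numpy as np

# Hypothetical sizes and random positive data (illustration only);
# positivity keeps every entry of beta * W @ H > 0, so the log is defined.
rng = np.random.default_rng(0)
m, n, p = 4, 3, 5
W = rng.uniform(0.5, 2.0, size=(m, n))
H = rng.uniform(0.5, 2.0, size=(n, p))
beta = 1.0 / 7.0  # stands in for 1/const.; any positive value works


def phi(H):
    """phi = 1 : log(beta W H), i.e. the sum over i, j of log(beta * (W H)_ij)."""
    return np.sum(np.log(beta * (W @ H)))


def grad_matrix(H):
    """Matrix form from the answer: W^T (1 / (W H)), with 1/(WH) element-wise."""
    return W.T @ (1.0 / (W @ H))


def grad_index(H):
    """Index form: dphi/dH_kj = sum_i W_ik / sum_l W_il H_lj."""
    WH = W @ H
    G = np.zeros_like(H)
    for k in range(H.shape[0]):
        for j in range(H.shape[1]):
            G[k, j] = sum(W[i, k] / WH[i, j] for i in range(W.shape[0]))
    return G


def grad_fd(H, eps=1e-6):
    """Central finite differences, perturbing one entry of H at a time."""
    G = np.zeros_like(H)
    for k in range(H.shape[0]):
        for j in range(H.shape[1]):
            E = np.zeros_like(H)
            E[k, j] = eps
            G[k, j] = (phi(H + E) - phi(H - E)) / (2 * eps)
    return G


G_mat = grad_matrix(H)
print(np.allclose(G_mat, grad_index(H)))
print(np.allclose(G_mat, grad_fd(H), atol=1e-6))
```

Both checks print `True`; changing `beta` does not change the gradient, consistent with the constant cancelling in $d\phi$.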