I am trying to understand one step in the Newton-Raphson method, used in the paper on LDA by Blei, Ng and Jordan. Namely, how does taking the derivative of $L$, w.r.t. $\alpha_j$ result in this equation. Also, what exactly is $\delta(i,j)$?
The terms which contain $\alpha$ are:
$$ L_{[\alpha]} = \sum_{d=1}^{M} \bigg( \log\Gamma(\sum_{j=1}^k \alpha_j ) - \sum_{i=1}^{k} \log \Gamma(\alpha_i) + \sum_{i=1}^{k}((\alpha_i-1)( \Psi(\gamma_{d_i}) - \Psi\sum_{j=1}^{k} \gamma_{d_j}))) \bigg)$$ Taking the derivative with respect to $\alpha_i$ gives: $$\frac{\partial L}{\partial \alpha_i} = M ( \Psi( \sum_{j=1}^{k} \alpha_j) - \Psi (\alpha_i)) + \sum_{d=1}^{M}( \Psi(\gamma_{d_i}) - \Psi(\sum_{j=1}^{k} \gamma_{d_j}))$$ The derivative depends on $\alpha_j$ where $j \neq i$, and we therefore must use and iterative method to find the maximal $\alpha$. In particular, the Hessian is in the form found in Eq. (10): $$ \frac{\partial L}{\partial \alpha_i \alpha_j} = \delta(ij)M \Psi'(\alpha_i) - \Psi'(\sum_{j=1}^{k} \alpha_j)$$
It seems like there is some typo in the paper. The actual Hessian should be of the form:
Note the difference.