Derivative of $C = D_KL(p || q)$ w.r.t $q_{ij}$ where $q_{ij} = \frac{\exp(z_{ij})}{\sum_{k=1}^N\sum_{l=1}^N \exp(z_{kl})}$


For an exercise, I need to compute $$\frac{\partial C}{\partial q_{ij}}$$ where $$C = D_{KL}(p || q) = \sum_{i=1}^N \sum_{j=1}^N p_{ij} \log \left (\frac{p_{ij}}{q_{ij}} \right)$$ and $q_{ij} = \frac{\exp(z_{ij})}{\sum_{k=1}^N\sum_{l=1}^N \exp(z_{kl})}$, $\sum_{i=1}^N \sum_{j=1}^N p_{ij} = 1$, and the $z_{ij}$ are unnormalized log probabilities. As far as I know, the base of the logarithm in the Kullback–Leibler divergence does not matter (it only rescales $C$ by a constant factor), so we can assume base $e$. The exercise states that:
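For concreteness, here is a small NumPy sketch of the setup (the array names are mine, not from the exercise): $p$ is an arbitrary $N \times N$ probability table, $z$ holds the unnormalized log probabilities, and $q$ is the softmax of $z$ taken over all $N^2$ entries.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# p: arbitrary N x N probability table with sum_{ij} p_{ij} = 1
p = rng.random((N, N))
p /= p.sum()

# z: unnormalized log probabilities;
# q_{ij} = exp(z_{ij}) / sum_{kl} exp(z_{kl})
z = rng.standard_normal((N, N))
q = np.exp(z) / np.exp(z).sum()

# KL divergence in base e, as in the question
C = np.sum(p * np.log(p / q))
print(C)
```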

$$\frac{\partial C}{\partial q_{ij}} = -p_{ij} + q_{ij}$$ but I end up with something different. Here is my approach: $$ C = \sum_{i=1}^{N} \sum_{j=1}^{N} p_{ij} \cdot \left( \log(p_{ij}) - \log(q_{ij}) \right) $$ $$ C = \sum_{i=1}^{N} \sum_{j=1}^{N} p_{ij} \cdot \left( \log(p_{ij}) - \left( \log(\exp(z_{ij})) - \log\left(\sum_{k=1}^{N} \sum_{l=1}^{N} \exp(z_{kl})\right) \right) \right)$$ $$ C = \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ p_{ij} \cdot \log(p_{ij}) - p_{ij} \cdot \left( z_{ij} - \log\left(\sum_{k=1}^{N} \sum_{l=1}^{N} \exp(z_{kl})\right) \right) \right]$$ $$ \frac{\partial C}{\partial z_{ij}} = p_{ij} - \frac{\partial \left( p_{ij} \log\left(\sum_{k=1}^{N} \sum_{l=1}^{N} \exp(z_{kl})\right) \right)}{\partial z_{ij}} $$ $$ \frac{\partial C}{\partial z_{ij}} = p_{ij} - p_{ij} \frac{1}{\sum_{k=1}^{N} \sum_{l=1}^{N} \exp(z_{kl})} \cdot \exp(z_{ij}) $$ $$ = p_{ij} - p_{ij} \cdot q_{ij}$$

What am I missing here?
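To sanity-check which expression is right, I compared a central finite-difference gradient of $C$ with respect to $z_{ij}$ against $q_{ij} - p_{ij}$ (a NumPy sketch; `kl` and `softmax` are my own helper names). The numerical gradient matches the exercise's claimed result rather than mine.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3

p = rng.random((N, N))
p /= p.sum()                      # sum_{ij} p_{ij} = 1
z = rng.standard_normal((N, N))   # unnormalized log probabilities

def softmax(z):
    # softmax over all N*N entries, matching the definition of q_{ij}
    return np.exp(z) / np.exp(z).sum()

def kl(z):
    # C = D_KL(p || q) with q = softmax(z)
    q = softmax(z)
    return np.sum(p * np.log(p / q))

# central finite differences for dC/dz_{ij}
eps = 1e-6
num_grad = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        dz = np.zeros((N, N))
        dz[i, j] = eps
        num_grad[i, j] = (kl(z + dz) - kl(z - dz)) / (2 * eps)

analytic = softmax(z) - p          # q - p, the exercise's claimed gradient
print(np.max(np.abs(num_grad - analytic)))
```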