Product Rule and Differentiation with logarithms


I'm currently trying to understand the math in the paper by Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention".

From the paper:

There is an objective function $L_s$, which is a variational lower bound on the marginal log-likelihood $\log p(y|a)$:

\begin{align*} L_s = \sum_s p(s|a) \log[ p(y|s,a)] \leq \log [\sum_s p(s|a)p(y|s,a)] = \log [p(y|a)] \end{align*}

The goal is to learn the parameters $W$ by optimising $L_s$, whose gradient is given as:

\begin{align*} \frac{\partial L_s}{\partial W} = \sum_s p(s|a)\left[ \frac{\partial \log[p(y|s,a)]}{\partial W} + \log[p(y|s,a)] \frac{\partial \log[p(s|a)]}{\partial W} \right] \end{align*}

I found an answer (2) which explains how to form the variational lower bound on the marginal log-likelihood $\log p(y|a)$ and how to differentiate it:

From (2): \begin{align*} \frac{\partial}{\partial W}L_s &= \sum_s \frac{\partial p(s|a)}{\partial W}\log p(y | s , a) + p(s|a) \frac{\partial p(y | s ,a)}{\partial W} \\ &= \sum_s \left( p(s | a)\frac{\partial \log p(s|a)}{\partial W} \right)\log p(y|s , a) + p(s |a)\frac{\partial \log p(y |s,a)}{\partial W} \\ &= \sum_s p(s|a) \left[ \frac{\partial \log p(s|a)}{\partial W}\log p(y| s,a) + \frac{\partial \log p(y| s, a )}{\partial W} \right] \end{align*}
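As a sanity check on the last line of that derivation, here is a minimal numerical sketch. The toy model is entirely hypothetical and not taken from the paper: a scalar parameter `W`, three locations `s`, and made-up coefficients `ALPHA` and `BETA`, with $p(s|a)$ a softmax and $p(y|s,a)$ a sigmoid. It compares the bracketed expression against a central finite difference of $L_s$.

```python
import math

# Hypothetical toy model (not from the paper): a scalar parameter W,
# three attention locations s, and made-up coefficients ALPHA and BETA.
ALPHA = [0.5, -1.0, 2.0]   # p(s|a) = softmax over s of ALPHA[s] * W
BETA = [1.5, 0.3, -0.7]    # p(y|s,a) = sigmoid(BETA[s] * W)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_s(W):
    """p(s|a) for every s, as a softmax over ALPHA[s] * W."""
    logits = [a * W for a in ALPHA]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_p_y(W):
    """log p(y|s,a) for every s."""
    return [math.log(sigmoid(b * W)) for b in BETA]


def objective(W):
    """L_s = sum_s p(s|a) * log p(y|s,a)."""
    return sum(p * lq for p, lq in zip(p_s(W), log_p_y(W)))


def analytic_grad(W):
    """sum_s p(s|a) * [d log p(y|s,a)/dW + log p(y|s,a) * d log p(s|a)/dW]."""
    p = p_s(W)
    lq = log_p_y(W)
    mean_alpha = sum(pk * ak for pk, ak in zip(p, ALPHA))
    total = 0.0
    for s in range(len(ALPHA)):
        # d/dW log sigmoid(BETA[s] * W)
        dlog_q = BETA[s] * (1.0 - sigmoid(BETA[s] * W))
        # d/dW log softmax_s(ALPHA * W)
        dlog_p = ALPHA[s] - mean_alpha
        total += p[s] * (dlog_q + lq[s] * dlog_p)
    return total


def numeric_grad(W, h=1e-6):
    """Central finite difference of the objective."""
    return (objective(W + h) - objective(W - h)) / (2.0 * h)


if __name__ == "__main__":
    W = 0.8
    print(analytic_grad(W), numeric_grad(W))
```

If the two printed numbers agree to several decimal places, the final expression is at least consistent for this toy model; that is evidence for the formula, not a proof of it.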

I don't understand how the derivative is formed. As I understand it, to differentiate $L_s$ with the product rule I set:

\begin{align*} f &= p(s|a) \\ f' &= \frac{\partial p(s|a)}{\partial W} \\ \\ g &= \log p(y| s, a )\\ \end{align*}

Why is the derivative of $g$ this:

\begin{align*} g' &= \frac{\partial p(y| s, a )}{\partial W} \end{align*}

and not this:

\begin{align*} g' &= \frac{1}{p(y| s, a )} \frac{\partial p(y| s, a )}{\partial W} \end{align*}
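For reference, a short worked step, assuming only the ordinary chain rule: with $g = \log p(y|s,a)$, the expression with the $1/p(y|s,a)$ factor is the same thing as the derivative of the log, which is the form that appears in the second line of (2). That would suggest the first line of (2) is simply missing a $\log$, though I can't confirm that this is what its author intended.

\begin{align*} g' = \frac{\partial \log p(y|s,a)}{\partial W} = \frac{1}{p(y|s,a)} \frac{\partial p(y|s,a)}{\partial W} \end{align*}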

Secondly, I don't follow the transition from the first line of the derivative to the second. When the first term is rewritten in terms of a log, it ends up with $p(s|a)$ out front: \begin{align*} p(s|a) \frac{\partial \log p(s|a)}{\partial W} \end{align*}

Whereas the second term only gets a log introduced: \begin{align*} \frac{\partial \log p(y |s,a)}{\partial W} \end{align*}

Why is this?
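One reading that would make the two terms consistent is the log-derivative identity, sketched here on the assumption that $p(s|a) > 0$:

\begin{align*} \frac{\partial \log p(s|a)}{\partial W} = \frac{1}{p(s|a)} \frac{\partial p(s|a)}{\partial W} \quad\Longrightarrow\quad \frac{\partial p(s|a)}{\partial W} = p(s|a) \frac{\partial \log p(s|a)}{\partial W} \end{align*}

Under that reading, the first term starts as the derivative of a plain probability, so rewriting it as a log-derivative introduces the factor $p(s|a)$. The second term already carries $p(s|a)$ from the product rule, and if its other factor is read as $\partial \log p(y|s,a) / \partial W$, there is nothing left to convert.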

Thanks!