Restricted Boltzmann Machine Gradient


I'm trying to follow the gradient computation of Restricted Boltzmann Machines, where

$$\begin{aligned}&\frac{\partial}{\partial\theta}\left(-\log\sum_h \frac{\exp(-E(x, h))}{Z}\right)\\&=-\frac{Z}{\sum_h \exp(-E(x, h))} \bigg( \sum_h \frac{1}{Z}\frac{\partial \exp(-E(x, h))}{\partial \theta}-\sum_h \frac{\exp(-E(x, h))}{Z^2} \frac{\partial Z}{\partial \theta} \bigg)\end{aligned}$$

I don't quite see where the last term comes from, i.e. the factor $\frac{1}{Z^2}$ (which also comes with a minus sign instead of a plus) together with the derivative $\frac{\partial Z}{\partial \theta}$. Is it just $\frac{\partial}{\partial \theta}Z^{-1}$ written out?

Best answer

\begin{align}&\frac{\partial}{\partial\theta}\left(-\log\sum_h \frac{\exp(-E(x, h))}{Z}\right)\\&=-\frac{Z}{\sum_h \exp(-E(x, h))}\bigg( \sum_h \frac{1}{Z}\frac{\partial \exp(-E(x, h))}{\partial \theta}+\sum_h \exp(-E(x, h)) \frac{\partial }{\partial \theta}\frac{1}{Z} \bigg)\\&=-\frac{Z}{\sum_h \exp(-E(x, h))} \bigg( \sum_h \frac{1}{Z}\frac{\partial \exp(-E(x, h))}{\partial \theta}-\sum_h \frac{\exp(-E(x, h))}{Z^2} \frac{\partial Z}{\partial \theta} \bigg)\end{align}

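It is just the chain rule and the product rule. The chain rule applied to $-\log$ gives the prefactor, the product rule splits the derivative of each summand $\exp(-E(x, h))\cdot\frac{1}{Z}$ into two terms, and the second of those terms follows from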

$$\frac{\partial}{\partial \theta} \frac1{Z}=-\frac{1}{Z^2}\frac{\partial Z}{\partial \theta}$$
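If it helps to see the identity hold numerically, here is a minimal sketch that compares the last line of the derivation against a central finite difference. The toy model is an assumption made only for illustration: one binary visible unit, one binary hidden unit, a single scalar parameter $\theta$, and energy $E(x, h) = -\theta x h$. None of the function names come from the question.

```python
import math

# Hypothetical toy model: x, h in {0, 1}, one scalar parameter theta.
def energy(x, h, theta):
    return -theta * x * h

def unnorm(x, theta):
    # sum_h exp(-E(x, h))
    return sum(math.exp(-energy(x, h, theta)) for h in (0, 1))

def partition(theta):
    # Z = sum over all (x, h) of exp(-E(x, h))
    return sum(unnorm(x, theta) for x in (0, 1))

def nll(x, theta):
    # -log sum_h exp(-E(x, h)) / Z
    return -math.log(unnorm(x, theta) / partition(theta))

def d_unnorm(x, theta):
    # sum_h d/dtheta exp(-E(x, h)); here d/dtheta exp(theta*x*h) = x*h*exp(theta*x*h)
    return sum(x * h * math.exp(-energy(x, h, theta)) for h in (0, 1))

def d_partition(theta):
    # dZ/dtheta
    return sum(d_unnorm(x, theta) for x in (0, 1))

def grad_formula(x, theta):
    # Last line of the derivation:
    # -(Z / S) * ( (1/Z) * dS/dtheta - (S / Z^2) * dZ/dtheta )
    s, z = unnorm(x, theta), partition(theta)
    return -(z / s) * (d_unnorm(x, theta) / z - s / z**2 * d_partition(theta))

def grad_numeric(x, theta, eps=1e-6):
    # Central finite difference of the negative log-likelihood
    return (nll(x, theta + eps) - nll(x, theta - eps)) / (2 * eps)

theta = 0.7
for x in (0, 1):
    print(x, grad_formula(x, theta), grad_numeric(x, theta))
```

The two printed values should agree up to finite-difference error for each $x$, which is only possible if the $-\frac{1}{Z^2}\frac{\partial Z}{\partial \theta}$ term carries the sign shown above.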