I'm reading this paper about Restricted Boltzmann Machines. However, there are two steps I don't understand in how they compute the gradient of the log-likelihood (Section 4.1). Here is a screenshot of the formulas that are troubling me:

What I don't understand in particular:
1- How did the product become a summation in (problem 1)?
2- Why is the summation equal to 1 in (problem 2)?
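The screenshot is not reproduced here, so what follows is a sketch under one assumption: that the two steps are the usual ones from the gradient derivation for a binary RBM. In that model the hidden units are conditionally independent given $\mathbf v$, so $p(\mathbf h\mid\mathbf v)=\prod_{k} p(h_k\mid\mathbf v)$. Writing $\mathbf h_{-i}$ for all hidden units except $h_i$, the derivation typically reads

$$\sum_{\mathbf h} p(\mathbf h\mid\mathbf v)\, h_i v_j = \sum_{h_i}\sum_{\mathbf h_{-i}} p(h_i\mid\mathbf v)\, p(\mathbf h_{-i}\mid\mathbf v)\, h_i v_j = \sum_{h_i} p(h_i\mid\mathbf v)\, h_i v_j \underbrace{\sum_{\mathbf h_{-i}} p(\mathbf h_{-i}\mid\mathbf v)}_{=1} = p(H_i=1\mid\mathbf v)\, v_j$$

Under this reading, the product is never literally turned into a sum. Instead, the sum over all $\mathbf h$ is split into a sum over $h_i$ and a sum over $\mathbf h_{-i}$, and the product is grouped into the factor for $h_i$ and the factor for everything else. The terms that do not depend on $\mathbf h_{-i}$ are then pulled out of the inner sum (problem 1). That inner sum adds up a probability distribution over all of its possible values, so it equals 1 (problem 2). Finally, since $h_i \in \{0,1\}$, only the $h_i = 1$ term survives.

The snippet below checks these identities by brute-force enumeration on a tiny RBM. The sizes, the random weights `W`, the hidden biases `c`, the visible vector `v`, and all helper names are hypothetical and serve only as illustration.

```python
import itertools
import math
import random

# Hypothetical tiny binary RBM with n hidden and m visible units.
# Assumes p(H_k = 1 | v) = sigmoid(c_k + sum_j w_kj v_j), as in a standard binary RBM.
random.seed(0)
n, m = 3, 4
W = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]  # W[i][j] = w_ij
c = [random.gauss(0, 1) for _ in range(n)]                      # hidden biases
v = [1, 0, 1, 1]                                                # fixed visible vector


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_hk_given_v(k, hk):
    """p(h_k = hk | v) for a single binary hidden unit."""
    p1 = sigmoid(c[k] + sum(W[k][j] * v[j] for j in range(m)))
    return p1 if hk == 1 else 1.0 - p1


def p_h_given_v(h):
    """p(h | v) as a product over hidden units (conditional independence)."""
    prob = 1.0
    for k, hk in enumerate(h):
        prob *= p_hk_given_v(k, hk)
    return prob


def lhs(i, j):
    """Brute force: sum over all 2^n hidden vectors of p(h | v) * h_i * v_j."""
    return sum(p_h_given_v(h) * h[i] * v[j]
               for h in itertools.product([0, 1], repeat=n))


def sum_over_rest(i):
    """Sum of p(h_{-i} | v) over every setting of the other hidden units (problem 2)."""
    others = [k for k in range(n) if k != i]
    total = 0.0
    for rest in itertools.product([0, 1], repeat=len(others)):
        prob = 1.0
        for k, hk in zip(others, rest):
            prob *= p_hk_given_v(k, hk)
        total += prob
    return total


def split_form(i, j):
    """Factored form: [sum_{h_i} p(h_i | v) h_i v_j] * [sum_{h_{-i}} p(h_{-i} | v)] (problem 1)."""
    return sum(p_hk_given_v(i, hi) * hi * v[j] for hi in (0, 1)) * sum_over_rest(i)


def closed_form(i, j):
    """Final result: p(H_i = 1 | v) * v_j."""
    return p_hk_given_v(i, 1) * v[j]


for i in range(n):
    for j in range(m):
        assert math.isclose(lhs(i, j), split_form(i, j))
        assert math.isclose(lhs(i, j), closed_form(i, j))
    assert math.isclose(sum_over_rest(i), 1.0)
print("all identities hold")
```

Enumerating all $2^n$ hidden vectors is only feasible for tiny $n$. The closed form $p(H_i=1\mid\mathbf v)\,v_j$ is what lets the positive phase of the gradient be computed without that enumeration.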