Feeling overwhelmed by this math explanation: what videos, courses, and books should I study to help me digest it?


I've recently been trying to learn about deep learning, and stumbled across this excellent video explanation by Hugo Larochelle:

Neural networks [5.2] : Restricted Boltzmann machine - inference

But once I got to the mathematical explanation of the probability of the hidden units given the inputs ( $p(h\mid x)$ ), I couldn't follow how and why some of the transformations worked. I paused and restarted the video multiple times, but a few questions are still unanswered:

  1. I understand that the first line comes from Bayes' theorem, but I don't get how the sum of the joint probability of the inputs and every possible hidden-layer vector (denoted by $p(x,h')$) could replace the $p(x)$ that should be the denominator there (he mentions it briefly at 5:01).
  2. At 8:00 - "...what this means is, if I'm summing over x-capital-H all the terms here in this product are constant with respect of h-prime-capital-H, except for the last one, except for h-prime-h, so the last term, for j equals capital-H. So it means that all of the other factors in this product I can actually put them in front of the sum and just perform the sum over the last hidden unit of the corresponding factor in this whole product here. And once I compute this, this is a constant with respect to all of the hidden units so I can actually put it in front of this whole sum, and in this way I could actually write down this nested sum here as just a product of the sum over the first hidden unit times the sum over the second hidden unit and so on..." That is a lot of information for me. I don't understand what that explanation actually means; can anybody help me rephrase it, please? And the same question as above: what math courses should I take first to "get" this?
  3. Counting from the top, I don't get how he got from the 5th to the 6th line there. What are the rules of products and sums that allow these sums to turn into a product?
  4. 10:56 - How did that end up becoming a distribution?
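
To make questions 1 and 3 concrete, here is my own attempt at writing out the two identities I think are being used. This is only a sketch under my own assumptions: two binary hidden units, and generic per-unit factors $f_1$ and $f_2$ (names I made up, not notation from the video).

For question 1, I believe the denominator is just $p(x)$ written as a marginal of the joint, with $h'$ as a dummy summation variable ranging over every hidden vector:

$$p(h\mid x) = \frac{p(x,h)}{p(x)} = \frac{p(x,h)}{\sum_{h'} p(x,h')}$$

For question 3, since $f_1(h'_1)$ does not depend on $h'_2$, it can be pulled out of the inner sum, and the inner sum that remains does not depend on $h'_1$, so it can be pulled out of the outer sum:

$$\sum_{h'_1\in\{0,1\}}\sum_{h'_2\in\{0,1\}} f_1(h'_1)\,f_2(h'_2) = \sum_{h'_1\in\{0,1\}} f_1(h'_1)\left(\sum_{h'_2\in\{0,1\}} f_2(h'_2)\right) = \left(\sum_{h'_1\in\{0,1\}} f_1(h'_1)\right)\left(\sum_{h'_2\in\{0,1\}} f_2(h'_2)\right)$$

If that is right, then the step in the video would be the same thing repeated for $H$ hidden units instead of two, but I'm not sure I'm reading it correctly.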

Sorry for all the details, but I would really appreciate any help with this. I'm very stumped.