I'm reading the paper Semantic Hashing by Salakhutdinov and Hinton (https://www.cs.utoronto.ca/~rsalakhu/papers/semantic_final.pdf).
They provide the equations
$$ p(v_i=n\mid\mathbf{h})=Ps\left(n,\frac{\exp(\lambda_i+\sum_jh_jw_{ij})}{\sum_k\exp(\lambda_k+\sum_jh_jw_{kj})}N\right) \tag{1} $$
and
$$ p(h_j=1|\mathbf{v}) = \sigma(b_j+\sum_iw_{ij}v_i) \tag{2} $$
where $Ps(n,\lambda)=e^{-\lambda}\lambda^n/n!$, $\sigma(x)=1/(1+e^{-x})$ and $N=\sum_iv_i$.
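To check that I am reading (1) and (2) correctly, here is a minimal sketch of how I would evaluate them. It assumes the Poisson rate in (1) is the softmax term multiplied by $N$, which is how I understand the paper. The function names and all parameter values are made up by me for illustration and do not come from the paper.

```python
import math

def poisson_pmf(n, rate):
    """Ps(n, rate) = exp(-rate) * rate**n / n!"""
    return math.exp(-rate) * rate ** n / math.factorial(n)

def sigmoid(x):
    """sigma(x) = 1 / (1 + exp(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def visible_rates(h, lam, W, N):
    """Poisson rates used in (1): a softmax over words, scaled by N (my assumption)."""
    scores = [lam[i] + sum(h[j] * W[i][j] for j in range(len(h)))
              for i in range(len(lam))]
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [N * e / total for e in exps]

def hidden_probs(v, b, W):
    """p(h_j = 1 | v) from (2)."""
    return [sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(b))]

# Made-up toy parameters: 3 words (visible units), 2 hidden units.
lam = [0.1, -0.2, 0.3]                    # visible biases lambda_i
b = [0.0, 0.5]                            # hidden biases b_j
W = [[0.2, -0.1], [0.0, 0.4], [-0.3, 0.1]]  # W[i][j] = w_ij
v = [2, 0, 1]                             # word counts
h = [1, 0]                                # binary hidden state
N = sum(v)                                # N = sum_i v_i

rates = visible_rates(h, lam, W, N)
p_v0 = poisson_pmf(v[0], rates[0])        # p(v_0 = 2 | h) under (1)
p_h = hidden_probs(v, b, W)               # p(h_j = 1 | v) under (2)
```

With this reading, the rates in (1) sum to $N$ across words, so the expected total count given $\mathbf{h}$ matches the document length.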
Then, they state that the marginal distribution over visible count vectors $\mathbf{v}$ is:
$$ p(\mathbf{v})=\sum_{\mathbf{h}}\frac{\exp(-E(\mathbf{v},\mathbf{h}))}{\sum_{\mathbf{u},\mathbf{g}}\exp(-E(\mathbf{u},\mathbf{g}))} \tag{3} $$
where
$$ E(\mathbf{v},\mathbf{h})=-\sum_i\lambda_iv_i+\sum_i\log(v_i!) - \sum_jb_jh_j-\sum_{i,j}v_ih_jw_{ij} \tag{4} $$
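As a sanity check on (4), I tried a brute-force sketch: for a fixed $\mathbf{v}$, weight every binary $\mathbf{h}$ by $\exp(-E(\mathbf{v},\mathbf{h}))$, normalise over $\mathbf{h}$, and compare the resulting $p(h_j=1\mid\mathbf{v})$ with the sigmoid in (2). The toy parameters and helper names are made up by me, and this only probes the hidden conditional, not the normalisation over $\mathbf{v}$ in (3).

```python
import itertools
import math

def energy(v, h, lam, b, W):
    """E(v, h) as written in (4)."""
    n_vis, n_hid = len(v), len(h)
    return (
        -sum(lam[i] * v[i] for i in range(n_vis))
        + sum(math.lgamma(v[i] + 1) for i in range(n_vis))   # log(v_i!)
        - sum(b[j] * h[j] for j in range(n_hid))
        - sum(v[i] * h[j] * W[i][j]
              for i in range(n_vis) for j in range(n_hid))
    )

def hidden_marginals_by_enumeration(v, lam, b, W):
    """p(h_j = 1 | v) from exp(-E), summing over all binary hidden states."""
    n_hid = len(b)
    states = list(itertools.product([0, 1], repeat=n_hid))
    weights = [math.exp(-energy(v, h, lam, b, W)) for h in states]
    total = sum(weights)
    return [sum(w for h, w in zip(states, weights) if h[j] == 1) / total
            for j in range(n_hid)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Made-up toy parameters: 3 words (visible units), 2 hidden units.
lam = [0.1, -0.2, 0.3]
b = [0.0, 0.5]
W = [[0.2, -0.1], [0.0, 0.4], [-0.3, 0.1]]   # W[i][j] = w_ij
v = [2, 0, 1]

from_energy = hidden_marginals_by_enumeration(v, lam, b, W)
from_eq2 = [sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(b))]
```

The two lists agree to floating-point precision here, because the terms of (4) that depend only on $\mathbf{v}$ cancel when normalising over $\mathbf{h}$. So I can see how (2) is tied to the energy, but not how the full marginal in (3) is obtained.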
My question is: how do they get to expression (3)? My math skills are not great, so a step-by-step derivation would be greatly appreciated.