My problem is described on the 13th slide of the following presentation: http://helper.ipam.ucla.edu/publications/gss2012/gss2012_10596.pdf
The only difference in my problem is that I have a single latent/hidden layer and a single observed layer.
It is easy to derive the update rules given on that slide. Yet I cannot understand how to incorporate $s_j$ (the value of the hidden variable) into the process. Isn't that the whole point? The variable is hidden, yet I am supposed to derive a gradient-based update rule. I tried Bayes' rule to reverse the roles, i.e. to obtain $P(X_{\text{parents}} \mid X)$, but then I run into a different problem: I don't know how to define $P(X_{\text{parents}} \mid X)$, since my probability distribution is only specified as $P(X \mid X_{\text{parents}}) = \sigma\!\left(\sum_{j \in \text{parents}(X)} w_{ij} X_j\right)$.
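To make the setup concrete, here is a minimal sketch of the model I have in mind (the names `W`, `b` and the layer sizes are placeholders of mine, not from the slides). The marginal $P(x)$ is computed by brute-force enumeration over the hidden configurations, which obviously only works at toy scale:

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy sizes: one hidden layer s (the parents), one observed layer x.
n_hidden, n_visible = 4, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # weights w_ij
b = np.zeros(n_hidden)                                 # hidden-unit biases

def p_x_given_s(s):
    # P(x_i = 1 | s) = sigma(sum_j w_ij * s_j), as in the question.
    return sigmoid(W @ s)

def log_joint(x, s):
    # log P(x, s) = log P(s) + log P(x | s), with factorised Bernoullis.
    ps = sigmoid(b)
    px = p_x_given_s(s)
    return ((s * np.log(ps) + (1 - s) * np.log(1 - ps)).sum()
            + (x * np.log(px) + (1 - x) * np.log(1 - px)).sum())

def log_marginal(x):
    # log P(x) = log sum_s P(x, s): this sum over hidden configurations
    # is exactly the part I don't know how to handle at realistic scale.
    return np.logaddexp.reduce(
        [log_joint(x, np.array(s, dtype=float))
         for s in product([0, 1], repeat=n_hidden)])
```

The gradient of `log_marginal` with respect to `W` is what I am after, but writing it out seems to require the posterior over $s$.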
Is the only possibility here to do EM?
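For reference, the identity I keep arriving at (assuming I derived it correctly) is

$$\nabla_w \log P(x) \;=\; \mathbb{E}_{s \sim P(s \mid x)}\!\left[\nabla_w \log P(x, s)\right],$$

which already involves the posterior over the hidden units, i.e. the same quantity the E-step of EM would compute.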