I am exploring the use of the Expectation-Maximization (EM) algorithm in a machine learning setting where the exact distribution of the data is unknown, because the observed samples do not form the complete data. We assume each data point is a pair $(\lambda, X)$, where $\lambda$ is a latent parameter and $X$ is the observed data (features). The model we would like to train has a set of parameters $\theta$ and learns a function $\Phi(\lambda, X; \theta)$.
We note that $\lambda$ and $\theta$ jointly determine the neural network model. The goal is then to compute $$Q(\theta\mid\theta^{(t)}) = \mathbb{E}_{\lambda\sim p(\cdot\mid\textbf{X},\theta^{(t)})}\left[\log p(\lambda, \textbf{X}\mid\theta)\right] = \sum_{\lambda} p(\lambda\mid\textbf{X},\theta^{(t)}) \log p(\lambda, \textbf{X}\mid\theta),$$ where $\textbf{X}$ denotes all the observed pairs, and then maximize it so as to obtain the update $\theta^{(t+1)}$.
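For concreteness, since the expectation is taken with respect to $p(\lambda\mid\textbf{X},\theta^{(t)})$, draws $\lambda^{(1)},\dots,\lambda^{(M)}$ from that conditional give the standard Monte Carlo estimator $$\hat{Q}(\theta\mid\theta^{(t)}) = \frac{1}{M}\sum_{m=1}^{M} \log p(\lambda^{(m)}, \textbf{X}\mid\theta),$$ which is the quantity optimized in what is usually called Monte Carlo EM (MCEM).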
We know $p(\lambda, \textbf{X}\mid\theta)$ only up to proportionality, so we have used MCMC to draw samples from this distribution, which gives us an estimate of the posterior $p(\lambda\mid\textbf{X},\theta^{(t)})$, i.e. the weights in the sum above. Even so, we cannot evaluate the $Q$-function explicitly, since we do not know the normalization constant.
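For what it's worth, knowing the density only up to proportionality is exactly the situation MCMC is built for: a Metropolis sampler only ever compares the unnormalized log-density at two points, so the constant cancels. A minimal sketch with a hypothetical scalar target (a standard normal, standing in for $p(\lambda\mid\textbf{X},\theta^{(t)})$; all names are illustrative, not from your model):

```python
import numpy as np

def metropolis(log_unnorm, x0, n_samples, step=0.5, rng=None):
    # Random-walk Metropolis: accepts with probability
    # min(1, p(prop)/p(x)), which depends only on the *ratio* of the
    # unnormalized density, so the normalization constant never appears.
    if rng is None:
        rng = np.random.default_rng(0)
    x, lp = x0, log_unnorm(x0)
    out = []
    for _ in range(n_samples):
        prop = x + step * rng.normal()
        lp_prop = log_unnorm(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        out.append(x)
    return np.array(out)

# e.g. sample a standard normal from its unnormalized log-density -x^2/2
samples = metropolis(lambda x: -0.5 * x**2, 0.0, 20000)
```

The resulting `samples` can be plugged into the Monte Carlo estimate of the expectation without ever computing the constant.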
Currently, I am stuck on how to maximize the $Q$-function so as to obtain $\theta^{(t+1)}$, even up to a sufficient condition such as stationarity. Has anyone run into this problem? An alternative solution that avoids MCMC would also be welcome!
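One observation that may resolve the stuckness: if the unknown constant in $\log p(\lambda, \textbf{X}\mid\theta)$ does not depend on $\theta$, it only shifts $Q$ by a constant, so $\arg\max_\theta$ of the unnormalized Monte Carlo estimate is the same as of the true one (if the constant does depend on $\theta$, this shortcut is not valid). A sketch of one MCEM iteration under that assumption, with a hypothetical toy model ($\lambda_i \sim \mathcal{N}(\theta,1)$, $x_i\mid\lambda_i \sim \mathcal{N}(\lambda_i,1)$) so the code runs end to end; `sample_posterior` stands in for your MCMC step:

```python
import numpy as np
from scipy.optimize import minimize

def mcem_step(log_joint_unnorm, sample_posterior, theta_t, X, n_samples=200):
    # E-step: draws lambda^(m) ~ p(lambda | X, theta_t) (via MCMC in general)
    lams = sample_posterior(X, theta_t, n_samples)
    # M-step: maximize the Monte Carlo estimate of Q(theta | theta_t); any
    # theta-independent constant in log_joint_unnorm drops out of the argmax
    neg_Q = lambda th: -np.mean([log_joint_unnorm(lam, X, th) for lam in lams])
    return minimize(neg_Q, theta_t).x

# Hypothetical toy model: lambda_i ~ N(theta, 1), x_i | lambda_i ~ N(lambda_i, 1)
def log_joint(lam, X, theta):
    # log p(lambda, X | theta) up to a theta-independent constant
    return -0.5 * np.sum((lam - theta) ** 2) - 0.5 * np.sum((X - lam) ** 2)

def sample_posterior(X, theta, n):
    # Exact conditional here, N((theta + x_i)/2, 1/2); in your setting this
    # is where the MCMC sampler targeting the unnormalized joint goes
    rng = np.random.default_rng(0)
    return rng.normal((theta + X) / 2, np.sqrt(0.5), size=(n, len(X)))

rng = np.random.default_rng(1)
X = rng.normal(1.5, np.sqrt(2.0), size=100)
theta = np.array([0.0])
for _ in range(15):
    theta = mcem_step(log_joint, sample_posterior, theta, X)
```

In this toy case the iterates contract toward the marginal MLE $\bar{x}$, which is a quick sanity check that the unnormalized M-step is doing the right thing.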