I want to approximate a mixture of probability distributions by a single density chosen to minimise the Kullback-Leibler divergence (KLD). I would like to verify my result, as it seems suspect.
We have a joint distribution
\begin{align}
p(\mathbf{x}, \mathbf{z}) = \prod_{k=1}^{K} \big( \pi_{k} \, p_{k} (\mathbf{x}) \big)^{z_{k}},
\end{align}
where $p_{k}(\mathbf{x})$ is a probability density function, and $\pi_{k}$ is a weight that must satisfy $0 < \pi_{k} \leq 1$ together with $\sum_{k=1}^{K} \pi_{k} = 1$. Here $\mathbf{z} = \begin{bmatrix} z_{1} & z_{2} & \cdots & z_{K} \end{bmatrix}^{T}$ is a $1$-of-$K$ binary variable, where $z_{k} \in \{ 0, 1 \}$ such that $\sum_{k=1}^{K} z_{k} = 1$. That is to say, one element of $\mathbf{z}$ must equal one, while the rest must equal zero. The marginal of $\mathbf{x}$ is the $K$-component mixture
\begin{align}
p(\mathbf{x}) = \sum_{\mathbf{z}} p(\mathbf{x}, \mathbf{z}) = \sum_{k=1}^{K} \pi_{k} \, p_{k}(\mathbf{x}).
\end{align}
We now introduce a similarly defined joint distribution
\begin{align}
q(\mathbf{x}, \mathbf{z}) = \prod_{k=1}^{K} \big( \pi_{k} \, q(\mathbf{x}) \big)^{z_{k}},
\end{align}
where $q(\mathbf{x})$ is a probability density function. I want to find the optimal $q(\mathbf{x})$ that minimises the KLD
\begin{align}
D_{\text{KL}} (q (\mathbf{x}, \mathbf{z}) \, \| \, p(\mathbf{x}, \mathbf{z}) ) &= \sum_{\mathbf{z}} \int q(\mathbf{x}, \mathbf{z}) \log \Big( \frac{ q(\mathbf{x}, \mathbf{z}) }{ p(\mathbf{x}, \mathbf{z}) } \Big) d \mathbf{x}.
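\end{align}
I believe the KLD can be simplified as follows:
\begin{align}
D_{\text{KL}} (q (\mathbf{x}, \mathbf{z}) \, \| \, p(\mathbf{x}, \mathbf{z}) ) &= \sum_{\mathbf{z}} \int \Bigg( \prod_{i=1}^{K} \big( \pi_{i} \, q(\mathbf{x}) \big)^{z_{i}} \Bigg) \Bigg( \sum_{i=1}^{K} z_{i} \log \Big( \frac{ q(\mathbf{x}) }{ p_{i} (\mathbf{x}) } \Big) \Bigg) d \mathbf{x} \\
&= \sum_{k=1}^{K} \int \pi_{k} \, q(\mathbf{x}) \log \Big( \frac{ q(\mathbf{x}) }{ p_{k} (\mathbf{x}) } \Big) d \mathbf{x} \\
&= \int q (\mathbf{x}) \, \log \Bigg( \frac{ q(\mathbf{x}) }{ \prod_{k=1}^{K} \big( p_{k} (\mathbf{x}) \big)^{\pi_{k}} } \Bigg) d \mathbf{x} \\
&= \int q (\mathbf{x}) \, \log \Big( \frac{q(\mathbf{x})}{ f (\mathbf{x}) } \Big) d \mathbf{x} - \log(Z),
\end{align}
where the third line uses $\sum_{k=1}^{K} \pi_{k} = 1$ to combine the weighted logarithms, and
\begin{align}
f(\mathbf{x}) = \frac{1}{Z}\prod\limits_{k=1}^{K} \big( p_{k} (\mathbf{x}) \big)^{\pi_{k}}
\end{align}
with $Z = \int \prod_{k=1}^{K} \big( p_{k} (\mathbf{x}) \big)^{\pi_{k}} d \mathbf{x}$ the normalising constant that makes $f(\mathbf{x})$ integrate to one (assuming $Z > 0$). Since the first term is $D_{\text{KL}} (q (\mathbf{x}) \, \| \, f(\mathbf{x}) ) \geq 0$ and $Z$ does not depend on $q(\mathbf{x})$, the KLD is minimised when $q(\mathbf{x}) = f(\mathbf{x})$. Is my simplification of the KLD correct?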
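
One way to sanity-check the simplification numerically is sketched below. It is only an illustration under stated assumptions: the components are taken to be one-dimensional Gaussians, the weights, means and standard deviations are made-up values (none of them come from the problem above), and the integrals are approximated by Riemann sums on a grid. The helper names (`gauss_logpdf`, `joint_kl`, `kl`) are hypothetical. The script compares the weighted-sum form of the KLD with the form $D_{\text{KL}}(q \, \| \, f) - \log(Z)$ for a trial $q$, and evaluates the objective at $q = f$.

```python
import numpy as np

def gauss_logpdf(x, mu, sigma):
    """Log-density of a univariate Gaussian N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)

# Illustrative (assumed) mixture: weights pi_k and Gaussian components p_k.
weights = np.array([0.2, 0.5, 0.3])
mus = np.array([-1.0, 0.5, 2.0])
sigmas = np.array([1.0, 1.5, 0.8])

# Integration grid, wide enough that every tail is negligible.
x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]

def integrate(y):
    """Riemann-sum approximation of the integral of y over the grid."""
    return np.sum(y) * dx

# Row k holds log p_k(x) on the grid.
log_p = np.stack([gauss_logpdf(x, m, s) for m, s in zip(mus, sigmas)])

# Unnormalised weighted geometric mean prod_k p_k^{pi_k}, its constant Z, and f.
log_g = weights @ log_p
Z = integrate(np.exp(log_g))
log_f = log_g - np.log(Z)

def joint_kl(log_q):
    """Weighted-sum form: sum_k pi_k * int q log(q / p_k) dx."""
    q = np.exp(log_q)
    return sum(w * integrate(q * (log_q - lp)) for w, lp in zip(weights, log_p))

def kl(log_q, log_r):
    """Ordinary KL divergence int q log(q / r) dx between two densities."""
    return integrate(np.exp(log_q) * (log_q - log_r))

# An arbitrary trial density q (assumed, for illustration only).
log_q_trial = gauss_logpdf(x, 0.3, 1.2)

lhs = joint_kl(log_q_trial)                   # weighted-sum form
rhs = kl(log_q_trial, log_f) - np.log(Z)      # simplified form
print("trial q: weighted sum =", lhs, " KL(q||f) - log Z =", rhs)
print("q = f  : objective    =", joint_kl(log_f), " -log Z =", -np.log(Z))
```

If the simplification holds, the two numbers on the first printed line should agree up to discretisation error, and the objective at $q = f$ should equal $-\log(Z)$ and be no larger than the value for the trial $q$. A check like this cannot prove the result, but a mismatch would point to an algebra error.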