Here I have the following Bayesian Graph:
In other words, $\alpha$ and $\theta$ are parameters, while $\pi, \mathbf{z}, \mathbf{x}$ are random variables. From the graph, I know that:
$f(\mathbf{x}, \mathbf{z}, \pi | \alpha, \theta) = f(\mathbf{x} | \mathbf{z}, \theta) f(\mathbf{z} | \pi) f(\pi | \alpha) $
where: $$\mathbf{x} = [\mathbf{x}_{1}, \ldots, \mathbf{x}_{N}]^{T}, \quad \mathbf{x}_{n} \in \mathbb{R}^{p}$$ $$\mathbf{z} = [\mathbf{z}_{1}, \ldots, \mathbf{z}_{N}]^{T}, \quad \mathbf{z}_{n} \in \{0,1\}^{G}, \quad \sum_{g=1}^{G} z_{n,g} = 1$$ $$\pi = [\pi_{1}, \ldots, \pi_{G}]^{T}, \quad \pi_{g} \in \mathbb{R}_{+}, \quad \sum_{g=1}^{G} \pi_{g} = 1$$
So the corresponding distributions are:
$$ f(\mathbf{x} | \mathbf{z}, \theta) = \prod_{n=1}^{N} \prod_{g=1}^{G} \left[ \mathcal{N}( \mathbf{x}_{n} | \theta) \right]^{z_{n,g}} $$ $$ f(\mathbf{z} | \pi) = \prod_{n=1}^{N} \prod_{g=1}^{G} [ \pi_{g} ]^{z_{n,g}}$$ $$ f(\pi | \alpha) = \text{Dir}\left(\frac{\alpha}{G}\right) = \frac{\Gamma(\alpha)}{[\Gamma(\frac{\alpha}{G})]^{G}} \prod_{g=1}^{G} \pi_{g}^{\frac{\alpha}{G}-1} $$
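As a sanity check on these distributions, the generative process can be sketched in a few lines of NumPy (the sizes, $\alpha$, and component means below are made-up illustrations; I use component-specific means, of which the shared-$\theta$ model above is the special case where all means coincide):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes for illustration
N, G, p = 500, 3, 2
alpha = 2.0
theta = rng.normal(size=(G, p))          # hypothetical component means

# pi ~ Dir(alpha/G, ..., alpha/G): symmetric Dirichlet
pi = rng.dirichlet(np.full(G, alpha / G))

# z_n ~ Categorical(pi), stored one-hot so sum_g z_{n,g} = 1
labels = rng.choice(G, size=N, p=pi)
z = np.eye(G)[labels]

# x_n | z_n ~ N(theta_g, I) for the active component g
x = theta[labels] + rng.standard_normal((N, p))
```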
Putting this all together, gives us:
$$ f(\mathbf{x}, \mathbf{z}, {\pi} | {\theta}, \alpha) = \prod_{n=1}^{N} \prod_{g=1}^{G} \left[ \pi_{g} f_{g}\left( \mathbf{x}_{n} | {\theta} \right) \right]^{z_{n,g}} \cdot \frac{ \Gamma\left( \alpha \right) }{ \left[ \Gamma\left( \frac{\alpha}{G} \right) \right]^{G} } \prod_{g=1}^{G} \pi_{g}^{\frac{\alpha}{G}-1} $$
where $ f_{g}\left( \mathbf{x}_{n} | {\theta} \right) = \mathcal{N}\left( \mathbf{x}_{n} | {\theta} \right) \; \forall g$. So taking the log:
$$ \text{log}\left( f(\mathbf{x}, \mathbf{z}, {\pi} | {\theta}, \alpha) \right) = \text{log}\left( f \left( \mathbf{x}, \mathbf{z} | {\pi}, {\theta} \right) \right) + \text{log} \left( f\left( {\pi} | {\alpha} \right) \right) $$
where:
$$ \text{log}\left( f \left( \mathbf{x}, \mathbf{z} | {\pi}, {\theta} \right) \right) = \sum_{n=1}^{N} \sum_{g=1}^{G} z_{n,g} \left[ \text{log}\left( f_{g}\left( \mathbf{x}_{n} | {\theta} \right) \right) + \text{log}\left( \pi_{g} \right) \right] $$
$$ \text{log} \left( f\left( {\pi} | {\alpha} \right) \right) = \text{log}\left( \Gamma(\alpha) \right) - G \text{log} \left( \Gamma\left( \frac{\alpha}{G} \right) \right) + \left( \frac{\alpha}{G} - 1 \right) \sum_{g=1}^{G} \text{log} \left( \pi_{g} \right) $$
This is where I start being unsure:
So if I want to find my parameters, $\alpha, \theta$, then I need to solve:
$$ \underset{\mathbf{\theta}, {\alpha}}{\mathrm{maximize}} \; E_{\mathbf{z}, \mathbf{\pi}} \left[ \text{log}\left( f(\mathbf{x}, \mathbf{z}, \mathbf{\pi} | \mathbf{\theta}, \alpha) \right) \right] $$
since $\mathbf{z}, \pi$ are random variables; the expectation is taken with respect to their posterior given the data and the current parameter estimates, as in EM. Plugging in:
$$ \underset{\mathbf{\theta}, {\alpha}}{\mathrm{maximize}} \; \sum_{n=1}^{N} \sum_{g=1}^{G} \hat{z}_{n,g} \left[ \text{log}\left( f_{g}\left( \mathbf{x}_{n} | \mathbf{\theta} \right) \right) + \text{log}\left( \hat{\pi}_{g} \right) \right] + \text{log}\left( \Gamma(\alpha) \right) - G \text{log} \left( \Gamma\left( \frac{\alpha}{G} \right) \right) + \left( \frac{\alpha}{G} - 1 \right) \sum_{g=1}^{G} \text{log} \left( \hat{\pi}_{g} \right) $$
where:
$$\hat{z}_{n,g} = E\left[ z_{n,g} \mid \mathbf{x}_{n}, {\pi}, {\theta} \right] = f\left( z_{n,g} = 1 \mid \mathbf{x}_{n}, {\pi}, {\theta} \right) = \frac{ \pi_{g} f_{g}( \mathbf{x}_{n} | \theta) }{ \sum_{k=1}^{G} \pi_{k} f_{k}( \mathbf{x}_{n} | \theta) }$$
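A minimal sketch of this E-step computation, assuming identity-covariance Gaussian components with per-component means `mus` (a hypothetical array; the shared-$\theta$ case just makes all its rows equal):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(x, pi, mus):
    """E-step responsibilities: z_hat[n, g] proportional to pi_g * N(x_n | mu_g, I)."""
    G = len(pi)
    # Component densities, assuming identity covariance; shape (N, G)
    dens = np.column_stack([
        multivariate_normal.pdf(x, mean=mus[g]) for g in range(G)
    ])
    w = pi * dens                            # numerator: pi_g * f_g(x_n | theta)
    return w / w.sum(axis=1, keepdims=True)  # normalise over g
```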
But how do I find $\text{log}(\hat{\pi}_{g})$? I think I need to take the expectation under:
$$ f\left( {\pi} | \mathbf{z}, {\alpha} \right) = f\left( {\pi} | \mathbf{m}, {\alpha} \right) = \text{Dir} \left( {\pi} \,\Big|\, \frac{\alpha}{G}\mathbf{1} + \mathbf{m} \right) = \frac{ \Gamma( \alpha + N)}{ \prod_{g=1}^{G} \Gamma( \frac{\alpha}{G} + m_{g})} \prod_{g=1}^{G} \pi_{g}^{\frac{\alpha}{G} + m_{g} - 1} $$
where $m_{g} = \sum_{n=1}^{N} z_{n,g}$, $N = \sum_{g=1}^{G} m_{g}$. If I do this, then:
$$ \hat{\pi}_{g} = E_{\pi_{g}}\left[ \pi_{g} | \mathbf{m}, {\alpha} \right] = \frac{ \frac{\alpha}{G} + m_{g} }{ \alpha + N } $$
$$ \text{log} \left( \hat{\pi}_{g} \right) = E_{\pi_{g}}\left[ \text{log}\left(\pi_{g}\right) | \mathbf{m}, {\alpha} \right] = \psi\left( \frac{\alpha}{G} + m_{g} \right) - \psi \left( \alpha + N \right) $$
where $\psi(\cdot)$ is the Digamma function.
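Both posterior expectations have these simple closed forms; a quick sketch using `scipy.special.digamma` (`expected_log_pi` is a hypothetical helper name, and the counts `m` — hard or soft — are assumed given):

```python
import numpy as np
from scipy.special import digamma

def expected_log_pi(m, alpha):
    """E[log(pi_g) | m, alpha] under Dir(alpha/G * 1 + m):
    psi(alpha/G + m_g) - psi(alpha + N)."""
    m = np.asarray(m, dtype=float)
    G, N = len(m), m.sum()
    return digamma(alpha / G + m) - digamma(alpha + N)
```

As a sanity check, Jensen's inequality implies $\exp(E[\text{log}(\pi_g)]) < E[\pi_g] = \frac{\alpha/G + m_g}{\alpha + N}$.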
So then, finding $\theta$ is easy enough: take the derivative of the objective function after plugging in the expected values. However, how do I maximize with respect to $\alpha$?
I think:
$$ \underset{{\alpha}}{\mathrm{maximize}} \; \sum_{n=1}^{N} \sum_{g=1}^{G} \hat{z}_{n,g} \left[ \text{log}\left( f_{g}\left( \mathbf{x}_{n} | {\theta} \right) \right) + \text{log}\left( \hat{\pi}_{g} \right) \right] + \text{log}\left( \Gamma(\alpha) \right) - G \text{log} \left( \Gamma\left( \frac{\alpha}{G} \right) \right) + \left( \frac{\alpha}{G} - 1 \right) \sum_{g=1}^{G} \text{log} \left( \hat{\pi}_{g} \right) $$ $$ \mathrm{subject \; to} \; {\alpha} > {0} $$
where I impose the constraint on $\alpha$ deliberately, to ensure it stays positive. Replacing the constraint with a logarithmic barrier, this becomes:
$$ \underset{{\alpha}}{\mathrm{maximize}} \; \sum_{n=1}^{N} \sum_{g=1}^{G} \hat{z}_{n,g} \left[ \text{log}\left( f_{g}\left( \mathbf{x}_{n} | {\theta} \right) \right) + \text{log}\left( \hat{\pi}_{g} \right) \right] + \text{log}\left( \Gamma(\alpha) \right) - G \text{log} \left( \Gamma\left( \frac{\alpha}{G} \right) \right) + \left( \frac{\alpha}{G} - 1 \right) \sum_{g=1}^{G} \text{log} \left( \hat{\pi}_{g} \right) + \text{log}\left( {\alpha} \right) $$
I don't think there is a closed-form solution to this. So, taking the derivative:
$$ \frac{\partial}{\partial \alpha} f(\alpha) = \sum_{n=1}^{N} \sum_{g=1}^{G} \hat{z}_{n,g} \frac{\partial}{\partial \alpha} \text{log}\left( \hat{\pi}_{g} \right) + \left( \frac{\alpha}{G} - 1 \right) \sum_{g=1}^{G} \frac{\partial}{\partial \alpha} \text{log}\left( \hat{\pi}_{g} \right) + \frac{1}{G} \sum_{g=1}^{G} \text{log}\left( \hat{\pi}_{g} \right) + \psi(\alpha) - \psi\left(\frac{\alpha}{G}\right) + \frac{1}{\alpha} $$
where $\psi(\alpha) = \frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$ is the Digamma function. Note the product rule on $\left( \frac{\alpha}{G} - 1 \right) \sum_{g=1}^{G} \text{log}\left( \hat{\pi}_{g} \right)$: since $\text{log}\left( \hat{\pi}_{g} \right)$ itself depends on $\alpha$, this term contributes both the $\frac{1}{G} \sum_{g=1}^{G} \text{log}\left( \hat{\pi}_{g} \right)$ piece and a derivative piece.
Well:
$$ \frac{\partial}{\partial \alpha} \text{log}\left(\hat{\pi}_{g}\right) = \frac{1}{G} \psi^{(1)}\left( \frac{\alpha}{G} + m_{g} \right) - \psi^{(1)} \left( \alpha + N \right) $$
where $\psi^{(1)}(\alpha) = \frac{d}{d \alpha} \psi(\alpha)$ is the Trigamma function. So, using $\sum_{n=1}^{N} \hat{z}_{n,g} = m_{g}$ (the soft counts once the expectations are plugged in):
$$ \frac{\partial}{\partial \alpha} f(\alpha) = \sum_{g=1}^{G} \left( m_{g} + \frac{\alpha}{G} - 1 \right) \left[ \frac{1}{G} \psi^{(1)}\left( \frac{\alpha}{G} + m_{g} \right) - \psi^{(1)} \left( \alpha + N \right) \right] + \frac{1}{G} \sum_{g=1}^{G} \text{log}\left( \hat{\pi}_{g} \right) + \psi(\alpha) - \psi\left(\frac{\alpha}{G}\right) + \frac{1}{\alpha} $$
So then I would have to do some sort of gradient ascent?
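Plain gradient ascent, clamped to keep $\alpha$ positive, is one workable option for this one-dimensional problem (Newton's method would also do). A sketch under the following assumptions: `z_hat` is the $N \times G$ responsibility matrix, the soft counts are $m_g = \sum_n \hat{z}_{n,g}$, and the step size and iteration count are arbitrary choices. Note the extra $\frac{1}{G}\sum_g \text{log}(\hat{\pi}_g)$ product-rule term, which arises because $\text{log}(\hat{\pi}_g)$ itself depends on $\alpha$:

```python
import numpy as np
from scipy.special import digamma, polygamma

def alpha_gradient(alpha, z_hat):
    """d/d_alpha of the barrier-augmented objective, where
    log(pi_hat_g) = psi(alpha/G + m_g) - psi(alpha + N)."""
    N, G = z_hat.shape
    m = z_hat.sum(axis=0)                            # soft counts m_g
    log_pi = digamma(alpha / G + m) - digamma(alpha + N)
    dlog_pi = polygamma(1, alpha / G + m) / G - polygamma(1, alpha + N)
    return (
        (m * dlog_pi).sum()                          # from sum_{n,g} z_hat * log(pi_hat)
        + (alpha / G - 1.0) * dlog_pi.sum()          # product rule, derivative piece
        + log_pi.sum() / G                           # product rule, log(pi_hat) piece
        + digamma(alpha) - digamma(alpha / G)        # from the Gamma-function terms
        + 1.0 / alpha                                # log barrier
    )

def fit_alpha(z_hat, alpha0=1.0, lr=0.01, steps=500):
    """Plain gradient ascent, clamped to keep alpha positive."""
    alpha = alpha0
    for _ in range(steps):
        alpha = max(alpha + lr * alpha_gradient(alpha, z_hat), 1e-8)
    return alpha
```

A finite-difference check of `alpha_gradient` against the objective is a cheap way to catch sign or product-rule mistakes before trusting the ascent.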
Questions:
Have I computed the $E_{\pi_{g}}\left[ \text{log}\left(\pi_{g}\right) | \mathbf{m}, {\alpha} \right]$ term correctly? Is that what I am supposed to do at that stage?
Have I computed the maximization with respect to $\alpha$ correctly? Should I be solving this numerically or is there actually a closed form?
Basically, since I am new to parameter estimation in Bayesian graphical models, I want to know whether I am on the right track, and either way, how in the world can I estimate $\alpha$ given all of this?
Please let me know if you need more info or have questions!
