Expectation Maximization derivation using Jensen's inequality


I'm reading Andrew Ng's notes on the Expectation Maximization algorithm (in the context of HMMs), and I'm struggling with one 'simple' derivation:

$$\begin{aligned} \ell(\theta) &= \sum_i \log p(x^{(i)}; \theta) \\ &= \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) &(1) \\ &= \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} &(2) \\ &\geq \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} &(3) \end{aligned}$$

It uses Jensen's inequality, but I can't see how to get from (2) to (3).

Could anybody insert an intermediate equation to make the step explicit?

Thank you in advance.

2 Answers

BEST ANSWER

Fix $i$, and for convenience define $y_i \stackrel{\rm def}{=} \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$, viewed as a function of $z^{(i)}$ (the name $y_i$ avoids a clash with the data point $x^{(i)}$). Recalling that $\sum_{z^{(i)}} Q_i(z^{(i)}) = 1$, we can rewrite $$ \log \sum_{z^{(i)}}Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = \log \sum_{z^{(i)}}Q_i(z^{(i)})\cdot y_i = \log \mathbb{E}[y_i], $$ where the expectation is over the choice of $z^{(i)}\sim Q_i$ (as per the above remark, $Q_i$ defines a probability distribution).

Since $\log$ is concave, Jensen's inequality then guarantees that $$ \log \mathbb{E}[y_i] \geq \mathbb{E}[ \log y_i] $$ and, rewriting the LHS and RHS via the definition of $y_i$, we get $$ \log \sum_{z^{(i)}}Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \geq \sum_{z^{(i)}}Q_i(z^{(i)}) \log y_i = \sum_{z^{(i)}}Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}, $$ which holds for every fixed $i$.

Summing over all $i$, we finally obtain $$ \sum_i \log \sum_{z^{(i)}}Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \geq \sum_i \sum_{z^{(i)}}Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} $$ which shows the relation between (2) and (3).
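As a quick numerical sanity check (not part of the original notes), the inequality between (2) and (3) can be verified for a single $i$ with randomly chosen values. The joint probabilities `p_xz` and the distribution `Q` below are made-up illustrative numbers:

```python
import math
import random

random.seed(0)

def random_dist(n):
    """Random probability vector of length n (sums to 1)."""
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

n_z = 4
# Hypothetical joint probabilities p(x^{(i)}, z^{(i)}; theta) for one fixed x^{(i)},
# as z^{(i)} ranges over n_z values.
p_xz = [random.random() for _ in range(n_z)]
Q = random_dist(n_z)

# Inside of (2): log of the expectation of p/Q under Q.
lhs = math.log(sum(q * (p / q) for q, p in zip(Q, p_xz)))
# Inside of (3): expectation of the log (the lower bound).
rhs = sum(q * math.log(p / q) for q, p in zip(Q, p_xz))

assert lhs >= rhs  # Jensen's inequality for the concave log
```

Note that the left-hand side telescopes to $\log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)$, since the $Q_i(z^{(i)})$ factors cancel; only the right-hand side actually depends on the choice of $Q$.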

SECOND ANSWER

Notice that $\log (x)$ is a concave function.

Hence, by Jensen's inequality, $$\log(\mathbb{E}[x]) \geq \mathbb{E}[\log(x)].$$ In particular, if $Q_i \geq 0$ and $\sum_i Q_i = 1$, then the $Q_i$ define a discrete probability distribution, so for any positive values $h_i$, $$\log\Big(\sum_i Q_i h_i\Big) \geq \sum_i Q_i \log h_i.$$

In this question, take $Q_i = Q_i(z^{(i)})$ and $h_i = \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$.
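The discrete form of Jensen's inequality above is easy to check numerically. The weights `Q` and values `h` below are arbitrary illustrative numbers, not taken from the question:

```python
import math

# Hypothetical example: Q is a probability vector, h holds positive values.
Q = [0.2, 0.5, 0.3]
h = [1.0, 4.0, 9.0]

lhs = math.log(sum(q * hi for q, hi in zip(Q, h)))   # log(E[h])
rhs = sum(q * math.log(hi) for q, hi in zip(Q, h))   # E[log h]

assert lhs >= rhs  # log of the average dominates the average of the logs
```

Equality holds exactly when all the $h_i$ with nonzero weight are equal, which is what motivates the E-step choice of $Q_i$ in the EM algorithm.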