Suppose we are given a probability distribution $q$. I'm trying to approximate it by $p$ such that the KL-divergence between $p$ and $q$ is minimized. Which of the two, $KL(p\,||\,q)$ or $KL(q\,||\,p)$, should be minimized, and why?
I had assumed that minimizing one of them would also minimize the other, but in EM-like situations the first one is used. I would appreciate an explanation of this.
Thank you!
Following the definition given by Wikipedia, if we want to approximate a fixed distribution $Q$ with a tailor-made distribution $P$ we have to minimize $$ D_{KL}(Q\,||\,P)=\int_\mathbb{R}q(x)\log\frac{q(x)}{p(x)}\,dx.$$
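To see why the direction matters, here is a small numerical sketch (plain NumPy; the bimodal target $Q$, the single-Gaussian family for $P$, and all variable names are my own illustrative choices, not from the question). It grid-searches the best Gaussian under each direction: minimizing $D_{KL}(Q\,||\,P)$ spreads $P$ over all of $Q$'s mass ("mode-covering"), while minimizing $D_{KL}(P\,||\,Q)$ locks onto a single mode ("mode-seeking") — so the two minimizers are genuinely different, and minimizing one does not minimize the other.

```python
import numpy as np

def kl(a, b):
    # Discrete KL(a || b) = sum_i a_i * log(a_i / b_i), with 0*log 0 = 0.
    mask = a > 0
    # Floor b to avoid log(0) for candidates that put no mass where a does.
    return float(np.sum(a[mask] * np.log(a[mask] / np.maximum(b[mask], 1e-300))))

xs = np.linspace(-5, 5, 201)  # discretization grid

def normal(mu, sigma):
    # Discretized, renormalized Gaussian on the grid.
    w = np.exp(-0.5 * ((xs - mu) / sigma) ** 2)
    return w / w.sum()

# Bimodal target q: mixture of two well-separated narrow Gaussians.
q = 0.5 * normal(-2.0, 0.4) + 0.5 * normal(2.0, 0.4)

# Fit a single Gaussian p(mu, sigma) by brute-force grid search
# under each direction of the KL divergence.
mus = np.linspace(-3.0, 3.0, 61)
sigmas = np.linspace(0.2, 3.0, 57)
best_fwd = min((kl(q, normal(m, s)), m, s) for m in mus for s in sigmas)
best_rev = min((kl(normal(m, s), q), m, s) for m in mus for s in sigmas)

# Forward KL(q||p): p must cover both modes -> mean near 0, large sigma.
print("argmin KL(q||p): mu=%.2f sigma=%.2f" % (best_fwd[1], best_fwd[2]))
# Reverse KL(p||q): p avoids regions where q is small -> one mode, small sigma.
print("argmin KL(p||q): mu=%.2f sigma=%.2f" % (best_rev[1], best_rev[2]))
```

Under these assumptions the forward-KL fit lands between the two modes with a large spread, and the reverse-KL fit sits on one of the modes with a small spread, which is exactly the asymmetry that makes the choice of direction matter in EM-like settings.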