KL Divergence (t-SNE) I have $$ C = \sum_{i} KL(P_i \| Q_i) = \sum_{i} \sum_{j} p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$
This is the standard KL divergence. $P_i$ represents the conditional probability distribution over all other data-points given $x_i$, and $Q_i$ represents the conditional probability distribution over all other map points given $y_i$. I understand that this is asymmetric.
I am not able to understand this line: "In particular, there is a large cost for using far map points to represent data-points that are close (i.e., using a small $q_{j|i}$ to model a large $p_{j|i}$)." Can anybody help me understand this statement? Here is the link to the full paper, pg 2.
I took a quick look at the paper. I think that sentence is helping you read the formula by pointing out that when $p_{j|i} \gg q_{j|i}$, the term $p_{j|i} \log(p_{j|i}/q_{j|i})$ contributes a lot to the KL divergence, whereas in the reverse case ($q_{j|i} \gg p_{j|i}$) the same term contributes very little, because the small factor $p_{j|i}$ damps the logarithm. It explains in words what that asymmetry means in this particular application: the embedding pays a large penalty for placing neighbors far apart, but only a small one for placing distant points close together.
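To see this numerically, here is a small sketch of one term of the sum; the probability values 0.8 and 0.01 are arbitrary illustrations, not from the paper:

```python
import math

def kl_term(p, q):
    # Contribution of a single (j|i) pair to C = sum_j p_{j|i} * log(p_{j|i} / q_{j|i})
    return p * math.log(p / q)

# Nearby data-points mapped far apart: large p, small q -> large cost
cost_close_as_far = kl_term(0.8, 0.01)

# Distant data-points mapped close together: small p, large q -> tiny cost
cost_far_as_close = kl_term(0.01, 0.8)

print(cost_close_as_far)   # roughly 3.5
print(cost_far_as_close)   # roughly -0.04, negligible in magnitude
```

The first case dominates the objective while the second barely registers, which is exactly the asymmetry the paper is describing.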