I have two 2D probability distributions of eye movements, one for each of two different images. Call the distribution for Image 1 $P$, and the distribution for Image 2 $Q$. Since KL divergence is asymmetric, how do I know which way to compute it: $D_{KL}(P \| Q)$ or $D_{KL}(Q \| P)$? Semantically, what is the difference between the two directions? And finally, how should I interpret the value that KL divergence returns? I would know how to interpret, say, Euclidean or Manhattan distances between the heatmaps (and I'd have a fair idea of what counts as 'big' and 'small'), but are there any general rules of thumb for KL divergence, perhaps depending on vector dimensionality?
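For concreteness, here is a minimal sketch of how I am computing both directions, assuming Python/NumPy and using two made-up 2x2 heatmaps in place of my real fixation maps:

```python
import numpy as np

# Hypothetical toy heatmaps standing in for the real fixation maps
# of Image 1 (P) and Image 2 (Q).
P = np.array([[0.4, 0.1],
              [0.3, 0.2]])
Q = np.array([[0.25, 0.25],
              [0.25, 0.25]])

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats: flatten the 2D maps, renormalize each
    to sum to 1, and sum p * log(p / q). eps guards against log(0)
    in empty heatmap bins."""
    p = p.ravel() / p.sum()
    q = q.ravel() / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

print(kl_divergence(P, Q))  # D_KL(P || Q)
print(kl_divergence(Q, P))  # D_KL(Q || P), generally a different number
```

As the two printed values show, swapping the arguments changes the result, which is exactly why I'm unsure which direction is the meaningful one here.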
I'm attaching the sample data so you can have a look. Note: Image 1 and Image 2 look very alike, but they are actually different! (The computer mouse is present in one and absent in the other.)
Image 1:

Image 2:
