Under which circumstances are the Jensen–Shannon divergence (JSD) and the dot product equivalent?


Recently I wanted to study the dot product: $$\cos \theta = \frac{x \cdot y}{\|x\|_2 \, \|y\|_2}$$

I was wondering whether, under some assumptions, e.g., restricting to normalized vectors, which would reduce the equation to $$\cos \theta = x \cdot y,$$

it is possible to make a connection to distance measures used in probability theory, such as the JSD (not the KL divergence, because it is not symmetric and thus not a distance). Note that normalizing already brings the vectors closer to "probability distributions" (strictly, a probability distribution requires nonnegative entries with unit $\ell_1$ norm, not unit $\ell_2$ norm).
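As a quick sanity check of the reduction above (a minimal sketch with made-up example vectors), normalizing first and then taking the plain dot product gives the same value as the full cosine formula:

```python
import numpy as np

# Made-up example vectors (any nonzero vectors work).
x = np.array([3.0, 4.0])
y = np.array([1.0, 2.0])

# Full cosine formula.
cos_full = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# Normalize to unit L2 norm first; then the plain dot product suffices.
xn = x / np.linalg.norm(x)
yn = y / np.linalg.norm(y)
cos_reduced = xn @ yn

print(np.isclose(cos_full, cos_reduced))  # True
```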

The JSD is given by

$$\mathrm{JSD}(P \| Q) = \tfrac{1}{2} KL_d(P \| M) + \tfrac{1}{2} KL_d(Q \| M),$$ with $M := \frac{P + Q}{2}$, and the KL divergence given as

$$KL_d(P \| Q) = \int p(x) \log{\frac{p(x)}{q(x)}} \, dx$$
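In code, the discrete version of these definitions looks like this (a minimal sketch with made-up example distributions; I use the conventional $\tfrac{1}{2}$ weights on the two KL terms, which bound the JSD by $\log 2$):

```python
import numpy as np

def kl(p, q):
    # Discrete KL divergence KL(p || q); sums only where p > 0,
    # assuming q > 0 wherever p > 0.
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def jsd(p, q):
    # Jensen-Shannon divergence with the conventional 1/2 weights,
    # symmetric in p and q and bounded by log(2).
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Made-up example distributions.
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.25, 0.25, 0.5])

print(jsd(p, p))  # 0.0 for identical distributions
print(jsd(p, q))  # positive, and equal to jsd(q, p)
```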

The first thing I noticed is that for the JSD, 0 means identical distributions, while the dot product is 0 for orthogonal vectors.

My loosely formalized question is: how would you (if possible) modify the JSD, or add some constraints, to make the two equivalent? I ask this question to make more sense of both.

A related question is whether it is possible to model discrete probability distributions as a vector space, and if we do so, what metric the JSD would translate to.