Identity testing: l1 versus l2 separations?


I am interested in a problem related to the so-called "identity testing".

Suppose I want to test whether two distributions p and q (probability vectors on m atoms) are the same, given a certain number of samples.

The results in the references I was sent are of the following form:

when ||p-q||_1 > \epsilon, the optimal test is to compute a chi-square-type statistic between the empirical counts and q, and the required number of samples is **.
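To make the chi-square-type statistic concrete, here is a small sketch (my own construction, not taken from those references; the subtracted N_i term is one common variant from the identity-testing literature):

```python
import numpy as np

def chi_square_stat(counts, q, n):
    """Chi-square-type identity-testing statistic:
    sum_i ((N_i - n*q_i)^2 - N_i) / (n*q_i).
    Subtracting N_i in the numerator cancels the sampling variance,
    so the statistic stays small (order sqrt(m)) when the n samples
    really come from q, and grows like n * chi2(p, q) otherwise."""
    counts = np.asarray(counts, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(((counts - n * q) ** 2 - counts) / (n * q)))

rng = np.random.default_rng(0)
m, n = 100, 10_000
q = np.full(m, 1.0 / m)               # hypothesized distribution

# Null: samples actually drawn from q.
stat_null = chi_square_stat(rng.multinomial(n, q), q, n)

# Alternative: p_far has l1 distance 1 from q.
p_far = np.where(np.arange(m) < m // 2, 1.5 / m, 0.5 / m)
stat_alt = chi_square_stat(rng.multinomial(n, p_far), q, n)

print(stat_null, stat_alt)
```

With these parameters the null statistic fluctuates near zero while the alternative one is on the order of n times the chi-square divergence, which is what the threshold test exploits.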

Now we are able to obtain something like this:

When ||p-q||_2 > \epsilon, the optimal test is to compare an estimate of ||p-q||_2 against a threshold, and the required number of samples is **.
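A minimal sketch of such an l2 estimator (again my own construction for illustration; I use Poissonized sampling, under which the subtracted x + y term makes the estimator of ||p-q||_2^2 exactly unbiased):

```python
import numpy as np

def l2_sq_estimate(x, y, n):
    """Unbiased estimator of ||p - q||_2^2 from Poissonized counts
    x ~ Poisson(n*p), y ~ Poisson(n*q) (independent, per atom):
    subtracting x + y removes the sampling-noise bias of (x - y)^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2 - x - y) / n**2)

rng = np.random.default_rng(1)
m, n = 200, 50_000
p = np.full(m, 1.0 / m)

# Same distribution: the estimate should hover around 0.
same = l2_sq_estimate(rng.poisson(n * p), rng.poisson(n * p), n)

# Perturbed distribution: the estimate should track the true gap.
q2 = np.where(np.arange(m) < m // 2, 1.4 / m, 0.6 / m)
diff = l2_sq_estimate(rng.poisson(n * p), rng.poisson(n * q2), n)
true_gap = float(np.sum((p - q2) ** 2))

print(same, diff, true_gap)
```

Thresholding this estimate at roughly \epsilon^2 / 2 then gives the l2 test described above.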

So the difference really lies in whether the true distributions are separated in l1 or in l2.
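To compare the two separations directly, note the standard norm inequalities for vectors with m coordinates (the upper bound is Cauchy-Schwarz):

$$\|p-q\|_2 \le \|p-q\|_1 \le \sqrt{m}\,\|p-q\|_2$$

So ||p-q||_2 > \epsilon implies ||p-q||_1 > \epsilon, while ||p-q||_1 > \epsilon only guarantees ||p-q||_2 > \epsilon/\sqrt{m}; this \sqrt{m} gap is exactly where the two sample-complexity bounds can differ.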

I read a paper claiming that l2 separation is not as "good" as l1 separation, because when the supports of p and q are different, l2 can give the wrong result. But I think l1 also has this issue (to avoid it, one might be better off using the Wasserstein metric; there is an example of this).
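A concrete instance of the phenomenon (my own construction): two uniform distributions on disjoint halves of 2m atoms. Their l2 distance vanishes as m grows even though the supports are completely different, while their l1 (total variation) distance stays maximal at 2 regardless of where the atoms sit, which is also why one might prefer Wasserstein when atom locations matter:

```python
import numpy as np

# p uniform on the first m atoms, q uniform on the last m atoms:
# disjoint supports for every m.
for m in (10, 100, 10_000):
    p = np.concatenate([np.full(m, 1.0 / m), np.zeros(m)])
    q = np.concatenate([np.zeros(m), np.full(m, 1.0 / m)])
    l1 = float(np.abs(p - q).sum())          # stays 2 for all m
    l2 = float(np.sqrt(((p - q) ** 2).sum()))  # sqrt(2/m) -> 0
    print(m, l1, l2)
```

So an l2 tester with a fixed threshold \epsilon misses this pair once m > 2/\epsilon^2, whereas any reasonable l1 tester separates them immediately.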

In other words, which separation is better (and more useful), l1 or l2? One possibility I considered is to look at their connections to other fundamental information metrics (KL divergence, Hellinger affinity, etc.), but it is not clear to me which notion of separation between probability vectors is more fundamental.

Thank you all.