Divergences like the KL divergence or total variation distance tell us how far two distributions are from each other. Say I have two distributions $p,q$ and I want to know the smallest (or expected) number of samples $m$ such that I can distinguish between $p$ and $q$ with a given success probability. Intuitively, if $p$ and $q$ are far apart (i.e. the TV distance or KL divergence is large), then the required number of samples $m$ should be smaller. I am sure this has already been derived; would anybody have a source?
Thanks!
An interesting case is when $p$ and $q$ are supported on a finite set, say $[n]=\{1,\ldots,n\}$. The paper that started this line of research in the CS literature is by Batu et al., see here. The distance used is the $\ell_1$ distance $\|p-q\|_1=\sum_i |p_i-q_i|$, which is twice the total variation distance.

Theorem 1. Given $\delta>0$, $\epsilon>0$, and distributions $p,q$ over $[n]$, there is a test which runs in time $O(\epsilon^{-4} n^{2/3} \log n \log \frac{1}{\delta})$ such that if $$\|p-q\|_1=\sum_i |p_i-q_i| \leq \max\left(\frac{\epsilon^2}{32\, n^{1/3}},\frac{\epsilon}{4\, n^{1/2}}\right)\quad(1),$$ then the test outputs PASS (i.e., that $p$ and $q$ are closer than the bound in (1)) with probability $\geq 1-\delta.$

Alternatively, if $\|p-q\|_1 > \epsilon$, the test outputs FAIL (i.e., that $p$ and $q$ are further apart than $\epsilon$ in $\ell_1$ distance) with probability $\geq 1-\delta.$
"Runs in time" refers to the cost of sampling the distributions in a black-box query model. Note that the probability of a correct answer by the test is the same ($\geq 1-\delta$) in both cases.
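As a quick sanity check on the intuition in the question (this is a plain log-likelihood-ratio test for two *known* distributions, not the Batu et al. tester, which works from samples of both), here's a minimal simulation sketch: the test's error probability shrinks both as the number of samples $m$ grows and as the distance between $p$ and $q$ grows. The distributions and trial counts below are arbitrary choices for illustration.

```python
import math
import random

def llr_decides_p(samples, p, q):
    """Return True if the log-likelihood ratio favors p over q."""
    return sum(math.log(p[x] / q[x]) for x in samples) >= 0

def error_rate(p, q, m, trials=2000, seed=0):
    """Fraction of trials where the test fails to identify p as the
    true source of the m samples (empirical error probability)."""
    rng = random.Random(seed)
    support = range(len(p))
    errors = 0
    for _ in range(trials):
        # Draw m i.i.d. samples from the true distribution p.
        samples = rng.choices(support, weights=p, k=m)
        if not llr_decides_p(samples, p, q):
            errors += 1
    return errors / trials

p = [0.5, 0.5]
q_near = [0.6, 0.4]   # TV(p, q_near) = 0.1
q_far = [0.8, 0.2]    # TV(p, q_far)  = 0.3

# Error shrinks with more samples, and is smaller for the distant pair
# at the same m (Chernoff/Stein-type exponential decay in m).
print(error_rate(p, q_near, 10), error_rate(p, q_near, 200))
print(error_rate(p, q_far, 10), error_rate(p, q_far, 200))
```

Running this shows the qualitative picture from the question: for a fixed target error, the farther apart $p$ and $q$ are, the fewer samples you need.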